mirror of
https://github.com/llvm/llvm-project.git
synced 2025-04-27 15:36:05 +00:00

This caused failures such as:

    Instruction does not dominate all uses!
      %29 = insertelement <8 x i64> %28, i64 %xor6.i.5, i64 6
      %17 = shufflevector <8 x i64> %29, <8 x i64> poison, <6 x i32> <i32 1, i32 2, i32 3, i32 4, i32 5, i32 6>

see comment on https://github.com/llvm/llvm-project/pull/123360

> Previous version was reviewed here https://github.com/llvm/llvm-project/pull/123360
> It is mostly the same, adjusted after graph-to-tree transformation
>
> Patch tries to remove wide alternate operations.
> Currently SLP vectorizer emits something like this:
> ```
> %0 = add i32
> %1 = sub i32
> %2 = add i32
> %3 = sub i32
> %4 = add i32
> %5 = sub i32
> %6 = add i32
> %7 = sub i32
>
> transforms to
>
> %v1 = add <8 x i32>
> %v2 = sub <8 x i32>
> %res = shuffle %v1, %v2, <0, 9, 2, 11, 4, 13, 6, 15>
> ```
> i.e. half of the results are just unused. This leads to increased
> register pressure and potentially doubles number of operations.
>
> Patch introduces SplitVectorize mode, where it splits the operations by
> opcodes and produces instead something like this:
> ```
> %v1 = add <4 x i32>
> %v2 = sub <4 x i32>
> %res = shuffle %v1, %v2, <0, 4, 1, 5, 2, 6, 3, 7>
> ```
> It allows to improve the performance by reducing number of ops. Also, it
> turns on some other improvements, like improved graph reordering.
>
> [...]

This reverts commit 9d37e61fc77d3d6de891c30630f1c0227522031d as well as
the follow-up commit 72bb0a9a9c6fdde43e1e191f2dc0d5d2d46aff4e.
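
For readers checking the two shuffle masks quoted from the original patch description, the following is a minimal Python sketch of how they select lanes. It models only the shufflevector index convention (indices below the first operand's length pick from the first vector, the rest pick from the second). The operand values `xs` and `ys`, the `shuffle` helper, and the assumption that the split form gathers even lanes for `add` and odd lanes for `sub` are all illustrative and are not taken from the SLP vectorizer's implementation.

```python
def shuffle(a, b, mask):
    """Model shufflevector lane selection over the concatenation of a and b."""
    both = list(a) + list(b)
    return [both[i] for i in mask]


# Hypothetical operands for the eight scalar lanes.
xs = [10, 20, 30, 40, 50, 60, 70, 80]
ys = [1, 2, 3, 4, 5, 6, 7, 8]

# Scalar reference: even lanes add, odd lanes sub, as in %0..%7.
scalar = [x + y if i % 2 == 0 else x - y
          for i, (x, y) in enumerate(zip(xs, ys))]

# Wide alternate form: two 8-wide ops, half of each result is discarded.
v1 = [x + y for x, y in zip(xs, ys)]
v2 = [x - y for x, y in zip(xs, ys)]
wide = shuffle(v1, v2, [0, 9, 2, 11, 4, 13, 6, 15])

# Split form: two 4-wide ops on the lanes that actually need each opcode
# (assumed here to be gathered as even lanes and odd lanes).
add4 = [xs[i] + ys[i] for i in range(0, 8, 2)]
sub4 = [xs[i] - ys[i] for i in range(1, 8, 2)]
split = shuffle(add4, sub4, [0, 4, 1, 5, 2, 6, 3, 7])

print(scalar)
print(wide == scalar, split == scalar)
```

Under these assumptions both forms reproduce the scalar results, while the split form computes 8 lane operations instead of 16.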