mirror of
https://github.com/llvm/llvm-project.git
synced 2025-05-16 04:56:06 +00:00

This is similar to the recent move/addition of "scaleShuffleMask" (D76508), but there are a couple of differences: 1. The existing x86 helper (canWidenShuffleElements) always tries to divide-by-2, so it gets called iteratively and wouldn't handle the general case of non-pow-2 length. 2. The existing x86 code handles "SM_SentinelZero" - we don't have that in IR, but this code should be safe to use with that or other special (negative) values. The motivation is to enable shuffle folds in instcombine/vector-combine that are similar to D76844 and D76727, but in the reverse-bitcast direction. Those patterns are visible in the tests for D40633. Differential Revision: https://reviews.llvm.org/D77881
Analysis Opportunities: //===---------------------------------------------------------------------===// In test/Transforms/LoopStrengthReduce/quadradic-exit-value.ll, the ScalarEvolution expression for %r is this: {1,+,3,+,2}<loop> Outside the loop, this could be evaluated simply as (%n * %n), however ScalarEvolution currently evaluates it as (-2 + (2 * (trunc i65 (((zext i64 (-2 + %n) to i65) * (zext i64 (-1 + %n) to i65)) /u 2) to i64)) + (3 * %n)) In addition to being much more complicated, it involves i65 arithmetic, which is very inefficient when expanded into code. //===---------------------------------------------------------------------===// In formatValue in test/CodeGen/X86/lsr-delayed-fold.ll, ScalarEvolution is forming this expression: ((trunc i64 (-1 * %arg5) to i32) + (trunc i64 %arg5 to i32) + (-1 * (trunc i64 undef to i32))) This could be folded to (-1 * (trunc i64 undef to i32)) //===---------------------------------------------------------------------===//