mirror of
https://github.com/llvm/llvm-project.git
synced 2025-05-18 19:06:05 +00:00

Summary: for.outer: br for.inner for.inner: LI <loop invariant load instruction> for.inner.latch: br for.inner, for.outer.latch for.outer.latch: br for.outer, for.outer.exit LI is a loop invariant load instruction that post dominate for.outer, so LI should be able to move out of the loop nest. However, there is a bug in allLoopPathsLeadToBlock(). Current algorithm of allLoopPathsLeadToBlock() 1. get all the transitive predecessors of the basic block LI belongs to (for.inner) ==> for.outer, for.inner.latch 2. if any successors of any of the predecessors are not for.inner or for.inner's predecessors, then return false 3. return true Although for.inner.latch is for.inner's predecessor, but for.inner dominates for.inner.latch, which means if for.inner.latch is ever executed, for.inner should be as well. It should not return false for cases like this. Author: Whitney (committed by xingxue) Reviewers: kbarton, jdoerfert, Meinersbur, hfinkel, fhahn Reviewed By: jdoerfert Subscribers: hiraditya, jsji, llvm-commits, etiotto, bmahjour Tags: #LLVM Differential Revision: https://reviews.llvm.org/D62418 llvm-svn: 361762
Analysis Opportunities: //===---------------------------------------------------------------------===// In test/Transforms/LoopStrengthReduce/quadradic-exit-value.ll, the ScalarEvolution expression for %r is this: {1,+,3,+,2}<loop> Outside the loop, this could be evaluated simply as (%n * %n), however ScalarEvolution currently evaluates it as (-2 + (2 * (trunc i65 (((zext i64 (-2 + %n) to i65) * (zext i64 (-1 + %n) to i65)) /u 2) to i64)) + (3 * %n)) In addition to being much more complicated, it involves i65 arithmetic, which is very inefficient when expanded into code. //===---------------------------------------------------------------------===// In formatValue in test/CodeGen/X86/lsr-delayed-fold.ll, ScalarEvolution is forming this expression: ((trunc i64 (-1 * %arg5) to i32) + (trunc i64 %arg5 to i32) + (-1 * (trunc i64 undef to i32))) This could be folded to (-1 * (trunc i64 undef to i32)) //===---------------------------------------------------------------------===//