LSR uses SCEVExpander to generate induction formulas. The expander
internally tries to reuse existing IR expressions. To do that, it needs
to strip any poison-generating flags (nsw, nuw, exact, nneg, etc.)
which may not be valid for the newly added users.
This is conservatively correct, but has the effect that LSR will strip
nneg flags on zext instructions involved in trip counts in loop
preheaders. To avoid this, this patch adjusts the expander to reinfer
the flags on the CSE candidate if they are legal for all possible users.
This should fix the regression reported in
https://github.com/llvm/llvm-project/issues/71200.
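As a rough sketch of the situation (hypothetical IR, not taken from the reported testcase):

    ; preheader: an existing zext carrying the nneg flag
    %n.ext = zext nneg i32 %n to i64
    ; When SCEVExpander CSEs a new user onto %n.ext, it conservatively drops
    ; the flag, leaving: %n.ext = zext i32 %n to i64
    ; With this patch, the expander re-infers nneg when it is valid for every user.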
This should arguably be done inside canReuseInstruction instead, but
doing it outside is more conservative compile-time-wise. Both
canReuseInstruction and isGuaranteedNotToBePoison walk operand lists, so
right now we are performing work which is roughly O(N^2) in the size of
the operand graph. We should fix that before making the per-operand step
more expensive. My tentative plan is to land this, and then rework the
code to sink the logic into more core interfaces.
Commit d6e7c162e1df3736d8e2b3610a831b7cfa5be99b introduced a utility
to extract widenable conditions from a branch. That utility was applied
in llvm::isWidenableBranch to check whether a branch is widenable. As a
result, we consider a branch widenable if it has a widenable condition
anywhere in its condition tree. However, that will only be true once we
finish reworking GuardWidening from widening branches to widening
widenable conditions.
For now we still need to check that a widenable branch has the form
`br(widenable_condition & (...))`,
because that form is assumed by the LoopPredication and GuardWidening
algorithms.
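For illustration, the currently expected form looks like this (value and block names are made up):

    %wc = call i1 @llvm.experimental.widenable.condition()
    %guard = and i1 %wc, %cond
    br i1 %guard, label %guarded, label %deopt

A branch whose widenable condition sits deeper in the condition tree, e.g. and(%c1, and(%c2, %wc)), is not yet treated as widenable by these passes.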
Fixes: https://github.com/llvm/llvm-project/issues/66418
Co-authored-by: Aleksander Popov <apopov@azul.com>
Currently, after widening br(WC && (c1 && c2)) we insert an assume of
(c1 && c2), which is joined to WC by an And operation.
But we are going to support a more flexible form of widenable branches
where WC could be placed anywhere in the expression tree, e.g.:
br(c1 && (c2 && WC)).
In that case we won't have (c1 && c2) in the IR, so we need to add an
explicit (c1 && c2) and then create an assumption of it.
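A minimal sketch of the intended result (hypothetical names):

    %wc = call i1 @llvm.experimental.widenable.condition()
    %t = and i1 %c2, %wc
    %cond = and i1 %c1, %t
    br i1 %cond, label %guarded, label %deopt

    guarded:
      ; (c1 && c2) does not exist as a single value here, so materialize it
      ; explicitly before creating the assumption:
      %c1.and.c2 = and i1 %c1, %c2
      call void @llvm.assume(i1 %c1.and.c2)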
Reviewed By: anna
Differential Revision: https://reviews.llvm.org/D157502
Loop predication's predicateLoopExit transform does two incorrect things:
It sinks the widenable call into the loop, thereby converting an invariant condition into a variant one.
It widens the widenable call at a branch, thereby converting the branch into a loop-varying one.
The latter is problematic when the branch may have been loop-invariant
and prior optimizations (such as indvars) may have relied on this
fact and updated the deopt state accordingly.
Now, when we widen this with a loop-varying condition, the deopt state
is no longer correct.
Fixes https://github.com/llvm/llvm-project/issues/61963.
Differential Revision: https://reviews.llvm.org/D147662
LoopPredication introduces the use of a possibly poison value in the branch (guard)
instruction, so to avoid introducing undefined behavior that value should be frozen.
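Sketch of the fix (hypothetical names):

    ; %cond may be poison; branching on it would introduce UB, so freeze it first:
    %cond.fr = freeze i1 %cond
    %wide = and i1 %wc, %cond.fr
    br i1 %wide, label %guarded, label %deopt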
Reviewed By: mkazantsev
Differential Revision: https://reviews.llvm.org/D146685
Loop predication can insert assumes to preserve knowledge about facts that
may otherwise be lost, because loop predication is a lossy transform. When a guard
is represented as a branch by widenable condition, the assume should be inserted in
the guarded block. However, if the guarded block has predecessors other than the
guard block, then the condition might not dominate it, and currently we generate
invalid code in this case.
One possible fix would be to split the critical edge and insert the assume there, but
that would require modifying the CFG, which LoopPredication currently does not do, and
we want to keep it that way.
Instead, the fix is to handle this case by inserting a Phi which takes `Cond` as its
input from the guard block and `true` from any other block. This is valid IR and does
not introduce any new knowledge when we arrive from another block.
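Roughly (block and value names are made up):

    guarded:                          ; preds = %guard_block, %other_pred
      ; take the guarded condition when coming from the guard block, and a
      ; harmless 'true' when coming from any other predecessor:
      %assume.cond = phi i1 [ %cond, %guard_block ], [ true, %other_pred ]
      call void @llvm.assume(i1 %assume.cond)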
Differential Revision: https://reviews.llvm.org/D144859
Reviewed By: nikic, skatkov
Since LoopPredication performs non-equivalent transforms that remove some
checks from loops, other passes may be unable to perform transforms
they could have performed had the checks been left in the loops.
This patch makes LoopPredication insert assumes of the replaced
conditions, either after a guard call or in the true block of the
widenable-condition branch.
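For the guard-call case, a rough sketch (hypothetical condition names):

    ; the guard's condition has been replaced with a widened, loop-invariant one:
    call void (i1, ...) @llvm.experimental.guard(i1 %widened.cond) [ "deopt"() ]
    ; keep the original, replaced fact visible to later passes:
    call void @llvm.assume(i1 %old.cond)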
Differential Revision: https://reviews.llvm.org/D135354
Summary:
The code that generates names for loops in various reporting scenarios
created the name by serializing the loop into a string. This may result in
a very large name for a loop containing many blocks. Use the getName()
function on the loop instead.
Author: Jamie Schmeiser <schmeise@ca.ibm.com>
Reviewed By: Whitney (Whitney Tsang), aeubanks (Arthur Eubanks)
Differential Revision: https://reviews.llvm.org/D133587
With profile data, non-trivial LoopUnswitch will only be applied to non-cold loops, as unswitching cold loops may not yield much benefit but can significantly increase code size.
Reviewed By: aeubanks, asbirlea
Differential Revision: https://reviews.llvm.org/D129599
The basic problem we have is that we're trying to reuse an instruction which is mapped to some SCEV. Since we can have multiple such instructions (potentially with different flags), this is analogous to our need to drop flags when performing CSE. A trivial implementation would simply drop flags on any instruction we decided to reuse, and that would be correct.
This patch is almost that trivial patch except that we preserve flags on the reused instruction when existing users would imply UB on overflow already. Adding new users can, at most, refine this program to one which doesn't execute UB which is valid.
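For illustration (hypothetical values):

    ; an existing instruction mapped to the SCEV we want to expand:
    %iv.next = add nuw nsw i64 %iv, 1
    ; Reusing it for a new user generally requires dropping the flags
    ; (add i64 %iv, 1), unless an existing user already implies UB on
    ; overflow, in which case the flags can be preserved.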
In practice, this fixes two conceptual problems with the previous code: 1) a binop could have been canonicalized into a form with a different opcode or operands, or 2) the inbounds GEP case, which was simply unhandled.
On the test changes, most are pretty straightforward. We lose some flags (in some cases, they'd have been dropped on the next CSE pass anyway). The one that took me the longest to understand was the ashr-expansion test. What's happening there is that we're considering reuse of the mul; previously we disallowed it entirely, now we allow it with no flags. The surrounding diffs are all effects of generating the same mul with a different operand order, and then doing simple DCE.
The loss of the inbounds is unfortunate, but even there, we can recover most of those once we actually treat branch-on-poison as immediate UB.
Differential Revision: https://reviews.llvm.org/D112734
Upon further investigation and discussion,
this is actually the opposite direction from what we should be taking,
and this direction wouldn't solve the motivating problem anyway.
Additionally, some more (polly) tests escaped being updated.
So, let's just take a step back here.
This reverts commit f3190dedeef9da2109ea57e4cb372f295ff53b88.
This reverts commit 749581d21f2b3f53e4fca4eb8728c942d646893b.
This reverts commit f3df87d57e096143670e0fd396e81d43393a2dd2.
This reverts commit ab1dbcecd6f0969976fafd62af34730436ad5944.
Using BPI within loop predication is non-trivial because BPI is only
preserved lossily in the loop pass manager (one fix exposed by lossy
preservation is up for review at D111448). However, since loop
predication is only used in downstream pipelines, it is hard to keep BPI
from breaking, due to incomplete state, as upstream changes to BPI land.
Also, correctly preserving BPI for all loop passes is a non-trivial
undertaking (D110438 does this lossily), while the benefit of using it
in loop predication isn't clear.
In this patch, we rely on profile metadata to get nearly the same benefit as
BPI, without actually using the complete heuristics provided by BPI.
This avoids the compile-time explosion we tried to fix with D110438 and
also avoids fragile bugs, because BPI can be lossy in loop passes
(D111448).
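For example, instead of querying BPI, the pass reads existing branch-weight metadata on the exiting branch (hypothetical weights):

    br i1 %exit.cond, label %cold.exit, label %latch, !prof !0

    !0 = !{!"branch_weights", i32 1, i32 1000}  ; the exit is cold relative to the latch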
Reviewed-By: asbirlea, apilipenko
Differential Revision: https://reviews.llvm.org/D111668
This is analogous to D86156 (which preserves "lossy" BFI in loop
passes). Lossy means that the analysis preserved may not be up to date
with regard to new blocks that are added in loop passes, but BPI will
not contain stale pointers to basic blocks that are deleted by the loop
passes.
This is achieved through a BasicBlockCallbackVH in BPI, which calls
eraseBlock to update the data structures in BPI whenever a basic
block is deleted.
This patch does not have any changes in the upstream pipeline, since
none of the loop passes in the pipeline use BPI currently.
However, since BPI wasn't previously preserved in loop passes, the loop
predication pass was invoking BPI *on the entire
function* every time it ran in an LPM. This caused a massive compile-time
increase in our downstream LPM invocation, which contained loop predication.
See the updated test with an invocation of a loop pipeline containing loop
predication and -debug-pass turned ON.
Reviewed-By: asbirlea, modimo
Differential Revision: https://reviews.llvm.org/D110438
Precommit testcase for D110438. Since we do not preserve BPI in the loop
pass manager, we are forced to compute BPI every time loop predication is
invoked.
The referenced patch changes that behaviour by preserving lossy BPI for
loop passes.
To make the IR easier to analyze, this pass makes some minor transformations.
Having done that, even if it doesn't decide to optimize anything, it can't report
that it changed nothing and preserved all the analyses.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D109855
The attached testcase crashes without the patch ("Not the same accesses
in the same order").
When we move instructions before another instruction, we also need to
update the memory accesses corresponding to it.
Reviewed-By: asbirlea
Differential Revision: https://reviews.llvm.org/D109197
Since LICM has now unconditionally moved to the MemorySSA-based form, all
passes that run in the same LPM as LICM need to preserve MemorySSA (i.e. our
downstream pipeline).
Add loop-mssa to all tests and perform -verify-memoryssa within
LoopPredication itself.
Differential Revision: https://reviews.llvm.org/D108724
These intrinsics, not icmp+select, are the canonical form nowadays,
so we might as well directly emit them.
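For example, rather than the compare-plus-select pattern, the expansion now looks like this (hypothetical operands):

    ; old, non-canonical expansion:
    %cmp = icmp ugt i64 %a, %b
    %umax.old = select i1 %cmp, i64 %a, i64 %b
    ; new, canonical expansion:
    %umax.new = call i64 @llvm.umax.i64(i64 %a, i64 %b)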
This should not cause any regressions, but if it does,
then they would need to be fixed regardless.
Note that this doesn't deal with `SCEVExpander::isHighCostExpansion()`,
but that is a pessimization, not a correctness issue.
Additionally, the non-intrinsic form has issues with undef,
see https://reviews.llvm.org/D88287#2587863
Blindly following the unique-successor chain turned out to be a bad idea:
in the degenerate case where a block jumps to itself, this goes into an endless loop.
I discovered this problem while playing with additional changes and
managed to reproduce it on the existing LoopPredication code.
Fix it by checking a "visited" set while iterating through unique successors.
Reviewed By: skatkov
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72908
We may end up with a case where we have a widenable branch above the loop, but not all widenable branches within the loop have been removed. Since a widenable branch inhibits SCEV's ability to reason about exit counts (by design), we have a tradeoff between the effectiveness of this optimization and allowing future widening of the branches within the loop. LoopPred is thought to be one of the most important optimizations for range check elimination, so let's pay the cost.
As a reminder, a "widenable branch" is the pattern "br i1 (and i1 X, WC()), label %taken, label %untaken" where "WC" is the widenable condition intrinsic. The semantics of such a branch (derived from the semantics of WC) are that a new condition can be added into the condition arbitrarily without violating legality.
Broaden the definition in two ways:
Allow swapped operands to the br (and X, WC()) form
Allow a widenable branch with a trivial condition (i.e. true), which takes the form br i1 WC()
The former is just general robustness (e.g. for X = non-instruction this is what instcombine produces). The latter is specifically important, as partial unswitching of a widenable range check produces exactly this form above the loop.
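For illustration, the newly accepted shapes look roughly like this (names are made up):

    ; swapped operand order:
    %wc = call i1 @llvm.experimental.widenable.condition()
    %wb = and i1 %wc, %x
    br i1 %wb, label %taken, label %untaken

    ; trivially-true widenable branch:
    br i1 %wc, label %taken, label %untaken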
Differential Revision: https://reviews.llvm.org/D70502
Unswitch (and other loop transforms) like to generate loop exit blocks with unconditional successors, and phi nodes (LCSSA, or simple multiple exiting blocks sharing an exit). Generalize the "likely very rare exit" check slightly to handle this form.
This implements a version of the predicateLoopExits transform from IndVarSimplify extended to exploit widenable conditions - and thus be much wider in scope of legality. The code structure ends up being almost entirely different, so I chose to duplicate this into the LoopPredication pass instead of trying to reuse the code in the IndVars.
The core notions of the transform are as follows:
If we have a widenable condition which controls entry into the loop, we're allowed to widen it arbitrarily. Given that, it's simply a *profitability* question as to what conditions to fold into the widenable branch.
To avoid pass ordering issues, we want to avoid widening cases that would otherwise be dischargeable. Or... widen in a form which can still be discharged. Thus, we phrase the transform as selecting one analyzeable exit from the set of analyzeable exits to keep. This avoids creating pass ordering complexities.
Since none of the above proves that we actually exit through our analyzeable exits - we might exit through something else entirely - we limit ourselves to cases where a) the latch is analyzeable, b) the latch is predicted taken, and c) the exit being removed is statically cold.
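A condensed sketch of the shape this targets (entirely hypothetical IR):

    entry:
      %wc = call i1 @llvm.experimental.widenable.condition()
      %guard = and i1 %entry.cond, %wc
      br i1 %guard, label %preheader, label %deopt
    ; ... loop body elided ...
    loop:
      ; a statically cold, analyzeable exit; its condition is folded into
      ; %guard above, after which this in-loop check can be discharged:
      br i1 %exit.cond, label %cold.exit, label %latch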
Differential Revision: https://reviews.llvm.org/D69830