81 Commits

Author SHA1 Message Date
annamthomas
46c2d93662
[StandardInstrumentation] Annotate loops with the function name (#90756)
When analyzing pass debug output it is helpful to have the function name
along with the loop name.
2024-05-03 14:13:59 -04:00
Philip Reames
ffb2af3ed6
[SCEVExpander] Attempt to reinfer flags dropped due to CSE (#72431)
LSR uses SCEVExpander to generate induction formulas. The expander
internally tries to reuse existing IR expressions. To do that, it needs
to strip any poison generating flags (nsw, nuw, exact, nneg, etc.)
which may not be valid for the newly added users.

This is conservatively correct, but has the effect that LSR will strip
nneg flags on zext instructions involved in trip counts in loop
preheaders. To avoid this, this patch adjusts the expander to reinfer
the flags on the CSE candidate if legal for all possible users.

This should fix the regression reported in
https://github.com/llvm/llvm-project/issues/71200.

This should arguably be done inside canReuseInstruction instead, but
doing it outside is more conservative compile time wise. Both
canReuseInstruction and isGuaranteedNotToBePoison walk operand lists, so
right now we are performing work which is roughly O(N^2) in the size of
the operand graph. We should fix that before making the per operand step
more expensive. My tentative plan is to land this, and then rework the
code to sink the logic into more core interfaces.
2023-12-07 13:20:36 -08:00
Aleksandr Popov
e9dfe083f0
[GuardUtils] Revert llvm::isWidenableBranch change (#66411)
Commit d6e7c162e1df3736d8e2b3610a831b7cfa5be99b introduced a utility to
extract widenable conditions from a branch. That utility was applied in
llvm::isWidenableBranch to check whether a branch is widenable, so a
branch was considered widenable if it had a widenable condition anywhere
in the condition tree. That will only be valid once the GuardWidening
rework from branch widening to widenable-condition widening is finished.
For now we still need to check that a widenable branch has the form:
`br(widenable_condition & (...))`,
because that form is assumed by LoopPredication and GuardWidening
algorithms.

Fixes: https://github.com/llvm/llvm-project/issues/66418

Co-authored-by: Aleksander Popov <apopov@azul.com>
2023-09-20 11:27:54 +02:00
Danila Malyutin
a668c0f687
[LoopPredication] Fix division by zero in case of zero branch weights (#66506)
Treat the case where all branch weights are zero as if there was no
profile.
Fixes #66382
2023-09-19 04:38:29 +03:00
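The zero-weight guard described in this commit can be illustrated with a minimal, self-contained sketch. This is not LLVM's code: `edgeProbability` and its signature are hypothetical names chosen for illustration, and the only point is that an all-zero weight list is reported as "no profile" instead of being used as a divisor.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical helper (not an LLVM API): probability of taking the edge at
// `index`, or std::nullopt when the weights carry no usable information.
std::optional<double> edgeProbability(const std::vector<uint32_t> &weights,
                                      std::size_t index) {
  assert(index < weights.size() && "edge index out of range");

  // Accumulate in 64 bits so that large 32-bit weights cannot overflow.
  uint64_t total = 0;
  for (uint32_t w : weights)
    total += w;

  // All weights are zero: behave as if there were no profile at all.
  // This is also what prevents the division by zero below.
  if (total == 0)
    return std::nullopt;

  return static_cast<double>(weights[index]) / static_cast<double>(total);
}
```

A caller would then fall back to its no-profile heuristic whenever the result is empty, instead of special-casing zero weights at every use.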
Danila Malyutin
e80a8b4ab6 [NFC] Add test for #66382 2023-09-14 21:10:00 +03:00
Aleksandr Popov
1b87882228 [LoopPredication] Rework assumes of widened conditions
Currently after widening br(WC && (c1 && c2)) we insert assume of
(c1 && c2) which is joined to WC by And operation.
But we are going to support a more flexible form of widenable branches
where WC could be placed arbitrarily in the expression tree, e.g.:
br(c1 && (c2 && WC)).
In that case we won't have (c1 && c2) in the IR. So we need to add
explicit (c1 && c2) and then create an assumption of it.

Reviewed By: anna

Differential Revision: https://reviews.llvm.org/D157502
2023-08-18 14:35:46 +02:00
Aleksandr Popov
f3016c380b Revert "[NFC][LoopPredication] Add parsed checks logging"
This reverts commit aa603c41caab63e246f4a4258c8b96e6ea06fdc9.

Revert due to LLVM Buildbot failure
2023-08-10 12:55:55 +02:00
Aleksandr Popov
aa603c41ca [NFC][LoopPredication] Add parsed checks logging
Differential Revision: https://reviews.llvm.org/D157491
2023-08-10 12:50:32 +02:00
Serguei Katkov
116a31e3c3 [GuardUtils] Allow intermediate blocks between widenable branch and deopt block
Reviewed By: anna
Differential Revision: https://reviews.llvm.org/D151082
2023-05-25 16:28:11 +07:00
Anna Thomas
27f8a62a54 [LoopPredication] Fix where we generate widened condition. PR61963
Loop predication's predicateLoopExit pass does two incorrect things:

1. It sinks the widenable call into the loop, thereby converting an invariant condition to a variant one.
2. It widens the widenable call at a branch, thereby converting the branch into a loop-varying one.

The latter is problematic when the branch may have been loop-invariant
and prior optimizations (such as indvars) may have relied on this
fact, and updated the deopt state accordingly.

Now, when we widen this with a loop-varying condition, the deopt state
is no longer correct.
https://github.com/llvm/llvm-project/issues/61963 fixed.

Differential Revision: https://reviews.llvm.org/D147662
2023-04-10 10:37:05 -04:00
Anna Thomas
975cc76020 Simplify test with deopt state in D147662. NFC 2023-04-10 10:37:04 -04:00
Anna Thomas
b3cabfdb83 Precommit test from D147662 2023-04-06 14:35:27 -04:00
Serguei Katkov
99da317331 [LoopPredication] Fix the LoopPredication by freezing the result of predication.
LoopPredication introduces the use of a possibly poison value in a branch (guard)
instruction, so to avoid introducing undefined behavior it should be frozen.

Reviewed By: mkazantsev
Differential Revision: https://reviews.llvm.org/D146685
2023-03-29 15:12:00 +07:00
Max Kazantsev
0858b5369b [Test] Regenerate checks in test file 2023-03-24 15:43:02 +07:00
Serguei Katkov
c96269055f [LoopPredication] Add a test demonstrating bug.
LoopPredication may introduce undefined behavior.
2023-03-22 18:14:00 +07:00
Max Kazantsev
a18ce47a3e [LoopPredication] Account for critical edges when inserting assumes. PR26496
Loop predication can insert assumes to preserve knowledge about some facts that
may otherwise be lost, because loop predication is a lossy transform. When a guard
is represented as branch by widenable condition, it should insert it in the guarded
block. However, if the guarded block has other predecessors than the guard block,
then the condition might not dominate it. Currently we generate invalid code here.

One possible fix here is to split critical edge and insert the assume there, but in
this case we should modify CFG, which Loop Predication is not currently doing, and we
want to keep it that way.

The fix is to handle this case by inserting a Phi which takes `Cond` as input from the
guard block and `true` from any other blocks. This is valid in terms of IR and does
not introduce any new knowledge if we came from another block.

Differential Revision: https://reviews.llvm.org/D144859
Reviewed By: nikic, skatkov
2023-02-27 18:26:17 +07:00
Max Kazantsev
3bfb2357f7 [Test] Add failing test for PR61022
Details: https://github.com/llvm/llvm-project/issues/61022
2023-02-27 17:20:01 +07:00
Nikita Popov
ba89c66771 [LoopPredication] Convert tests to opaque pointers (NFC) 2023-01-02 16:52:03 +01:00
Roman Lebedev
e23e1594d8
[NFC] Port all LoopPredication tests to -passes= syntax 2022-12-08 02:38:46 +03:00
Dmitry Makogon
a580d2e430 [Test] Update tests for LoopPredication constant ranges widening 2022-11-29 14:09:47 +07:00
Dmitry Makogon
4b0fd43512 [Test] Add tests with range checks with known constant ranges
LoopPredication might be able to turn such checks (which are
not necessarily done on an IV) into loop invariant checks.
2022-11-08 18:54:43 +07:00
Arthur Eubanks
f3a928e233 [opt] Don't translate legacy -analysis flag to require<analysis>
Tests relying on this should explicitly use -passes='require<analysis>,foo'.
2022-10-07 14:54:34 -07:00
Dmitry Makogon
8307f6c854 [LoopPredication] Insert assumes of conditions of predicated guards
As LoopPredication performs non-equivalent transforms removing some
checks from loops, other passes may not be able to perform transforms
they'd be able to do if the checks were left in loops.

This patch makes LoopPredication insert assumes of the replaced
conditions either after a guard call or in the true block of
widenable condition branch.

Differential Revision: https://reviews.llvm.org/D135354
2022-10-07 16:10:24 +07:00
Dmitry Makogon
6474d7faea [Test] Add test showing missed branch elimination due to loop predication transform 2022-10-06 17:34:09 +07:00
Jamie Schmeiser
5e3ac79690 Loop names used in reporting can grow very large
Summary:
The code for generating a name for loops for various reporting scenarios
created a name by serializing the loop into a string.  This may result in
a very large name for a loop containing many blocks.  Use the getName()
function on the loop instead.

Author: Jamie Schmeiser <schmeise@ca.ibm.com>
Reviewed By: Whitney (Whitney Tsang), aeubanks (Arthur Eubanks)
Differential Revision: https://reviews.llvm.org/D133587
2022-09-09 13:45:14 -04:00
Ruobing Han
f756f06cc4 [SimpleLoopUnswitch] Skip non-trivial unswitching of cold loops
With profile data, non-trivial LoopUnswitch will only apply on non-cold loops, as unswitching cold loops may not gain much benefit but significantly increase the code size.

Reviewed By: aeubanks, asbirlea

Differential Revision: https://reviews.llvm.org/D129599
2022-08-08 18:12:04 +00:00
Philip Reames
8906a0fe64 [SCEVExpander] Drop poison generating flags when reusing instructions
The basic problem we have is that we're trying to reuse an instruction which is mapped to some SCEV. Since we can have multiple such instructions (potentially with different flags), this is analogous to our need to drop flags when performing CSE. A trivial implementation would simply drop flags on any instruction we decided to reuse, and that would be correct.

This patch is almost that trivial patch except that we preserve flags on the reused instruction when existing users would imply UB on overflow already. Adding new users can, at most, refine this program to one which doesn't execute UB which is valid.

In practice, this fixes two conceptual problems with the previous code: 1) a binop could have been canonicalized into a form with different opcode or operands, or 2) the inbounds GEP case which was simply unhandled.

On the test changes, most are pretty straightforward. We lose some flags (in some cases, they'd have been dropped on the next CSE pass anyway). The one that took me the longest to understand was the ashr-expansion test. What's happening there is that we're considering reuse of the mul; previously we disallowed it entirely, now we allow it with no flags. The surrounding diffs are all effects of generating the same mul with a different operand order, and then doing simple DCE.

The loss of the inbounds is unfortunate, but even there, we can recover most of those once we actually treat branch-on-poison as immediate UB.

Differential Revision: https://reviews.llvm.org/D112734
2021-11-29 15:23:34 -08:00
Roman Lebedev
b291597112
Revert rest of IRBuilderBase's short-circuiting folds
Upon further investigation and discussion,
this is actually the opposite direction from what we should be taking,
and this direction wouldn't solve the motivational problem anyway.

Additionally, some more (polly) tests have escaped being updated.
So, let's just take a step back here.

This reverts commit f3190dedeef9da2109ea57e4cb372f295ff53b88.
This reverts commit 749581d21f2b3f53e4fca4eb8728c942d646893b.
This reverts commit f3df87d57e096143670e0fd396e81d43393a2dd2.
This reverts commit ab1dbcecd6f0969976fafd62af34730436ad5944.
2021-10-28 02:15:14 +03:00
Roman Lebedev
f3190dedee
[IR] IRBuilderBase::CreateAnd(): short-circuit x & 0 --> 0
https://alive2.llvm.org/ce/z/YzPhSb

Refs. https://reviews.llvm.org/D109368#3089809
2021-10-27 18:01:06 +03:00
Roman Lebedev
749581d21f
[IR] IRBuilderBase::CreateAnd(): fix short-circuiting for constant on LHS
Refs. https://reviews.llvm.org/D109368#3089809
2021-10-27 18:01:06 +03:00
Roman Lebedev
5a8a7b3bf8
[NFC] Re-autogenerate check lines in some tests to ease of future update 2021-10-27 18:01:05 +03:00
Anna Thomas
9403514e76 [LoopPredication] Calculate profitability without BPI
Using BPI within loop predication is non-trivial because BPI is only
preserved lossily in loop pass manager (one fix exposed by lossy
preservation is up for review at D111448). However, since loop
predication is only used in downstream pipelines, it is hard to keep BPI
from breaking for incomplete state with upstream changes in BPI.
Also, correctly preserving BPI for all loop passes is a non-trivial
undertaking (D110438 does this lossily), while the benefit of using it
in loop predication isn't clear.

In this patch, we rely on profile metadata to get almost similar benefit as
BPI, without actually using the complete heuristics provided by BPI.
This avoids the compile time explosion we tried to fix with D110438 and
also avoids fragile bugs because BPI can be lossy in loop passes
(D111448).

Reviewed-By: asbirlea, apilipenko
Differential Revision: https://reviews.llvm.org/D111668
2021-10-19 14:24:04 -04:00
Anna Thomas
452714f8f8 [BPI] Keep BPI available in loop passes through LoopStandardAnalysisResults
This is analogous to D86156 (which preserves "lossy" BFI in loop
passes). Lossy means that the analysis preserved may not be up to date
with regards to new blocks that are added in loop passes, but BPI will
not contain stale pointers to basic blocks that are deleted by the loop
passes.

This is achieved through BasicBlockCallbackVH in BPI, which calls
eraseBlock that updates the data structures in BPI whenever a basic
block is deleted.

This patch does not have any changes in the upstream pipeline, since
none of the loop passes in the pipeline use BPI currently.
However, since BPI wasn't previously preserved in loop passes, the loop
predication pass was invoking BPI *on the entire
function* every time it ran in an LPM.  This caused massive compile time
in our downstream LPM invocation which contained loop predication.

See updated test with an invocation of a loop-pipeline containing loop
predication and -debug-pass turned ON.

Reviewed-By: asbirlea, modimo
Differential Revision: https://reviews.llvm.org/D110438
2021-09-30 10:27:05 -04:00
Anna Thomas
03ce0841da Add profile count. Regenerate check lines. NFC
Function profile counts added to test cases. Regenerated test lines for
loop predication test.
2021-09-28 15:33:49 -04:00
Anna Thomas
90fb73aa73 [LoopPred Test] Fix lld-x86_64-win BB failure
Need a more general CHECK line for testcase in 5df9112 for correctly
handling  lld-x86_64-win buildbot.
2021-09-27 21:28:46 -04:00
Anna Thomas
5df9112ce3 Reland "[LoopPredication] Add testcase showing BPI computation. NFC"
This relands commit 16a62d4f.
Relanded after fixing CHECK-LINES for opt pipeline output to be more
general (based on failures seen in buildbot).
2021-09-27 21:15:46 -04:00
Anna Thomas
a0a9e3e05f Revert "[LoopPredication] Add testcase showing BPI computation. NFC"
This reverts commit 16a62d4f3dca189b0e0565c7ebcd83ddfcc67629.

Needs some update to check lines to fix bb failure.
2021-09-27 17:08:57 -04:00
Anna Thomas
16a62d4f3d [LoopPredication] Add testcase showing BPI computation. NFC
Precommit testcase for D110438. Since we do not preserve BPI in loop
pass manager, we are forced to compute BPI every time loop predication is
invoked.
The patch referenced changes that behaviour by preserving lossy BPI for
loop passes.
2021-09-27 16:54:22 -04:00
Daniil Suchkov
fe950cba8f Update LoopPredication test to fix buildbot failure.
This patch updates tests added in 5f2b7879f16ad5023f0684febeb0a20f7d53e4a8.
2021-09-16 23:37:59 +00:00
Daniil Suchkov
0e36288318 [LoopPredication] Report changes correctly when attempting loop exit predication
To make the IR easier to analyze, this pass makes some minor transformations.
After that, even if it doesn't decide to optimize anything, it can't report that
it changed nothing and preserved all the analyses.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D109855
2021-09-16 22:49:55 +00:00
Daniil Suchkov
5f2b7879f1 NFC. Add tests exposing missing analysis invalidation in LoopPredication. 2021-09-16 22:49:55 +00:00
Anna Thomas
f661ce209f [LoopPredication] Fix MemorySSA crash in predicateLoopExits
The attached testcase crashes without the patch (Not the same accesses
in the same order).

When we move instructions before another instruction, we also need to
update the memory accesses corresponding to it.

Reviewed-By: asbirlea
Differential Revision: https://reviews.llvm.org/D109197
2021-09-02 21:26:07 -04:00
Anna Thomas
55bdb14026 [LoopPredication] Preserve MemorySSA
Since LICM has now unconditionally moved to MemorySSA based form, all
passes that run in same LPM as LICM need to preserve MemorySSA (i.e. our
downstream pipeline).

Added loop-mssa to all tests and perform -verify-memoryssa within
LoopPredication itself.

Differential Revision: https://reviews.llvm.org/D108724
2021-08-26 11:36:25 -04:00
Roman Lebedev
b46c085d2b
[NFCI] SCEVExpander: emit intrinsics for integral {u,s}{min,max} SCEV expressions
These intrinsics, not the icmp+select are the canonical form nowadays,
so we might as well directly emit them.

This should not cause any regressions, but if it does,
then they would need to be fixed regardless.

Note that this doesn't deal with `SCEVExpander::isHighCostExpansion()`,
but that is a pessimization, not a correctness issue.

Additionally, the non-intrinsic form has issues with undef,
see https://reviews.llvm.org/D88287#2587863
2021-03-06 21:52:46 +03:00
Fangrui Song
f31811f2dc [BasicAA] Rename deprecated -basicaa to -basic-aa
Follow-up to D82607
Revert an accidental change (empty.ll) of D82683
2020-06-26 20:41:37 -07:00
Fedor Sergeev
cc7cb05e9d [BasicBlock] fix looping in getPostdominatingDeoptimizeCall
Blindly following the unique-successors chain turned out to be a bad idea.
In the degenerate case where a block jumps to itself, it goes into an endless loop.

Discovered this problem when playing with additional changes,
managed to reproduce it on existing LoopPredication code.

Fix by checking a "visited" set while iterating through unique successors.

Reviewed By: skatkov

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D72908
2020-01-17 15:40:02 +03:00
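The "visited" set fix from this commit can be sketched on a toy CFG. The `Block` struct and `findPostdominatingDeopt` below are hypothetical stand-ins, not the real `BasicBlock` interface; the sketch assumes every successor index is a valid position in the vector.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <unordered_set>
#include <vector>

// Toy CFG node (hypothetical): at most one unique successor, identified by
// its index in the vector, plus a flag for "ends in a deoptimize call".
struct Block {
  std::optional<std::size_t> uniqueSucc;
  bool endsInDeopt = false;
};

// Follows the unique-successor chain from `start` and returns the index of
// the first block that ends in a deoptimize call, if any.
std::optional<std::size_t>
findPostdominatingDeopt(const std::vector<Block> &cfg, std::size_t start) {
  assert(start < cfg.size() && "start block out of range");

  std::unordered_set<std::size_t> visited;
  std::size_t cur = start;
  // insert() reports false once a block is seen a second time, which ends
  // the walk on any cycle, including a block that jumps to itself.
  while (visited.insert(cur).second) {
    if (cfg[cur].endsInDeopt)
      return cur;
    if (!cfg[cur].uniqueSucc)
      return std::nullopt; // chain ends without reaching a deoptimize call
    cur = *cfg[cur].uniqueSucc;
  }
  return std::nullopt; // revisited a block: the chain is a cycle
}
```

Without the `visited` check, the self-loop case would spin forever, which is the hang the commit describes.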
Philip Reames
dfb7a9091a [LoopPred] Robustly handle partially unswitched loops
We may end up with a case where we have a widenable branch above the loop, but not all widenable branches within the loop have been removed.  Since a widenable branch inhibits SCEV's ability to reason about exit counts (by design), we have a tradeoff between effectiveness of this optimization and allowing future widening of the branches within the loop.  LoopPred is thought to be one of the most important optimizations for range check elimination, so let's pay the cost.
2019-11-21 15:44:36 -08:00
Philip Reames
aaea24802b Broaden the definition of a "widenable branch"
As a reminder, a "widenable branch" is the pattern "br i1 (and i1 X, WC()), label %taken, label %untaken" where "WC" is the widenable condition intrinsic. The semantics of such a branch (derived from the semantics of WC) is that a new condition can be added into the condition arbitrarily without violating legality.

Broaden the definition in two ways:
    Allow swapped operands to the br (and X, WC()) form
    Allow widenable branch w/trivial condition (i.e. true) which takes form of br i1 WC()

The former is just general robustness (e.g. for X = non-instruction this is what instcombine produces). The latter is specifically important as partial unswitching of a widenable range check produces exactly this form above the loop.

Differential Revision: https://reviews.llvm.org/D70502
2019-11-21 10:46:16 -08:00
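The broadened definition in this commit can be modeled on a toy expression tree. The `Expr` type and `isWidenableBranchCond` below are hypothetical and do not use LLVM's pattern matchers; they only show the three accepted shapes: `and X, WC`, `and WC, X`, and a bare `WC`. A condition where WC sits deeper in the tree is deliberately not matched by this sketch.

```cpp
#include <cassert>

// Toy branch-condition node (hypothetical, not an LLVM type).
struct Expr {
  enum class Kind { WC, And, Other };
  Kind kind = Kind::Other;
  const Expr *lhs = nullptr; // operands are only meaningful for Kind::And
  const Expr *rhs = nullptr;
};

// True if `e` is the widenable-condition call itself.
bool isWC(const Expr *e) { return e != nullptr && e->kind == Expr::Kind::WC; }

// Accepts the three forms named in the commit message:
//   br i1 WC()            (trivial condition)
//   br i1 (and X, WC())   (original form)
//   br i1 (and WC(), X)   (swapped operands)
bool isWidenableBranchCond(const Expr &cond) {
  if (cond.kind == Expr::Kind::WC)
    return true;
  if (cond.kind != Expr::Kind::And)
    return false;
  assert(cond.lhs && cond.rhs && "an And node needs two operands");
  // Only direct operands of the top-level And are inspected.
  return isWC(cond.lhs) || isWC(cond.rhs);
}
```

Checking only the top-level operands keeps the match cheap and matches the shape that the later `isWidenableBranch` revert in this log says LoopPredication and GuardWidening still assume.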
Philip Reames
f3eb5dee57 [LoopPred] Generalize profitability check to handle unswitch output
Unswitch (and other loop transforms) like to generate loop exit blocks with unconditional successors, and phi nodes (LCSSA, or simple multiple exiting blocks sharing an exit).  Generalize the "likely very rare exit" check slightly to handle this form.
2019-11-19 14:06:36 -08:00
Philip Reames
ad5a84c883 [LoopPred/WC] Use a dominating widenable condition to remove analyze loop exits
This implements a version of the predicateLoopExits transform from IndVarSimplify extended to exploit widenable conditions - and thus be much wider in scope of legality. The code structure ends up being almost entirely different, so I chose to duplicate this into the LoopPredication pass instead of trying to reuse the code in the IndVars.

The core notions of the transform are as follows:

    If we have a widenable condition which controls entry into the loop, we're allowed to widen it arbitrarily. Given that, it's simply a *profitability* question as to what conditions to fold into the widenable branch.
    To avoid pass ordering issues, we want to avoid widening cases that would otherwise be dischargeable. Or... widen in a form which can still be discharged. Thus, we phrase the transform as selecting one analyzeable exit from the set of analyzeable exits to keep. This avoids creating pass ordering complexities.
    Since none of the above proves that we actually exit through our analyzeable exits - we might exit through something else entirely - we limit ourselves to cases where a) the latch is analyzeable and b) the latch is predicted taken, and c) the exit being removed is statically cold.

Differential Revision: https://reviews.llvm.org/D69830
2019-11-18 11:23:29 -08:00