189 Commits

Author SHA1 Message Date
Rong Xu
666731660c [Pass][CHR] Move ControlHeightReduction to module optimization pipeline
This is a modified version of commit b374423304a8 by
Arthur (https://reviews.llvm.org/D143424).

Here we invoke the pass independently of PGOOPT. We now check if the
profile is available through the profile summary. This ensures CHR is
called in distributed ThinLTO BE compilation (where PGOOPT might not
be created).
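
A minimal sketch of that check (illustrative, not the exact code from the
patch): query the module's profile summary via ProfileSummaryInfo instead of
relying on a PGOOptions object having been created.

```
#include "llvm/Analysis/ProfileSummaryInfo.h"
#include "llvm/IR/Module.h"

// Returns true if the module carries a profile summary (instrumentation or
// sample profile), which is what gates scheduling CHR here.
static bool moduleHasProfileSummary(llvm::Module &M) {
  llvm::ProfileSummaryInfo PSI(M); // reads the "ProfileSummary" module flag
  return PSI.hasProfileSummary();
}
```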

Differential Revision: https://reviews.llvm.org/D144769
2023-02-27 11:47:54 -08:00
Arthur Eubanks
a628ca4925 Revert "[Pipeline] Move ControlHeightReduction to module optimization pipeline"
This reverts commit b374423304a8d91d590d0ce5ab1b381296d6dfb2.

Causes regressions on some benchmarks.
2023-02-23 10:17:12 -08:00
Arthur Eubanks
b374423304 [Pipeline] Move ControlHeightReduction to module optimization pipeline
This pass isn't a simplification, it's a non-canonical optimization.

This makes it only run once in a (Thin)LTO pipeline during postlink, just like all the other optimization pipeline passes.

Reviewed By: xur

Differential Revision: https://reviews.llvm.org/D143424
2023-02-16 15:23:38 -08:00
Arthur Eubanks
5f5cf60298 [Pipeline] Remove -enable-npm-O3-nontrivial-unswitch flag
This was added to help debugging performance issues, no longer needed.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D98675
2023-02-16 11:36:03 -08:00
Arthur Eubanks
4d16ebd6c9 [Pipeline] Remove -enable-no-rerun-simplification-pipeline flag
This has been on without complaint for a while.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D144130
2023-02-16 11:29:51 -08:00
David Green
86bfeb906e Revert "Inlining: Run the legacy AlwaysInliner before the regular inliner."
This seems to cause large regressions in existing code, as much as 75% slower
(4x the time taken). Small always-inline functions seem to be used a lot in the
cmsis-dsp library.

I would add a phase ordering test to show the problems, but one already exists!
The llvm/test/Transforms/PhaseOrdering/ARM/arm_mult_q15.ll was just changed by
removing alwaysinline to hide the problems that existed.

This reverts commit cae033dcf227aeecf58fca5af6fc7fde1fd2fb4f.
This reverts commit 8e33c41e72ad42e4c27f8cbc3ad2e02b169637a1.
2023-02-10 15:01:49 +00:00
Amara Emerson
8e33c41e72 Inliner: Address missed review comments for D143624 2023-02-09 21:56:40 -08:00
Amara Emerson
cae033dcf2 Inlining: Run the legacy AlwaysInliner before the regular inliner.
We have several situations where it's beneficial for code size to ensure that every
call to an always-inline function is inlined before normal inlining decisions are
made. While the normal inliner runs in a "MandatoryOnly" mode to try to do this,
it only does it on a per-SCC basis, rather than the whole module. Ensuring that
all mandatory inlinings are done before any heuristic based decisions are made
just makes sense.

Despite being referred to as the "legacy" AlwaysInliner pass, it's already necessary
for -O0 because the CGSCC inliner is too expensive in compile time to run at -O0.

This also fixes an exponential compile time blow up in
https://github.com/llvm/llvm-project/issues/59126

Differential Revision: https://reviews.llvm.org/D143624
2023-02-09 16:49:29 -08:00
Florian Hahn
8028263c41
Recommit "[ConstraintElim] Enable pass by default."
This reverts commit 695ce48c63ec582a46bfbda9b066f4d3bcde143f.

The compile-time regression causing the revert has been fixed. Recommit
the original patch.

Original commit message:

   The pass should help to close a functional gap when it comes to
    reasoning about related conditions in a relatively general way.

    It addresses multiple existing issues (linked below) and the need for a
    more powerful reasoning system was also discussed recently in
    https://discourse.llvm.org/t/rfc-alternative-approach-of-dealing-with-implications-from-comparisons-through-pos-analysis/65601/7

    On AArch64, the new pass performs ~2000 simplifications on
    MultiSource,SPEC2006,SPEC2017 with -O3.

    Compile-time impact:

    NewPM-O3: +0.20%
    NewPM-ReleaseThinLTO: +0.32%
    NewPM-ReleaseLTO-g: +0.28%

    https://llvm-compile-time-tracker.com/compare.php?from=f01a3a893c147c1594b9a3fbd817456b209dabbf&to=577688758ef64fb044215ec3e497ea901bb2db28&stat=instructions:u

    Fixes #49344.
    Fixes #47888.
    Fixes #48253.
    Fixes #49229.
    Fixes #58074.

    Reviewed By: asbirlea

    Differential Revision: https://reviews.llvm.org/D135915
2023-02-06 18:09:43 +00:00
Steven Wu
516e301752 [NFC][Profile] Access profile through VirtualFileSystem
Make access to profile data go through the virtual file system so the
inputs can be remapped. In the context of caching, this makes sure
we capture the inputs and provide an immutable input as profile data.

Reviewed By: akyrtzi, benlangmuir

Differential Revision: https://reviews.llvm.org/D139052
2023-02-01 09:25:02 -08:00
Arthur Eubanks
4ce34bb2a9 [CGSCC] Add pass which counts the max number of times we visit a function
This will help with finding potential pathological CGSCC cases.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D142853
2023-01-30 10:06:53 -08:00
Joseph Huber
6185246f4f [OpenMP] Run an extra 'OpenMPOpt' pass in LTO-mode
The `OpenMPOpt` pass is pivotal to the performance of many OpenMP
offloading programs. When we perform non-LTO builds with OpenMP we used
to link the OpenMP deviceRTL individually for each TU. This led to us
getting an additional attributor run on the combined runtime and user
code. When we used LTO we lost a run and suffered a large performance
degradation. This patch simply adds in the extra `OpenMPOpt` pass that
we miss into the LTO pipeline. This patch fixes the performance
regression shown in applications that used OpenMP offloading in LTO
mode.

Previously, this wasn't legal to do as we could emit new runtime calls
into the module. That was fixed by D142646.

Depends on D142646

Fixes https://github.com/llvm/llvm-project/issues/60300

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D142650
2023-01-26 13:23:45 -06:00
Joseph Huber
0bdde9dfb9 [OpenMP] Make OpenMPOpt aware of the OpenMP runtime's status
The `OpenMPOpt` pass contains optimizations that generate new calls into
the OpenMP runtime. This causes problems if we are in a state where the
runtime has already been linked statically. Generating these new calls
will result in them never being resolved. We should indicate if we are
in a "post-link" LTO phase and prevent OpenMPOpt from generating new
runtime calls.

Generally, it's not desirable for passes to maintain state about the
context in which they're called. But this is the only reasonable
solution to static linking when we have a pass that generates new
runtime calls.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D142646
2023-01-26 13:23:44 -06:00
Florian Hahn
695ce48c63
Revert "[ConstraintElim] Enable pass by default."
This reverts commit fb13dcf3431cd83911fe56899d2fade808dc5b8d.

A large compile-time regression for code generated by sanitizers has
been reported. Revert while I investigate the issue. Details and
reproducers are available here: https://reviews.llvm.org/D135915
2023-01-18 14:25:00 +00:00
Alexandros Lamprineas
572a757fa7 [IPSCCP] Enable specialization of functions.
Re-enable the optimization after having fixed the compilation error
found in SPEC/CINT2017rate/502.gcc_r when both LTO and PGO are in use
(see https://reviews.llvm.org/D141474).

Differential Revision: https://reviews.llvm.org/D140210
2023-01-13 14:04:17 +00:00
Florian Hahn
fb13dcf343
[ConstraintElim] Enable pass by default.
The pass should help to close a functional gap when it comes to
reasoning about related conditions in a relatively general way.

It addresses multiple existing issues (linked below) and the need for a
more powerful reasoning system was also discussed recently in
https://discourse.llvm.org/t/rfc-alternative-approach-of-dealing-with-implications-from-comparisons-through-pos-analysis/65601/7

On AArch64, the new pass performs ~2000 simplifications on
MultiSource,SPEC2006,SPEC2017 with -O3.

Compile-time impact:

NewPM-O3: +0.20%
NewPM-ReleaseThinLTO: +0.32%
NewPM-ReleaseLTO-g: +0.28%

https://llvm-compile-time-tracker.com/compare.php?from=f01a3a893c147c1594b9a3fbd817456b209dabbf&to=577688758ef64fb044215ec3e497ea901bb2db28&stat=instructions:u

Fixes #49344.
Fixes #47888.
Fixes #48253.
Fixes #49229.
Fixes #58074.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D135915
2023-01-04 18:00:37 +00:00
Florian Hahn
f3c1d92682
[ConstraintElim] Adjust position in LTO pipeline.
This runs ConstraintElim earlier during LTO, similar to non-LTO.
Discussed and split off from D135915.
2023-01-03 17:07:43 +00:00
Florian Hahn
9e6d2c82d6
[ConstraintElim] Move after first instcombine run.
Running ConstraintElimination after the first InstCombine run results
in slightly more simplifications on average.

There is a tiny number of regressions, mostly due to CVP eliminating
a condition that ConstraintElimination would use, but in most cases
there's a slight improvement or no change.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D140853
2023-01-03 13:25:00 +00:00
Florian Hahn
60359f56aa
Revert "[IPSCCP] Enable specialization of functions."
This reverts commit 2656572d485127cc30b8fe9752024d2a0f1c50db.

It looks like CINT2017rate/502.gcc_r gets mis-compiled with LTO + PGO on
AArch64 with function specialization.
2022-12-26 16:02:59 +00:00
Alexandros Lamprineas
2656572d48 [IPSCCP] Enable specialization of functions.
This patch enables Function Specialization by default at all
optimization levels except Os, Oz.

Compilation Time Overhead:
--------------------------
Measured the Instruction Count increase (Geomean) for CTMark from
the llvm-testsuite as in https://llvm-compile-time-tracker.com.
 * {-O3, Non-LTO}: +0.136% Instruction Count
 * {-O3, LTO}: +0.346% Instruction Count

Performance Uplift:
-------------------
Measured +9.121% score increase for 505.mcf_r from SPEC Int 2017
(Tested on Neoverse N1 with -O3 + LTO)

Correctness Testing:
--------------------
 * Passes bootstrap Clang with ASAN + LTO + FuncSpec aggressive options:
   { MaxClonesThreshold=10,
     SmallFunctionThreshold=10,
     AvgLoopIterationCount=30,
     SpecializeOnAddresses=true,
     EnableSpecializationForLiteralConstant=true,
     FuncSpecializationMaxIters=10 }
 * Builds Chromium and passes its unittests with the above options + ThinLTO.

For more info please refer to
https://discourse.llvm.org/t/rfc-should-we-enable-function-specialization/61518

Differential Revision: https://reviews.llvm.org/D140210
2022-12-25 10:05:21 +02:00
Alexandros Lamprineas
8136a0172b [FuncSpec] Make the Function Specializer part of the IPSCCP pass.
Reland 877a9f9abec61f06e39f1cd872e37b828139c2d1 since D138654 (parent)
has been fixed with 9ebaf4fef4aac89d4eff08e48185d61bc893f14e and with
8f1e11c5a7d70f96943a72649daa69f152d73e90.

Differential Revision: https://reviews.llvm.org/D126455
2022-12-10 14:39:49 +00:00
Roman Lebedev
4f7e5d2206
[SROA] For non-speculatable loads of selects -- split block, insert then/else blocks, form two-entry PHI node, take 2
Currently, SROA is CFG-preserving.
Not doing so does not affect any pipeline test. (???)
Internally, SROA requires Dominator Tree, and uses it solely for the final `-mem2reg` call.

By design, we can't really SROA alloca if their address escapes somehow,
but we have logic to deal with `load` of `select`/`PHI`,
where at least one of the possible addresses prevents promotion,
by speculating the `load`s and `select`ing between loaded values.

As one would expect, that requires ensuring that the speculation is actually legal.
Even ignoring complexity bailouts, that logic does not deal with everything,
e.g. `isSafeToLoadUnconditionally()` does not recurse into the hands of a `select`.
There can also be cases where the load is genuinely non-speculatable.

So if we can't prove that the load can be speculated,
unfold the select, produce two-entry phi node, and perform predicated load.
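
A rough C++ sketch of that unfold (illustrative only, not SROA's actual code;
the real transform also updates the DominatorTree through a DomTreeUpdater,
which is omitted here):

```
#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/Function.h"
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/Instructions.h"

using namespace llvm;

// Turn `%v = load T, ptr (select i1 %c, ptr %a, ptr %b)` into a diamond:
// branch on %c, load from %a / %b in the arms, merge with a two-entry PHI.
static void unfoldLoadOfSelect(LoadInst &LI, SelectInst &Sel) {
  BasicBlock *Head = LI.getParent();
  Function *F = Head->getParent();
  LLVMContext &Ctx = F->getContext();

  // Everything from the load onward moves into the tail block.
  BasicBlock *Tail = Head->splitBasicBlock(&LI, "load.sel.tail");
  Head->getTerminator()->eraseFromParent(); // drop the unconditional branch

  BasicBlock *ThenBB = BasicBlock::Create(Ctx, "load.sel.then", F, Tail);
  BasicBlock *ElseBB = BasicBlock::Create(Ctx, "load.sel.else", F, Tail);

  IRBuilder<> B(Head);
  B.CreateCondBr(Sel.getCondition(), ThenBB, ElseBB);

  B.SetInsertPoint(ThenBB);
  LoadInst *ThenLoad = B.CreateLoad(LI.getType(), Sel.getTrueValue());
  B.CreateBr(Tail);

  B.SetInsertPoint(ElseBB);
  LoadInst *ElseLoad = B.CreateLoad(LI.getType(), Sel.getFalseValue());
  B.CreateBr(Tail);

  // Replace the original load with a PHI over the two predicated loads.
  IRBuilder<> TailB(&LI);
  PHINode *Phi = TailB.CreatePHI(LI.getType(), 2, "load.sel.phi");
  Phi->addIncoming(ThenLoad, ThenBB);
  Phi->addIncoming(ElseLoad, ElseBB);
  LI.replaceAllUsesWith(Phi);
  LI.eraseFromParent();
}
```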

Now, that transformation must obviously update Dominator Tree,
since we require it later on. Doing so is trivial.
Additionally, we don't want to do this for the final SROA invocation (D136806).

In the end, this ends up having negative (!) compile-time cost:
https://llvm-compile-time-tracker.com/compare.php?from=c6d7e80ec4c17a415673b1cfd25924f98ac83608&to=ddf9600365093ea50d7e278696cbfa01641c959d&stat=instructions:u

Though indeed, this only deals with `select`s; `PHI`s are still using speculation.

Should we update some more analysis?

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D138238

This reverts commit 739611870d3b06605afe25cc07833f6a62de9545,
and recommits 03e6d9d9d1d48e43f3efc35eb75369b90d4510d5
with a fixed assertion - we should check that DTU is there,
not just assert false...
2022-12-08 20:19:55 +03:00
Roman Lebedev
739611870d
Revert "[SROA] For non-speculatable loads of selects -- split block, insert then/else blocks, form two-entry PHI node"
The assertion about not modifying the CFG seems to not hold,
will recommit in a bit.

https://lab.llvm.org/buildbot#builders/139/builds/32412

This reverts commit 03e6d9d9d1d48e43f3efc35eb75369b90d4510d5.
This reverts commit 4f90f4ada33718f9025d0870a4fe3fe88276b3da.
2022-12-08 19:51:15 +03:00
Roman Lebedev
03e6d9d9d1
[SROA] For non-speculatable loads of selects -- split block, insert then/else blocks, form two-entry PHI node
Currently, SROA is CFG-preserving.
Not doing so does not affect any pipeline test. (???)
Internally, SROA requires Dominator Tree, and uses it solely for the final `-mem2reg` call.

By design, we can't really SROA alloca if their address escapes somehow,
but we have logic to deal with `load` of `select`/`PHI`,
where at least one of the possible addresses prevents promotion,
by speculating the `load`s and `select`ing between loaded values.

As one would expect, that requires ensuring that the speculation is actually legal.
Even ignoring complexity bailouts, that logic does not deal with everything,
e.g. `isSafeToLoadUnconditionally()` does not recurse into the hands of a `select`.
There can also be cases where the load is genuinely non-speculatable.

So if we can't prove that the load can be speculated,
unfold the select, produce two-entry phi node, and perform predicated load.

Now, that transformation must obviously update Dominator Tree,
since we require it later on. Doing so is trivial.
Additionally, we don't want to do this for the final SROA invocation (D136806).

In the end, this ends up having negative (!) compile-time cost:
https://llvm-compile-time-tracker.com/compare.php?from=c6d7e80ec4c17a415673b1cfd25924f98ac83608&to=ddf9600365093ea50d7e278696cbfa01641c959d&stat=instructions:u

Though indeed, this only deals with `select`s; `PHI`s are still using speculation.

Should we update some more analysis?

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D138238
2022-12-08 16:51:32 +03:00
Alexandros Lamprineas
0f0cb92cb2 Revert "[FuncSpec] Make the Function Specializer part of the IPSCCP pass."
This reverts commit 877a9f9abec61f06e39f1cd872e37b828139c2d1.

It depends on the parent revision 42c2dc401742266da3e0251b6c1ca491f4779963
which needs to be reverted as it broke some buildbots, so reverting both.
2022-12-08 12:41:43 +00:00
Alexandros Lamprineas
877a9f9abe [FuncSpec] Make the Function Specializer part of the IPSCCP pass.
The aim of this patch is to minimize the compilation time overhead of
running Function Specialization. It is about 40% slower to run as a
standalone pass (IPSCCP + FuncSpec vs IPSCCP with FuncSpec) according
to my measurements. I compiled the llvm testsuite with NewPM-O3 + LTO
and measured single threaded [user + system] time of IPSCCP and FuncSpec
by passing the '-time-passes' option to lld. Then I compared the two
configurations in terms of Instruction Count of the total compilation
(not of the individual passes) as in https://llvm-compile-time-tracker.com.
Geomean for non-LTO builds is -0.25% and LTO is -0.5% approximately.

You can find more info below:

https://discourse.llvm.org/t/rfc-should-we-enable-function-specialization/61518

Differential Revision: https://reviews.llvm.org/D126455
2022-12-08 12:14:27 +00:00
Sjoerd Meijer
8250180238 Revert "Recommit "[LoopFlatten] Enable it by default""
This reverts commit 3ea6a9a469fde168c527b1c34c09f6d684ec86af because of the
reported miscompilation in: https://github.com/llvm/llvm-project/issues/59339
2022-12-05 15:14:12 +00:00
Sjoerd Meijer
3ea6a9a469 Recommit "[LoopFlatten] Enable it by default"
The problem in 58441 that was reported after enabling this last time was fixed
in 8e9e22f07bcbe2ee95478684cf31948370e4e51e.
2022-11-29 10:45:13 +00:00
Rong Xu
6327d263f5 [CHR] Add a threshold for the code duplication
ControlHeightReduction (CHR) clones the code region to reduce the
branches in the hot code path. The number of clones is linear in the
depth of the region.

Currently it does not have control over the code size increase. We are
seeing one ~9000-BB function get expanded to ~250000 BBs, a ~25x
increase. This creates a big compile-time issue for the downstream
optimizations.

This patch adds a cap on the number of clones for one region.

Differential Revision: https://reviews.llvm.org/D138333
2022-11-22 11:36:40 -08:00
Sanjay Patel
163bb6d64e [Passes][VectorCombine] enable early run generally and try load folds
An early run of VectorCombine was added with D102496 specifically to
deal with unnecessary vector ops produced with the C matrix extension.
This patch is proposing to try those folds in general and add a pair
of load folds to the menu.

The load transform will partly solve (see PhaseOrdering diffs) a
longstanding vectorization perf bug by removing redundant loads via GVN:
issue #17113

The main reason for not enabling the extra pass generally in the initial
patch was compile-time cost. The cost of VectorCombine was significantly
(surprisingly) improved with:
87debdadaf18
https://llvm-compile-time-tracker.com/compare.php?from=ffe05b8f57d97bc4340f791cb386c8d00e0739f2&to=87debdadaf18f8a5c7e5d563889e10731dc3554d&stat=instructions:u

...so the extra run is going to cost very little now - the total cost of
the 2 runs should be less than the 1 run before that micro-optimization:
https://llvm-compile-time-tracker.com/compare.php?from=5e8c2026d10e8e2c93c038c776853bed0e7c8fc1&to=2c4b68eab5ae969811f422714e0eba44c5f7eefb&stat=instructions:u

It may be possible to reduce the cost slightly more with a few more
earlier-exits like that, but it's probably in the noise based on timing
experiments.

Differential Revision: https://reviews.llvm.org/D138353
2022-11-21 13:57:55 -05:00
Sanjay Patel
8f337f8ffe [VectorCombine] generalize pass param name for early combines; NFC
The option was added with https://reviews.llvm.org/D102496,
and currently the name is accurate, but I am hoping to add
a load transform that is not a scalarization. See issue #17113.
2022-11-21 13:57:55 -05:00
Roman Lebedev
8adfa29706
[Pipelines] Introduce SROA after (final, run-time) loop unrolling
Now that we are done with loop unrolling, be it either by the LoopVectorizer
or the LoopUnroll passes, some variable-offset GEPs into allocas could have
become constant-offset, thus enabling SROA and alloca promotion,
yet we don't capitalize on that, which is surprising.

While it would be good to not introduce one more SROA invocation
but instead move the one from `PassBuilder::buildFunctionSimplificationPipeline()`,
the existing test coverage says that is a bad idea,
though it would be fine compile-time-wise: https://llvm-compile-time-tracker.com/compare.php?from=b150d34c47efbd8fa09604bce805c0920360f8d7&to=5a9a5c855158b482552be8c7af3e73d67fa44805&stat=instructions

So instead, I add yet another SROA run.
I have checked, and it needs to be at least after said final loop unrolling.
This is still fine compile-time-wise: https://llvm-compile-time-tracker.com/compare.php?from=70324cd88328c0924e605fa81b696572560aa5c9&to=fb489bbef687ad821c3173a931709f9cad9aee8a&stat=instructions

I've encountered this in real code; `SROA-after-final-loop-unrolling.ll` has been reduced from https://godbolt.org/z/fsdMhETh3

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D136806
2022-11-17 21:31:30 +03:00
Arthur Eubanks
cbcf123af2 [LegacyPM] Remove cl::opts controlling optimization pass manager passes
Move these to the new PM if they're used there.

Part of removing the legacy pass manager for optimization pipeline.

Reland with UseNewGVN usage in clang removed.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D137915
2022-11-14 09:38:17 -08:00
Arthur Eubanks
d7c1427953 Revert "[LegacyPM] Remove cl::opts controlling optimization pass manager passes"
This reverts commit 7ec05fec7115a910b2e172de794adc462388c25e.

Breaks bots, e.g. https://lab.llvm.org/buildbot#builders/217/builds/15008
2022-11-14 09:33:38 -08:00
Arthur Eubanks
7ec05fec71 [LegacyPM] Remove cl::opts controlling optimization pass manager passes
Move these to the new PM if they're used there.

Part of removing the legacy pass manager for optimization pipeline.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D137915
2022-11-14 09:23:17 -08:00
Arthur Eubanks
4fa328074e [NewPM][Pipeline] Add PipelineTuningOption to set inliner threshold
The legacy PM allowed you to set a custom inliner threshold via
  builder.Inliner = llvm::createFunctionInliningPass(inline_threshold);

This allows the same thing to be done with the new PM optimization pipelines.
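
A minimal new-PM sketch (assuming the `PipelineTuningOptions::InlinerThreshold`
field this change introduces; the analysis-manager setup is the usual
PassBuilder boilerplate):

```
#include "llvm/IR/Module.h"
#include "llvm/Passes/PassBuilder.h"

using namespace llvm;

// Run the default O2 pipeline over a module with a custom inliner threshold,
// the new-PM analogue of `builder.Inliner = createFunctionInliningPass(N)`.
void runO2WithInlinerThreshold(Module &M, int Threshold) {
  PipelineTuningOptions PTO;
  PTO.InlinerThreshold = Threshold;

  LoopAnalysisManager LAM;
  FunctionAnalysisManager FAM;
  CGSCCAnalysisManager CGAM;
  ModuleAnalysisManager MAM;

  PassBuilder PB(/*TM=*/nullptr, PTO);
  PB.registerModuleAnalyses(MAM);
  PB.registerCGSCCAnalyses(CGAM);
  PB.registerFunctionAnalyses(FAM);
  PB.registerLoopAnalyses(LAM);
  PB.crossRegisterProxies(LAM, FAM, CGAM, MAM);

  ModulePassManager MPM = PB.buildPerModuleDefaultPipeline(OptimizationLevel::O2);
  MPM.run(M, MAM);
}
```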

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D137038
2022-11-02 10:47:51 -07:00
Paul Walker
ab8257ca0e [NFC] Fix a few whitespace inconsistencies. 2022-10-20 14:52:25 +00:00
Pavel Samolysov
1c530500ab [Pipelines] Introduce DAE after ArgumentPromotion
The ArgumentPromotion pass uses Mem2Reg promotion at the end to cut
down on generated `alloca` instructions as well as meaningless `store`s,
and this behavior can leave unused (dead) arguments. To eliminate the dead
arguments, and therefore let DeadCodeElimination remove the inserted
`GEP`s, `load`s, and `cast`s in the callers once they become dead, the
DeadArgumentElimination pass should be run after the ArgumentPromotion
one.
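
Conceptually, the ordering amounts to the following sketch (illustrative
only, not the exact pipeline wiring in this patch):

```
#include "llvm/Analysis/CGSCCPassManager.h"
#include "llvm/IR/PassManager.h"
#include "llvm/Transforms/IPO/ArgumentPromotion.h"
#include "llvm/Transforms/IPO/DeadArgumentElimination.h"

using namespace llvm;

// ArgumentPromotion is a CGSCC pass, so adapt it into the module pipeline,
// then clean up any arguments it left dead with DeadArgumentElimination.
ModulePassManager makeArgPromoThenDAEPipeline() {
  ModulePassManager MPM;
  MPM.addPass(
      createModuleToPostOrderCGSCCPassAdaptor(ArgumentPromotionPass()));
  MPM.addPass(DeadArgumentEliminationPass());
  return MPM;
}
```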

Differential Revision: https://reviews.llvm.org/D128830
2022-09-22 15:33:46 -07:00
Nuno Lopes
d953d01737 Introduce -enable-global-analyses to allow users to disable inter-procedural analyses
Alive2 doesn't support verification of optimizations that use inter-procedural analyses.
Right now, clang uses GlobalsAA by default and there's no way to disable it.
This leads to Alive2 producing false positives.
The added flag allows us to skip global analyses altogether.
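
As a hedged sketch of the idea (not necessarily the exact code added here),
the flag boils down to gating module-level alias analyses such as GlobalsAA
when the AA pipeline is built; the flag name below is an illustrative
stand-in:

```
#include "llvm/Analysis/AliasAnalysis.h"
#include "llvm/Analysis/BasicAliasAnalysis.h"
#include "llvm/Analysis/GlobalsModRef.h"
#include "llvm/Support/CommandLine.h"

using namespace llvm;

// Illustrative stand-in for the new flag.
static cl::opt<bool> EnableGlobalAnalysesSketch(
    "example-enable-global-analyses", cl::init(true), cl::Hidden,
    cl::desc("Enable inter-procedural analyses (illustrative copy)"));

static AAManager buildAAPipelineSketch() {
  AAManager AA;
  AA.registerFunctionAnalysis<BasicAA>();   // intra-procedural AA stays on
  if (EnableGlobalAnalysesSketch)
    AA.registerModuleAnalysis<GlobalsAA>(); // IPA only when the flag is set
  return AA;
}
```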

Differential Revision: https://reviews.llvm.org/D134139
2022-09-19 11:59:35 +01:00
Vitaly Buka
181d408186 [pipelines] OptimizerEarlyEPCallbacks for ThinLTO prelink
Similar to the OptimizerLastEPCallbacks workaround
added in D96320.

Probably NFC as-is; I don't see anything hooked into these callbacks yet,
but we are looking to move the sanitizers there.
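
For reference, a minimal sketch of registering a pass at this extension
point (the two-argument callback signature is an assumption based on the
PassBuilder API at the time of this change; GlobalDCE is just a placeholder
pass for illustration):

```
#include "llvm/Passes/PassBuilder.h"
#include "llvm/Transforms/IPO/GlobalDCE.h"

using namespace llvm;

// Hook a module pass into the OptimizerEarly extension point; with this
// change the same callbacks also fire in the ThinLTO prelink pipeline.
void addOptimizerEarlyHook(PassBuilder &PB) {
  PB.registerOptimizerEarlyEPCallback(
      [](ModulePassManager &MPM, OptimizationLevel Level) {
        MPM.addPass(GlobalDCEPass());
      });
}
```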

Reviewed By: aeubanks, MaskRay

Differential Revision: https://reviews.llvm.org/D133333
2022-09-06 15:54:04 -07:00
Arthur Eubanks
9599393eeb Revert "[Pipelines] Introduce DAE after ArgumentPromotion"
This reverts commit b10a341aa5b0b93b9175a8f11efc9a0955ab361e.

This commit exposes the pre-existing https://github.com/llvm/llvm-project/issues/56503 in some edge cases. Will fix that and then reland this.
2022-09-01 08:52:19 -07:00
Pavel Samolysov
b10a341aa5 [Pipelines] Introduce DAE after ArgumentPromotion
The ArgumentPromotion pass uses Mem2Reg promotion at the end to cut
down on generated `alloca` instructions as well as meaningless `store`s,
and this behavior can leave unused (dead) arguments. To eliminate the dead
arguments, and therefore let DeadCodeElimination remove the inserted
`GEP`s, `load`s, and `cast`s in the callers once they become dead, the
DeadArgumentElimination pass should be run after the ArgumentPromotion
one.

Differential Revision: https://reviews.llvm.org/D128830
2022-08-28 10:47:03 +03:00
Pavel Samolysov
f964417c32 Revert "[Pipelines] Introduce DAE after ArgumentPromotion"
The commit breaks the compiler when a function is used as a function
parameter (hm... for a function from the standard C library?):

```
static float strtof(char *, char *) {}
void a() { strtof(a, 0); }
```

This reverts commit 879f5118fc74657e4a5c4eff6810098e1eed75ac.
2022-08-26 13:43:09 +03:00
Pavel Samolysov
879f5118fc [Pipelines] Introduce DAE after ArgumentPromotion
The ArgumentPromotion pass uses Mem2Reg promotion at the end to cut
down on generated `alloca` instructions as well as meaningless `store`s,
and this behavior can leave unused (dead) arguments. To eliminate the dead
arguments, and therefore let DeadCodeElimination remove the inserted
`GEP`s, `load`s, and `cast`s in the callers once they become dead, the
DeadArgumentElimination pass should be run after the ArgumentPromotion
one.

Differential Revision: https://reviews.llvm.org/D128830
2022-08-25 10:55:47 +03:00
Pavel Samolysov
6703ad1e0c Revert "[Pipelines] Introduce DAE after ArgumentPromotion"
This reverts commit 3f20dcbf708cb23f79c4866d8285a8ae7bd885de.
2022-08-24 12:44:13 +03:00
Pavel Samolysov
3f20dcbf70 [Pipelines] Introduce DAE after ArgumentPromotion
The ArgumentPromotion pass uses Mem2Reg promotion at the end to cut
down on generated `alloca` instructions as well as meaningless `store`s,
and this behavior can leave unused (dead) arguments. To eliminate the dead
arguments, and therefore let DeadCodeElimination remove the inserted
`GEP`s, `load`s, and `cast`s in the callers once they become dead, the
DeadArgumentElimination pass should be run after the ArgumentPromotion
one.

Differential Revision: https://reviews.llvm.org/D128830
2022-08-24 10:36:12 +03:00
Ellis Hoag
0f946a50a4 [InstrProf] Add option to disable loop opt after PGO
Add the `-enable-post-pgo-loop-rotation` option to enable or disable the loop
rotation transformation [1]. With some instrumentations, e.g., function entry
coverage [2], loop rotation is not necessary and can lead to some surprising
differences in codegen, even for functions where instrumentation is blocked
with `noprofile` or `skipprofile`. The default value is `true` so the default
behavior does not change.

[1] https://www.llvm.org/docs/LoopTerminology.html#loop-terminology-loop-rotate
[2] https://reviews.llvm.org/D116180

Reviewed By: phosek

Differential Revision: https://reviews.llvm.org/D131817
2022-08-17 12:23:18 -07:00
Sanjay Patel
bfb9b8e075 [Passes] add a tail-call-elim pass near the end of the opt pipeline
We call tail-call-elim near the beginning of the pipeline,
but that is too early to annotate calls that get added later.

In the motivating case from issue #47852, the missing 'tail'
on memset leads to sub-optimal codegen.
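
A sketch of what the extra late run amounts to (illustrative only; the
actual change adds TailCallElimPass near the end of the default optimization
pipeline rather than building a standalone pipeline):

```
#include "llvm/IR/PassManager.h"
#include "llvm/Transforms/Scalar/TailRecursionElimination.h"

using namespace llvm;

// A late function-level cleanup pipeline with a second TailCallElim run, so
// calls created by earlier passes (e.g. a memset formed by loop idiom
// recognition) can still be marked 'tail'.
FunctionPassManager makeLateCleanupPipeline() {
  FunctionPassManager FPM;
  FPM.addPass(TailCallElimPass());
  return FPM;
}
```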

I experimented with removing the early instance of
tail-call-elim instead of just adding another pass, but that
appears to be slightly worse for compile-time:
+0.15% vs. +0.08% time.
"tailcall" shows adding the pass; "tailcall2" shows moving
the pass to later, then adding the original early pass back
(so 1596886802 is functionally equivalent to 180b0439dc ):
https://llvm-compile-time-tracker.com/index.php?config=NewPM-O3&stat=instructions&remote=rotateright

Note that there was an effort to split the tail call functionality
into 2 passes - that could help reduce compile-time if we find
that this change costs more in compile-time than expected based
on the preliminary testing:
D60031

Differential Revision: https://reviews.llvm.org/D130374
2022-07-25 15:25:47 -04:00
Alina Sbirlea
846d10f16a Turn on flag to not re-run simplification pipeline.
This patch turns on the flag `-enable-no-rerun-simplification-pipeline`, which means the simplification pipeline will not be rerun on unchanged functions in the CGSCCPass Manager.

Compile time improvement:
https://llvm-compile-time-tracker.com/compare.php?from=17457be1c393ff691cca032b04ea1698fedf0301&to=882301ebb893c8ef9f09fe1ea871f7995426fa07&stat=instructions

No meaningful run time regressions observed in the llvm test suite and
in additional internal workloads at this time.

The example test in `test/Other/no-rerun-function-simplification-pipeline.ll` is a good means to understand the effect of this change:
```
define void @f1(void()* %p) alwaysinline {
  call void %p()
  ret void
}

define void @f2() #0 {
  call void @f1(void()* @f2)
  call void @f3()
  ret void
}

define void @f3() #0 {
  call void @f2()
  ret void
}
```

There are two SCCs formed by the ModuleToPostOrderCGSCCAdaptor: (f1) and (f2, f3).

The pass manager runs on the first SCC, leading to running the simplification pipeline (function and loop passes) on f1. With the flag on, after this, the output will have `Running analysis: ShouldNotRunFunctionPassesAnalysis on f1`.

Next, the pass manager runs on the second SCC: (f2, f3). Since f1() was inlined, f2() now calls itself, and also calls f3(), while f3() only calls f2().
So the pass manager for the SCC first runs the Inliner on (f2, f3), then the simplification pipeline on f2.
With the flag on, the output will have `Running analysis: ShouldNotRunFunctionPassesAnalysis on f2`; unless the inliner makes a change, this analysis remains preserved which means there's no reason to rerun the simplification pipeline. With the flag off, there is a second run of the simplification pipeline run on f2.

Next, the same flow occurs for f3. The simplification pipeline is run on f3 a single time with the flag on, along with `ShouldNotRunFunctionPassesAnalysis on f3`, and twice with the flag off.
The reruns occur only on f2 and f3 due to the additional ref edges.
2022-07-14 06:23:55 -07:00
Kazu Hirata
ec9a0e36d9 [IPO] Remove addLTOOptimizationPasses and addLateLTOOptimizationPasses (NFC)
The last uses were removed on Apr 15, 2022 in commit
2e6ac54cf48aa04f7b05c382c33135b16d3f01ea.

Differential Revision: https://reviews.llvm.org/D129460
2022-07-11 20:15:24 -07:00