136 Commits

Author SHA1 Message Date
JP Lehr
3ab7ef28ee Revert "[MemProf] Use new option/pass for profile feedback and matching"
This reverts commit b4a82b62258c5f650a1cccf5b179933e6bae4867.

Broke AMDGPU OpenMP Offload buildbot
2023-07-11 05:44:42 -04:00
Teresa Johnson
b4a82b6225 [MemProf] Use new option/pass for profile feedback and matching
Previously the MemProf profile was expected to be in the same profile
file as a normal PGO profile, passed via the usual -fprofile-use=
option, and was matched in the same pass. To simplify profile
preparation, since the raw MemProf profile requires the binary for
symbolization and may be simpler to index separately from the raw PGO
profile, and also to enable providing a MemProf profile for a SamplePGO
build, separate out the MemProf feedback option and matching pass.

This patch adds the -fmemory-profile-use=${file} option, and the
provided file is passed down to LLVM and ultimately used in a new
MemProfUsePass which performs the matching of just the memory profile
contents of that file.

Note that a single profile file containing both normal PGO and MemProf
profile data is still supported, and the relevant profile data is
matched by the appropriate matching pass(es) based on which option(s)
the profile is provided with (the same profile file can be supplied to
both feedback options).

Differential Revision: https://reviews.llvm.org/D154856
2023-07-10 16:42:56 -07:00
David Sherwood
905083f3c1 [LTO] Ensure LICM hoists expensive fdiv instructions introduced by InstCombine
In the LTO pipeline we run InstCombine after LICM, which is
different to what we normally do without LTO. This has the
effect of undoing all the great work done by LICM to reduce
the cost of the loop when it hoists the fdiv out and replaces
it with fmul. When InstCombine runs after LICM it puts the
fdiv straight back which, on AArch64 at least, is darn
expensive. You can observe this problem in the SPEC2017
benchmark parest if you build with "-Ofast -flto" and the
loop-vectoriser uses an unroll factor of 1, which is what
often happens when tail-folding is enabled.

This is also a problem for scalar loops, or indeed any loop
where there is only one use of the preheader fdiv result in
the loop.

See InstCombinerImpl::visitFMul for the code that sinks the fdiv.

I've attempted to fix this by adding another LICM pass for Full
LTO after InstCombine. The alternative is to stop InstCombine
from sinking the fdiv into loops. See D87479 for a previous
discussion on this issue.

Differential Revision: https://reviews.llvm.org/D143631
2023-07-07 11:06:24 +00:00
Matthew Voss
a1ca3af31e [llvm] A Unified LTO Bitcode Frontend
Here's a high level summary of the changes in this patch. For more
information on rational, see the RFC.
(https://discourse.llvm.org/t/rfc-a-unified-lto-bitcode-frontend/61774).

  - Add config parameter to LTO backend, specifying which LTO mode is
    desired when using unified LTO.
  - Add unified LTO flag to the summary index for efficiency. Unified
    LTO modules can be detected without parsing the module.
  - Make sure that the ModuleID is generated by incorporating more types
    of symbols.

Differential Revision: https://reviews.llvm.org/D123803
2023-07-05 14:53:14 -07:00
Paul Kirth
75a1797044 Reland [llvm] Preliminary fat-lto-objects support
Fat LTO objects contain both LTO compatible IR, as well as generated
object code. This allows users to defer the choice of whether to use LTO
or not to link-time. This is a feature available in GCC for some time,
and makes the existing -ffat-lto-objects flag functional in the same
way as GCC's.

Within LLVM, we add a new EmbedBitcodePass that serializes the module to
the object file, and expose a new pass pipeline for compiling fat
objects. The new pipeline initially clones the module and runs the
selected (Thin)LTOPrelink pipeline, after which it will serialize the
module into a `.llvm.lto` section of an ELF file. When compiling for
(Thin)LTO, this normally the point at which the compiler would emit a
object file containing the bitcode and metadata.

After that point we compile the original module using the
PerModuleDefaultPipeline used for non-LTO compilation. We generate
standard object files at the end of this pipeline, which contain machine
code and the new `.llvm.lto` section containing bitcode.

Since the two pipelines operate on different copies of the module, we
can be sure that the bitcode in the `.llvm.lto` section and object code
in  `.text` are congruent with the existing output produced by the
default and LTO pipelines.

Original RFC: https://discourse.llvm.org/t/rfc-ffat-lto-objects-support/63977

Earlier versions of this patch were missing REQUIRES lines for llc
related tests in Transforms/EmbedBitcode. Those tests are now under
CodeGen/X86, which should avoid running the check on unsupported
platforms.

The EmbedbBitcodePass also returned PreservedAnalyses::all when adding a
metadata section, which failed expensive checks, since it modified the
module. This is now corrected.

Reviewed By: tejohnson, MaskRay, nikic

Differential Revision: https://reviews.llvm.org/D146776
2023-06-28 21:37:50 +00:00
Alex Brachet
6085eb3084 Revert "Reland [llvm] Preliminary fat-lto-objects support"
This reverts commit 44265dc3554ef40920b587eeb787a400663af6c7.
2023-06-24 01:15:50 +00:00
Teresa Johnson
200cc952a2 [LTO][GlobalDCE] Use pass parameter instead of module flag for LTO phase
D63932 added a module flag to indicate that we are executing the regular
LTO post merge pipeline, so that GlobalDCE could perform more aggressive
optimization for Dead Virtual Function Elimination. This caused issues
trying to reuse bitcode that had already been through the LTO pipeline
(see context in D139816).

Instead support this by passing down a parameter flag to the
GlobalDCEPass constructor, which is the more usual way for indicating
this information.

Most test changes are to remove incidental uses of this flag. Of the 2
real uses, llvm/test/LTO/ARM/lto-linking-metadata.ll is now obsolete and
removed in this patch, and the virtual-functions-visibility-post-lto.ll
test is updated to use the regular LTO default pipeline where this
parameter is set to true.

Differential Revision: https://reviews.llvm.org/D153655
2023-06-23 17:05:07 -07:00
Paul Kirth
44265dc355 Reland [llvm] Preliminary fat-lto-objects support
Fat LTO objects contain both LTO compatible IR, as well as generated
object code. This allows users to defer the choice of whether to use LTO
or not to link-time. This is a feature available in GCC for some time,
and makes the existing -ffat-lto-objects flag functional in the same
way as GCC's.

Within LLVM, we add a new EmbedBitcodePass that serializes the module to
the object file, and expose a new pass pipeline for compiling fat
objects. The new pipeline initially clones the module and runs the
selected (Thin)LTOPrelink pipeline, after which it will serialize the
module into a `.llvm.lto` section of an ELF file. When compiling for
(Thin)LTO, this normally the point at which the compiler would emit a
object file containing the bitcode and metadata.

After that point we compile the original module using the
PerModuleDefaultPipeline used for non-LTO compilation. We generate
standard object files at the end of this pipeline, which contain machine
code and the new `.llvm.lto` section containing bitcode.

Since the two pipelines operate on different copies of the module, we
can be sure that the bitcode in the `.llvm.lto` section and object code
in  `.text` are congruent with the existing output produced by the
default and LTO pipelines.

Original RFC: https://discourse.llvm.org/t/rfc-ffat-lto-objects-support/63977

Earlier versions of this patch were missing REQUIRES lines for llc
related tests in Transforms/EmbedBitcode. Those tests are now under
CodeGen/X86, which should avoid running the check on unsupported
platforms.

Reviewed By: tejohnson, MaskRay, nikic

Differential Revision: https://reviews.llvm.org/D146776
2023-06-23 23:23:58 +00:00
Paul Kirth
a3800ad9d8 Revert "[llvm] Preliminary fat-lto-objects support"
There seems to be a problem on arm buildbots. Reverting until I can
investigate.  https://lab.llvm.org/buildbot#builders/245/builds/10184

This reverts commit a67208e1c697649ce432e6497f56a93675273dd8
and dependent commit e54a3112cee5ae0a9117359ecbea878e1388f51e.
2023-06-23 18:43:41 +00:00
Paul Kirth
a67208e1c6 [llvm] Preliminary fat-lto-objects support
Fat LTO objects contain both LTO compatible IR, as well as generated
object code. This allows users to defer the choice of whether to use LTO
or not to link-time. This is a feature available in GCC for some time,
and makes the existing -ffat-lto-objects flag functional in the same
way as GCC's.

Within LLVM, we add a new EmbedBitcodePass that serializes the module to
the object file, and expose a new pass pipeline for compiling fat
objects. The new pipeline initially clones the module and runs the
selected (Thin)LTOPrelink pipeline, after which it will serialize the
module into a `.llvm.lto` section of an ELF file. When compiling for
(Thin)LTO, this normally the point at which the compiler would emit a
object file containing the bitcode and metadata.

After that point we compile the original module using the
PerModuleDefaultPipeline used for non-LTO compilation. We generate
standard object files at the end of this pipeline, which contain machine
code and the new `.llvm.lto` section containing bitcode.

Since the two pipelines operate on different copies of the module, we
can be sure that the bitcode in the `.llvm.lto` section and object code
in  `.text` are congruent with the existing output produced by the
default and LTO pipelines.

Original RFC: https://discourse.llvm.org/t/rfc-ffat-lto-objects-support/63977

Reviewed By: tejohnson, MaskRay, nikic

Differential Revision: https://reviews.llvm.org/D146776
2023-06-23 17:51:30 +00:00
Teresa Johnson
f354e971b0 [MemProf] Clean up MemProf instrumentation pass invocation
First, removes the invocation of the memprof instrumentation passes from
the end of the module simplification pass builder, where it doesn't
really belong. However, it turns out that this was never being invoked,
as it is guarded by an internal option not used anywhere (even tests).

These passes are actually added via clang under the -fmemory-profile
option. Changed this to add via the EP callback interface, similar to
the sanitizer passes. They are added to the EP for the end of the
optimization pipeline, which is roughly where they were being added
already (end of the pre-LTO link pipelines and non-LTO optimization
pipeline).

Ideally we should plumb the output file through to LLVM and set it up
there, so I have added a TODO.

Differential Revision: https://reviews.llvm.org/D151593
2023-05-26 17:38:49 -07:00
Arthur Eubanks
13e3d4aa5a [Pipeline] Don't run EarlyFPM in LTO post link
EarlyFPM cleans up the output of the frontend. This isn't necessary in post link pipelines as the pre link pipeline already ran this.

~0.4% savings in ThinLTO builds:
https://llvm-compile-time-tracker.com/compare.php?from=8a5d4eb775c644d8683f24817d44c510d2b853b7&to=3580252a2162eadca0da99f1eeaa112f74a0353d&stat=instructions:u

Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D145403
2023-05-25 09:32:54 -07:00
Nikita Popov
3060ee0c6a [Pipelines] Don't skip GlobalDCE in ThinLTO pre-link
GlobalDCE will only remove functions with available externally
linkage if they are unreferenced. As such, I don't believe there
is any problem with running this pass as part of the ThinLTO pre-link
pipeline. It will only remove functions that are completely dead in
that module, and I don't think there is any benefit to keeping them
around for the post-link phase.

There is no compile-time impact from the additional pass.

This is a followup to one of the side discussions in D146776.

Differential Revision: https://reviews.llvm.org/D149446
2023-05-15 14:58:24 +02:00
Teresa Johnson
cfad2d3a3d [MemProf] Context disambiguation cloning pass [patch 4/4]
Applies ThinLTO cloning decisions made during the thin link and
recorded in the summary index to the IR during the ThinLTO backend.

Depends on D141077.

Differential Revision: https://reviews.llvm.org/D149117
2023-05-05 16:26:32 -07:00
Shoaib Meenai
141be5c062 Revert "Reland [Pipeline] Don't limit ArgumentPromotion to -O3"
This reverts commit 6f29d1adf29820daae9ea7a01ae2588b67735b9e.

https://reviews.llvm.org/D149768 is causing size regressions for -Oz
with FullLTO, and I'm reverting that one while investigating. This
commit depends on that one, so it needs to be reverted as well.
2023-05-05 14:26:57 -07:00
Arthur Eubanks
6f29d1adf2 Reland [Pipeline] Don't limit ArgumentPromotion to -O3
This is a cheap pass so there's no need to limit to -O3.
This removes some differences between various pipelines.

Code size regressions should be addressed with
https://reviews.llvm.org/D149768.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D148269
2023-05-03 13:17:30 -07:00
Arthur Eubanks
09d27bdb86 Revert "[Pipeline] Don't limit ArgumentPromotion to -O3"
This reverts commit 5b386b864c7619897c51a1da97d78f1cf6f3eff6.

Causes noticeable size increases under -Oz.
2023-05-03 08:57:47 -07:00
serge-sans-paille
afa13ba18d
Reapply Move "auto-init" instructions to the dominator of their users
Original patch (50b2a113db197a97f60ad2aace8b7382dc9b8c31) ignored the
fact that -ftrivial-auto-var-init could affect function parameters with
the sret attribute.
Just do not move instruction that don't affect alloca.
Also add missing test case for volatile instruction.

Differential Revision: https://reviews.llvm.org/D148507
2023-04-24 18:10:10 +02:00
Nikita Popov
e7e4c76320 [Pipelines] Don't run ForceFunctionAttrs post-link
This is effectively a debugging pass to adjust function attributes.
I don't think it makes sense to run it in the post-link pipeline.

Differential Revision: https://reviews.llvm.org/D148904
2023-04-24 09:58:06 +02:00
Nikita Popov
22a408ae51 [Pipelines] Don't explicitly require ORE
LICM does not use ORE from the pass manager, it constructs its
own instance. As such, explicitly requiring the analysis in the
pipeline is unnecessary.
2023-04-21 13:22:04 +02:00
Nikita Popov
384a8dd10e [Pipelines] Don't request BFI in LICM-only loop pass adaptors
LICM doesn't use BFI anymore, so requesting BFI in these loop
pass adaptors is just a waste of compile-time.
2023-04-21 12:55:12 +02:00
Prem Chintalapudi
33817296c6 Expose PassBuilder extension point callbacks
This patch allows access to callbacks registered by TargetMachines to allow custom pipelines to run those callbacks.

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D148561
2023-04-17 17:14:40 -07:00
Nikita Popov
73b6b323c5 [Pipelines] Add LoopSink and DivRemPairs to LTO post-link pipeline
As pointed out in D148010, these passes are missing from the LTO
post-link pipeline. They are present in the pre-link pipeline,
but LoopSink is completely useless there (it will always be fully
undone by LICM post-link) and DivRemPairs is mostly useless
(I believe most of what it does will be undone by InstCombine).

I've not added RelLookupTableConverterPass, because it's also
disabled in the LTO pre-link pipeline, with a comment that there
is an unresolved issue with full LTO.

Compile-time impact of the extra passes is minimal. Of course,
LoopSink will have a larger impact in PGO builds.

Differential Revision: https://reviews.llvm.org/D148343
2023-04-17 13:04:26 +02:00
Arthur Eubanks
5b386b864c [Pipeline] Don't limit ArgumentPromotion to -O3
This is a cheap pass so there's no need to limit to -O3.
This removes some differences between various pipelines.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D148269
2023-04-14 10:00:41 -07:00
Arthur Eubanks
4bf9ca5eec [Pipeline] Remove Annotation2Metadata pass in post-link pipelines
The pre-link pipeline already ran the pass and it only needs to be run once.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D145978
2023-04-12 20:34:09 -07:00
Hans Wennborg
a6d9730f40 Revert "Move "auto-init" instructions to the dominator of their users"
This could also move initialization of sret args, causing actually
initialized parts of such return values to be uninitialized. See
discussion on the code review.

> As a result of -ftrivial-auto-var-init, clang generates instructions to
> set alloca'd memory to a given pattern, right after the allocation site.
> In some cases, this (somehow costly) operation could be delayed, leading
> to conditional execution in some cases.
>
> This is not an uncommon situation: it happens ~500 times on the cPython
> code base, and much more on the LLVM codebase. The benefit greatly
> varies on the execution path, but it should not regress on performance.
>
> This is a recommit of cca01008cc31a891d0ec70aff2201b25d05d8f1b with
> MemorySSA update fixes.
>
> Differential Revision: https://reviews.llvm.org/D137707

This reverts commit 50b2a113db197a97f60ad2aace8b7382dc9b8c31
and follow-up commit ad9ad3735c4821ff4651fab7537a75b8f0bb60f8.
2023-04-12 13:37:21 +02:00
Nikita Popov
271853c62f [Pipelines] Move AddDescriminators to consistent position
In the non-ThinLTO pipeline this was directly before PipelineStartEP,
in the ThinLTO pipeline it was directly after. I don't think the
specific position matters here, just make sure it's the same for
both pipelines.
2023-04-11 13:19:06 +02:00
Nikita Popov
f5f04a52b5 [Pipelines] Use isLTOPreLink() helper in more places (NFC) 2023-04-11 13:13:36 +02:00
Nikita Popov
721a914fae [Pipelines] Remove redundant O0 check (NFC)
buildModuleSimplificationPipeline() is not used for O0.
2023-04-11 13:13:36 +02:00
Dávid Bolvanský
05a2f4290e [AggressiveInstCombine] Enable also for -O2
Next step after https://reviews.llvm.org/D113179

Recently a set of patches by @anton-afanasyev improved many cases (better and cleaner vectorized code) thanks to improvements to AIC's TruncInstCombine (IC cannot handle it) motivated by real examples in bug reports. There was a discussion that -O2 could benefit from AIC as well, but discussion then stalled, so I would like restart it, with new numbers from LLVM compile time tracker.

As -O2 pipeline is not tracked by LLVM compile time tracker, I disabled AIC for -O3 to get an idea how expensive is it. Without AIC, I observed that geomean was cca -0.10%. Given that it seems like AIC is quite cheap, heavily tested by -O3 pipeline, I am proposing to enable it also with -O2 and similar to improve quality to vectorized code.

https://llvm-compile-time-tracker.com/compare.php?from=a1df5abef5f27646c809c7b85cf6170eb68f7735&to=e1ba6068f58c6ca862b920b8750faccb42a5843c&stat=instructions:u

Differential Revision: https://reviews.llvm.org/D147604
Reviewed-By: nikic
2023-04-05 16:51:21 +02:00
serge-sans-paille
50b2a113db
Move "auto-init" instructions to the dominator of their users
As a result of -ftrivial-auto-var-init, clang generates instructions to
set alloca'd memory to a given pattern, right after the allocation site.
In some cases, this (somehow costly) operation could be delayed, leading
to conditional execution in some cases.

This is not an uncommon situation: it happens ~500 times on the cPython
code base, and much more on the LLVM codebase. The benefit greatly
varies on the execution path, but it should not regress on performance.

This is a recommit of cca01008cc31a891d0ec70aff2201b25d05d8f1b with
MemorySSA update fixes.

Differential Revision: https://reviews.llvm.org/D137707
2023-04-04 07:30:03 +02:00
serge-sans-paille
11ae47dfc6
Revert "Move "auto-init" instructions to the dominator of their users"
This reverts commit cca01008cc31a891d0ec70aff2201b25d05d8f1b.

This change breaks memory ssa checks, see https://lab.llvm.org/buildbot#builders/109/builds/60970
2023-04-03 15:46:18 +02:00
serge-sans-paille
cca01008cc
Move "auto-init" instructions to the dominator of their users
As a result of -ftrivial-auto-var-init, clang generates instructions to
set alloca'd memory to a given pattern, right after the allocation site.
In some cases, this (somehow costly) operation could be delayed, leading
to conditional execution in some cases.

This is not an uncommon situation: it happens ~500 times on the cPython
code base, and much more on the LLVM codebase. The benefit greatly
varies on the execution path, but it should not regress on performance.

Differential Revision: https://reviews.llvm.org/D137707
2023-04-03 15:27:27 +02:00
ibricchi
1a36eaa552 [Pass Builder] Allow Module Inliner for full LTO
Currently there is no way to enable the module inliner when
linking with full lto. This patch enables that option.

Differential Revision: https://reviews.llvm.org/D146805
2023-04-03 14:16:35 +02:00
Teresa Johnson
700cd99061 Restore "[MemProf] Context disambiguation cloning pass [patch 1a/3]"
This restores commit d6ad4f01c3dafcab335bca66dac6e36d9eac8421, which was
reverted in commit 883dbb9c86be87593a58ef10b070b3a0564c7fee, along with
a fix for gcc 12.2 build errors in the original commit.

Support for building, printing, and displaying CallsiteContextGraph
which represents the MemProf metadata contexts. Uses CRTP to enable
support for both IR (regular LTO) and summary (ThinLTO). This patch
includes the support for building it in regular LTO mode (from
memprof and callsite metadata), and the next patch will add the
handling for building it from ThinLTO summaries.

Also includes support for dumping the graph to text and to dot files.

Follow-on patches will contain the support for cloning on the graph and
in the IR.

The graph represents the call contexts in all memprof metadata on
allocation calls, with nodes for the allocations themselves, as well as
for the calls in each context. The graph is initially built from the
allocation memprof metadata (or summary) MIBs. It is then updated to
match calls with callsite metadata onto the nodes, updating it to
reflect any inlining performed on those calls.

Each MIB (representing an allocation's call context with allocation
behavior) is assigned a unique context id during the graph build. The
edges and nodes in the graph are decorated with the context ids they
carry. This is used to correctly update the graph when cloning is
performed so that we can uniquify the context for a single (possibly
cloned) allocation.

Differential Revision: https://reviews.llvm.org/D140908
2023-03-22 10:16:06 -07:00
Nikita Popov
883dbb9c86 Revert "[MemProf] Context disambiguation cloning pass [patch 1a/3]"
This reverts commit d6ad4f01c3dafcab335bca66dac6e36d9eac8421.

Fails to build on at least gcc 12.2:

/home/npopov/repos/llvm-project/llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp:482:1: error: no declaration matches ‘ContextNode<DerivedCCG, FuncTy, CallTy>* CallsiteContextGraph<DerivedCCG, FuncTy, CallTy>::getNodeForInst(const CallInfo&)’
  482 | CallsiteContextGraph<DerivedCCG, FuncTy, CallTy>::getNodeForInst(
      | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/npopov/repos/llvm-project/llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp:393:16: note: candidate is: ‘CallsiteContextGraph<DerivedCCG, FuncTy, CallTy>::ContextNode* CallsiteContextGraph<DerivedCCG, FuncTy, CallTy>::getNodeForInst(const CallInfo&)’
  393 |   ContextNode *getNodeForInst(const CallInfo &C);
      |                ^~~~~~~~~~~~~~
/home/npopov/repos/llvm-project/llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp:99:7: note: ‘class CallsiteContextGraph<DerivedCCG, FuncTy, CallTy>’ defined here
   99 | class CallsiteContextGraph {
      |       ^~~~~~~~~~~~~~~~~~~~
2023-03-22 15:43:46 +01:00
Teresa Johnson
d6ad4f01c3 [MemProf] Context disambiguation cloning pass [patch 1a/3]
Support for building, printing, and displaying CallsiteContextGraph
which represents the MemProf metadata contexts. Uses CRTP to enable
support for both IR (regular LTO) and summary (ThinLTO). This patch
includes the support for building it in regular LTO mode (from
memprof and callsite metadata), and the next patch will add the
handling for building it from ThinLTO summaries.

Also includes support for dumping the graph to text and to dot files.

Follow-on patches will contain the support for cloning on the graph and
in the IR.

The graph represents the call contexts in all memprof metadata on
allocation calls, with nodes for the allocations themselves, as well as
for the calls in each context. The graph is initially built from the
allocation memprof metadata (or summary) MIBs. It is then updated to
match calls with callsite metadata onto the nodes, updating it to
reflect any inlining performed on those calls.

Each MIB (representing an allocation's call context with allocation
behavior) is assigned a unique context id during the graph build. The
edges and nodes in the graph are decorated with the context ids they
carry. This is used to correctly update the graph when cloning is
performed so that we can uniquify the context for a single (possibly
cloned) allocation.

Depends on D140786.

Differential Revision: https://reviews.llvm.org/D140908
2023-03-22 07:05:27 -07:00
Nikita Popov
a8f6b5763e [PassBuilder] Support O0 in default pipelines
The default and pre-link pipeline builders currently require you to
call a separate method for optimization level O0, even though they
have perfectly well-defined O0 optimization pipelines.

Accept O0 optimization level and call buildO0DefaultPipeline()
internally, so all consumers don't need to repeat this.

Differential Revision: https://reviews.llvm.org/D146200
2023-03-17 10:00:05 +01:00
Arthur Eubanks
20ed9cebb6 [Pipeline] Remove early InstCombine in ThinLTO post link sample profile pipeline
With opaque pointers, all function pointer types are the same, meaning there should be no bitcasts.

Internal benchmarks with SampleFDO look neutral.

This was added in D36333.

Reviewed By: tejohnson, davidxl

Differential Revision: https://reviews.llvm.org/D146099
2023-03-14 19:48:31 -07:00
Alexandros Lamprineas
f242291f59 [FuncSpec] Do not run pre-link when doing LTO.
Saves time. Post link will cover most cases anyway.

Differential Revision: https://reviews.llvm.org/D145394
2023-03-14 18:56:26 +00:00
Arthur Eubanks
87dadf0f5b [Pipeline] Move some GlobalOpt/GlobalDCE runs into simplification pipeline
These are very clearly more simplification than optimization.

Mostly NFC, except for some ordering around passes that don't really matter.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D145967
2023-03-14 09:01:14 -07:00
Nikita Popov
fb5683449e [Pipelines] Restore old DAE position in LTO pipeline
This is a partial revert of D128830, restoring the previous
position of DeadArgElim in the fat LTO pipeline. The motivation
for this is a major code size regression observed in Rust and
illustrated in the PhaseOrdering test.

This is a conservative fix restoring the previous pipeline order.
The real problem is that the LTO pipeline is conceptually broken:
It doesn't have a CGSCC function simplification pipeline. The
inliner is just being run by itself. This wouldn't be a problem
if fat LTO used a standard design where ArgPromotion and DAE are
only run after functions have already been simplified by the
CGSCC inliner pipeline.

Differential Revision: https://reviews.llvm.org/D146051
2023-03-14 17:00:17 +01:00
Sanjay Patel
ef6f23535d Revert "[InstCombine] use loop info when running the pass after loop vectorization"
This reverts commit 43ae4b62b2671cf73e691c0b53324cd39405cd51.

This was intended to be practically NFC in terms of the overall
opt pipeline, but there is experimental data showing that code
changes occurred here:
https://llvm-compile-time-tracker.com/compare.php?from=772aa05452f8ff90a47168e6801cda2acb5a1873&to=43ae4b62b2671cf73e691c0b53324cd39405cd51&stat=size-text
2023-03-11 17:28:56 -05:00
Sanjay Patel
43ae4b62b2 [InstCombine] use loop info when running the pass after loop vectorization
This is the follow-up to D144199 and suggestion from D144045.
We make use of loop info explicit via InstCombine pass parameter
rather than semi-arbitrary via caching.

The only InstCombine transform that uses LoopInfo currently is a
GEP fold in visitGEPOfGEP(), so that shows up as a failure in the
dedicated test for the fold as well as several LoopVectorizer tests
that run extra passes.

I don't see any pass manager regression tests that actually check
for pass options, but this is intended to be NFC for the pass
pipeline behavior - we only try to use loop info where it would
have been used before via caching .

Differential Revision: https://reviews.llvm.org/D144274
2023-03-11 14:20:30 -05:00
Arthur Eubanks
0d4a709bb8 [Pipeline] Adjust PostOrderFunctionAttrs placement in simplification pipeline
We can infer more attribute information once functions are fully
simplified, so move the PostOrderFunctionAttrs pass after the function
simplification pipeline. However, just doing this can impact
simplification of recursive functions since function simplification
takes advantage of function attributes of callees (some LLVM tests are
actually impacted by this), so keep a copy of PostOrderFunctionAttrs
before the function simplification pipeline that only runs on recursive
functions.

For example, this fixes the small regression noticed in https://reviews.llvm.org/D128830.

This requires some restructuring of the CGSCC NoRerun feature. We need
to cache the ShouldNotRunFunctionPassesAnalysis analysis after the
simplification is done, which now is after the second
PostOrderFunctionAttrs run, rather than after the function
simplification pipeline.

Compile time impact:
https://llvm-compile-time-tracker.com/compare.php?from=33cf40122279342b50f92a3a53f5c185390b6018&to=1bb2a07875634e508a6bdf2ca1b130f55510f060&stat=instructions:u

Compile time increase from unconditionally running the first PostOrderFunctionAttrs:
https://llvm-compile-time-tracker.com/compare.php?from=1bb2a07875634e508a6bdf2ca1b130f55510f060&to=f4f87e89cc7a35c64e3a103a8036192a84ae002b&stat=instructions:u

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D145210
2023-03-06 09:01:45 -08:00
Arthur Eubanks
bd6eb1423c [NFC][Pipeline] Move PromotePass into GlobalCleanupPM 2023-03-01 13:22:24 -08:00
Arthur Eubanks
25af6507e7 [PassBuilder] Always enable CountVisitsPass when stats are enabled
Rather than having a separate flag.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D145015
2023-03-01 09:22:02 -08:00
Rong Xu
666731660c [Pass][CHR] Move ControlHeightReduction to module optimization pipeline
This is a modified version of commit b374423304a8 by
Arthur (https://reviews.llvm.org/D143424).

Here we invoke to the pass independent of PGOOPT. We now check if the
profile is available through the program summary. This ensures CHR is
called in distributed ThinLTO BE compilation (where PGOOPT might not
be created).

Differential Revision: https://reviews.llvm.org/D144769
2023-02-27 11:47:54 -08:00
Arthur Eubanks
a628ca4925 Revert "[Pipeline] Move ControlHeightReduction to module optimization pipeline"
This reverts commit b374423304a8d91d590d0ce5ab1b381296d6dfb2.

Causes regressions on some benchmarks.
2023-02-23 10:17:12 -08:00
Arthur Eubanks
b374423304 [Pipeline] Move ControlHeightReduction to module optimization pipeline
This pass isn't a simplification, it's a non-canonical optimization.

This makes it only run once in a (Thin)LTO pipeline during postlink, just like all the other optimization pipeline passes.

Reviewed By: xur

Differential Revision: https://reviews.llvm.org/D143424
2023-02-16 15:23:38 -08:00