189 Commits

Author SHA1 Message Date
Mircea Trofin
4a2bf05980 Reapply "[ctx_prof] Fix the pre-thinlink "use" case (#102511)"
This reverts commit 967185eeb85abb77bd6b6cdd2b026d5c54b7d4f3.

The problem was link dependencies, moved `UseCtxProfile` to `Analysis`.
2024-08-08 17:04:00 -07:00
Aiden Grossman
967185eeb8 Revert "[ctx_prof] Fix the pre-thinlink "use" case (#102511)"
This reverts commit 1a6d60e0162b3ef767c87c95512dd453bf4f4746.

Broke some buildbots.
2024-08-08 21:14:56 +00:00
Mircea Trofin
1a6d60e016
[ctx_prof] Fix the pre-thinlink "use" case (#102511)
Didn't notice in #101338 that the instrumentation in `llvm/test/Transforms/PGOProfile/ctx-prof-use-prelink.ll` was actually incorrect.
2024-08-08 16:45:04 -04:00
Mircea Trofin
dbbf0762b6
[ctx_prof] CtxProfAnalysis (#102084)
This is an immutable analysis that loads and makes the contextual profile available to other passes. This patch introduces the analysis and an analysis printer pass. Subsequent patches will introduce the APIs that IPO passes will call to modify the profile as a result of their changes.
2024-08-07 14:39:48 -04:00
Mircea Trofin
ba4da5a087
[ctx_prof] "Use" support for pre-thinlink. (#101338)
There is currently no plan to support contextual profiling use in a non-
ThinLTO scenario.

In the pre-link phase, we only instrument and then immediately bail out
to let the linker group functions under an entrypoint in the same module
as the entrypoint. We don't actually care what the profile contains -
just that we want to use a contextual profile.

After that, in post-thinlink, we require the profile be passed again so
we can actually use it. The earlier instrumentation will be used to
match counter values.

While the feature is in development, we add a hidden flag for the use
scenario, but we can eventually tie it to the `PGOOptions` mechanism. We
will use the same flag in both pre- and post-thinlink, because it
simplifies things - usually the post-thinlink args are the same as the
ones for pre-. This, despite the flag being basically treated as a
boolean in pre-thinlink.
2024-08-02 20:51:27 -04:00
Wei Wang
3a9ef4e69a
[Pipelines] Do not run CoroSplit and CoroCleanup in LTO pre-link pipeline (#100205)
This is re-land of #90310 after making asan skip pre-split coroutines in
#99415.

Skip CoroSplit and CoroCleanup in LTO pre-link pipeline so that
CoroElide can happen after callee coroutine is imported into caller's
module in ThinLTO.
2024-07-29 17:42:01 -07:00
Joseph Huber
8758091a70
[LLVM] Add 'ExpandVariadicsPass' to LTO default pipeline (#100479)
Summary:
This pass expands variadic functions into non-variadic function calls
according to the target ABI. Currently, this is used as the lowering for
the NVPTX and AMDGPU targets.

This pass is currently only run late in the target's backend. However,
during LTO we want to run it before the inliner pass so that the
expanded functions can be inlined using standard heuristics. This pass
is a no-op for unsupported targets, so this won't apply to any code that
isn't already using it.
2024-07-25 09:21:05 -05:00
Tianqing Wang
3d494bfc7f
[SimplifyCFG] Increase budget for FoldTwoEntryPHINode() if the branch is unpredictable. (#98495)
The `!unpredictable` metadata has been present for a long time, but
its usage in optimizations is still limited. This patch teaches
`FoldTwoEntryPHINode()` to be more aggressive with an unpredictable
branch to reduce mispredictions.

A TTI interface `getBranchMispredictPenalty()` is added to distinguish
between different hardware to ensure we don't go too far for simpler
cores. For simplicity, only a naive x86 implementation is included for
the time being.
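
As a rough illustration of the budget idea described above, the sketch below models a speculation budget that grows by a target-supplied misprediction penalty when the branch is marked unpredictable. The struct `ToyTarget`, the functions `foldBudget` and `shouldFold`, and every numeric value are hypothetical stand-ins chosen for this example; this is not the SimplifyCFG or TTI code.

```cpp
#include <cassert>

// Hypothetical stand-in for a target cost model; not the real TTI class.
struct ToyTarget {
    int mispredictPenalty; // assumed cost of one branch misprediction
};

// Returns the cost budget for speculating both sides of a two-entry PHI.
// An unpredictable branch earns extra budget equal to the target's penalty.
int foldBudget(int baseBudget, bool unpredictable, const ToyTarget &target) {
    assert(baseBudget >= 0 && target.mispredictPenalty >= 0);
    return unpredictable ? baseBudget + target.mispredictPenalty : baseBudget;
}

// True when the instructions to speculate fit within the budget.
bool shouldFold(int speculationCost, int baseBudget, bool unpredictable,
                const ToyTarget &target) {
    return speculationCost <= foldBudget(baseBudget, unpredictable, target);
}
```

A target with a small penalty keeps folding conservative, which matches the stated goal of not going too far on simpler cores.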
2024-07-23 07:47:21 +08:00
xur-llvm
b1ca2a9546
[PGO] Sampled instrumentation in PGO to speed up instrumentation binary (#69535)
In comparison to non-instrumented binaries, PGO instrumentation binaries
can be significantly slower. For highly threaded programs, this slowdown
can reach 10x due to data races or false sharing within counters.

This patch incorporates sampling into the PGO instrumentation process to
enhance the speed of instrumentation binaries. The fundamental concept
is similar to the one proposed in https://reviews.llvm.org/D63949.

Three sampling modes are introduced:
1. Simple Sampling: When '-sampled-instr-burst-duration' is set to 1.
2. Fast Burst Sampling: When not using simple sampling, and
'-sampled-instr-period' is set to 65535. This is the default mode of
sampling.
3. Full Burst Sampling: When neither simple nor fast burst sampling is
used.

Utilizing this sampled instrumentation significantly improves the
binary's execution speed. Measurements show up to 5x speedup with default
settings. Fast burst sampling now results in only around 20% to 30%
slowdown (compared to 8 to 10x slowdown without sampling).

Our tests show that profile quality remains good with sampling,
with edge counts typically showing more than 90% overlap.
For applications whose behavior changes due to binary speed,
sampling instrumentation can enhance performance.
Observations have shown some apps experiencing up to
a ~2% improvement in PGO.

A potential drawback of this patch is the increased binary size
and compilation time. The Sampling method in this patch does
not improve single-threaded program instrumentation binary
speed.
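
The burst idea above can be sketched as a small standalone model: within each sampling period, only an initial burst of events updates the profile counter, and the rest are skipped. The names `BurstSampler` and `countSampled` and the parameter values are assumptions made for this illustration; the sketch does not reproduce the instrumentation lowering itself or the exact semantics of the flags.

```cpp
#include <cassert>
#include <cstdint>

// Toy model of burst sampling: in each window of `period` events, only the
// first `burst` events are recorded. Hypothetical type, for illustration only.
struct BurstSampler {
    uint32_t burst;
    uint32_t period;
    uint32_t tick;

    BurstSampler(uint32_t burstLen, uint32_t periodLen)
        : burst(burstLen), period(periodLen), tick(0) {
        assert(period > 0 && burst <= period);
    }

    // Returns true when the current event falls inside the burst window,
    // then advances the position and wraps at the end of the period.
    bool shouldRecord() {
        bool record = tick < burst;
        if (++tick >= period)
            tick = 0;
        return record;
    }
};

// Counts how many of `events` executions would update a profile counter.
uint64_t countSampled(uint32_t burst, uint32_t period, uint64_t events) {
    BurstSampler sampler(burst, period);
    uint64_t counter = 0;
    for (uint64_t i = 0; i < events; ++i)
        if (sampler.shouldRecord())
            ++counter;
    return counter;
}
```

In this model a burst of 1 corresponds to sampling a single event per period, while a burst equal to the period records every event.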
2024-07-22 09:19:17 -07:00
YAMAMOTO Takashi
5d79110959
[Pipelines] Perform mergefunc after constmerge (#92498)
Constmerge can fold switch jump tables, possibly making functions
identical again. It can help mergefunc.
On the other hand, the opposite seems unlikely.

Fixes https://github.com/llvm/llvm-project/issues/92201.
2024-07-05 12:28:03 +02:00
Egor Pasko
cab81dd038
[EntryExitInstrumenter] Move passes out of clang into LLVM default pipelines (#92171)
Move EntryExitInstrumenter(PostInlining=true) to as late as possible and
EntryExitInstrumenter(PostInlining=false) to an early pre-inlining stage
(but skip for ThinLTO post-link).

This should fix the issues reported in
https://github.com/rust-lang/rust/issues/92109 and
https://github.com/llvm/llvm-project/issues/52853. These are caused
by https://reviews.llvm.org/D97608.
2024-05-31 12:48:45 -07:00
Mircea Trofin
d311a62e2f
[ctx_profile] Decouple ctx instrumentation from PGOOpt (#92445)
We currently don't support passing files and don't need frontend involvement either.
2024-05-16 13:41:36 -07:00
Mircea Trofin
174cdeced0
[nfc] Clarify when the various PGO instrumentation passes run (#92330)
The code seems easier to read if it's centered on what the user wants rather than combinations of whatever internal variables.
2024-05-16 12:17:22 -07:00
Reid Kleckner
aa0776de46 Revert "[Pipelines] Do not run CoroSplit and CoroCleanup in LTO pre-link pipeline (#90310)" and related patches
This change is incorrect when thinlto and asan are enabled, and this can
be observed by adding `-fsanitize=address` to the provided
coro-elide-thinlto.cpp test. It results in the error "Coroutines cannot
handle non static allocas yet", and ASan introduces a dynamic alloca.

In other words, we must preserve the invariant that CoroSplit runs
before ASan. If we move CoroSplit to the post-link compile stage,
ASan has to be moved to the post-link compile stage first.  It would
also be correct to make CoroSplit handle dynamic allocas so the pass
ordering doesn't matter, but sanitizer instrumentation really ought to
be last, after coroutine splitting.

This reverts commit bafc5f42c0132171287d7cba7f5c14459be1f7b7.
This reverts commit b1b1bfa7bea0ce489b5ea9134e17a43c695df5ec.
This reverts commit 0232b77e145577ab78e3ed1fdbb7eacc5a7381ab.
This reverts commit fb2d3056618e3d03ba9a695627c7b002458e59f0.
This reverts commit 1cb33713910501c6352d0eb2a15b7a15e6e18695.
This reverts commit cd68d7b3c0ebf6da5e235cfabd5e6381737eb7fe.
2024-05-10 21:28:13 +00:00
Mircea Trofin
96568f3539
[llvm][ctx_profile] Add instrumentation lowering (#90821)
This adds the instrumentation lowering pass.

(Tracking Issue: #89287, RFC referenced there)
2024-05-08 16:49:08 -07:00
Wei Wang
bafc5f42c0
[Pipelines][Coroutines] Tune coroutine passes only for ThinLTO pre-link pipeline (#90690)
Follow-up to #90310: limit the tuning to the ThinLTO pre-link pipeline
only, as coroutine passes are not in the MonoLTO backend.
2024-04-30 21:40:04 -07:00
Wei Wang
cd68d7b3c0
[Pipelines] Do not run CoroSplit and CoroCleanup in LTO pre-link pipeline (#90310)
Skip CoroSplit and CoroCleanup in LTO pre-link pipeline so that
CoroElide can happen after callee coroutine is imported into caller's
module in ThinLTO.
2024-04-29 10:24:53 -07:00
Arthur Eubanks
947b656add
[PGO] Check that PGOOpt exists before using PGOOpt->ColdOptType (#89139)
This means that the pass is unusable without some sort of profile. We
can revisit this decision later if we want to support running this pass
without a profile.
2024-04-18 11:22:10 -07:00
Florian Hahn
0f82469314
[Passes] Run SimpleLoopUnswitch after introducing invariant branches. (#81271)
IndVars may be able to replace a loop dependent condition with a loop
invariant one, but loop-unswitch runs before IndVars, so the invariant
check remains in the loop.

For an example, consider a read-only loop with a bounds check:
https://godbolt.org/z/8cdj4qhbG

This patch uses an approach similar to the way extra cleanup passes are
run on demand after vectorization (added in acea6e9cfa4c4a0e8678c7).

It introduces a new ShouldRunExtraSimpleLoopUnswitch analysis marker,
which IndVars can use to indicate that extra unswitching is beneficial.

ExtraSimpleLoopUnswitchPassManager uses this analysis to determine
whether to run its passes on a loop.
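
To make the transformation concrete, the following source-level sketch shows a loop whose check has already become loop-invariant, and the same loop after the check is hoisted out (unswitched). Both functions are hand-written, hypothetical examples; they are not taken from the patch or from the linked Compiler Explorer snippet.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Loop with an invariant check: `n > data.size()` does not depend on `i`,
// yet it is evaluated on every iteration.
long sumChecked(const std::vector<int> &data, std::size_t n) {
    long sum = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (n > data.size()) // loop-invariant bounds check
            return -1;
        sum += data[i];
    }
    return sum;
}

// Unswitched form: the invariant check runs once, and the loop body is
// check-free. `n > data.size()` implies `n >= 1`, so the original loop
// would have returned -1 on its first iteration.
long sumUnswitched(const std::vector<int> &data, std::size_t n) {
    if (n > data.size())
        return -1;
    long sum = 0;
    for (std::size_t i = 0; i < n; ++i)
        sum += data[i];
    return sum;
}
```

The two functions return the same result for every input, which is the property an unswitching pass has to preserve.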

Compile-time impact (geomean) ranges from +0.0% to +0.02%
https://llvm-compile-time-tracker.com/compare.php?from=138c0beb109ffe47f75a0fe8c4dc2cdabe8a6532&to=19e6e99eeb280d426907ea73a21b139ba7225627&stat=instructions%3Au

Compile-time impact (geomean) of unconditionally running
SimpleLoopUnswitch ranges from +0.05% to +0.16%

https://llvm-compile-time-tracker.com/compare.php?from=138c0beb109ffe47f75a0fe8c4dc2cdabe8a6532&to=2930dfd5accdce2e6f8d5146ae4d626add2065a2&stat=instructions:u

Unconditionally running SimpleLoopUnswitch seems to indicate that there
are multiple other scenarios where we fail to run unswitching when
opportunities remain.


Fixes https://github.com/llvm/llvm-project/issues/85551.

PR: https://github.com/llvm/llvm-project/pull/81271
2024-04-12 22:07:29 +01:00
lifengxiang1025
e40cabfea4
[MemProf] Match function's summary and definition strictly (#83665)
Problem description:
https://github.com/llvm/llvm-project/pull/81008#issuecomment-1933468520
Solution:
https://github.com/llvm/llvm-project/pull/81008#issuecomment-1934192548
(choose plan2)
2024-03-12 11:00:02 +08:00
Paul Kirth
2fef685363
[llvm][loop-rotate] Allow forcing loop-rotation (#82828)
Many profitable optimizations cannot be performed at -Oz, due to
unrotated loops. While this is worse for size (minimally), many of the
optimizations significantly reduce code size, such as memcpy
optimizations and other patterns found by loop idiom recognition.
Related discussion can be found in issue #50308.

This patch adds an experimental, backend-only flag to allow loop header
duplication, regardless of the optimization level. Downstream consumers
can experiment with this flag, and if it is profitable, we can adjust
the compiler's defaults accordingly, and expose any useful frontend
flags to opt into the new behavior.
2024-02-29 13:46:13 -08:00
Paul Kirth
777ac46ddb
[llvm] Remove pipeline checks for optsize for DFAJumpThreadingPass
The pass itself checks whether to apply the optimization based on the
minsize attribute, so there isn't much functional benefit to preventing
the pass from being added. Gating whether the pass gets added to the pass
pipeline complicates the interaction with -enable-dfa-jump-thread, as
well.

Reviewers: aeubanks

Reviewed By: aeubanks

Pull Request: https://github.com/llvm/llvm-project/pull/83318
2024-02-28 11:12:13 -08:00
David Spickett
9c5ca6b0ce Revert "Enable JumpTableToSwitch pass by default (#82546)"
This reverts commit 1069823ce7d154aa8ef87ae5a0fd34b527eca2a0.

This has caused second stage timeouts when building Flang on
AArch64:
https://lab.llvm.org/buildbot/#/builders/179/builds/9442
2024-02-26 13:35:59 +00:00
Alexander Shaposhnikov
1069823ce7
Enable JumpTableToSwitch pass by default (#82546)
Enable JumpTableToSwitch pass by default.

Test plan: ninja check-all
2024-02-22 11:02:47 -08:00
Arthur Eubanks
93cdd1b5cf
[PGO] Add ability to mark cold functions as optsize/minsize/optnone (#69030)
The performance of cold functions shouldn't matter too much, so if we
care about binary sizes, add an option to mark cold functions as
optsize/minsize for binary size, or optnone for compile times [1]. Clang
patch will be in a future patch.

This is intended to replace `shouldOptimizeForSize(Function&, ...)`.
We've seen multiple cases where calls to this expensive function, if we are
not careful, can blow up compile times. I will clean up users of that
function in a followup patch.

Initial version: https://reviews.llvm.org/D149800

[1]
https://discourse.llvm.org/t/rfc-new-feature-proposal-de-optimizing-cold-functions-using-pgo-info/56388
2024-02-12 14:52:08 -08:00
Alexander Shaposhnikov
d26b43ff4f
Add JumpTableToSwitch pass (#77709)
Add a pass to convert jump tables to switches.
The new pass replaces an indirect call with a switch + direct calls if all the functions in the jump table are smaller than the provided threshold.
The pass is currently disabled by default and can be enabled by -enable-jump-table-to-switch.
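
A source-level sketch of the rewrite described above: an indirect call through a constant table of function pointers is replaced by a switch over the index with direct calls. The functions and the table `kTable` are hypothetical examples written for this illustration; the real pass works on LLVM IR and applies a size threshold that is not modeled here.

```cpp
#include <cassert>

static int addOne(int x) { return x + 1; }
static int timesTwo(int x) { return x * 2; }
static int negate(int x) { return -x; }

// A constant table of function pointers; calls through it are indirect.
static int (*const kTable[])(int) = {addOne, timesTwo, negate};

// Before: one indirect call whose target is only known at run time.
int callIndirect(unsigned idx, int x) {
    assert(idx < 3);
    return kTable[idx](x);
}

// After: a switch with direct calls, each of which is visible to the
// inliner as an ordinary call site.
int callViaSwitch(unsigned idx, int x) {
    switch (idx) {
    case 0: return addOne(x);
    case 1: return timesTwo(x);
    case 2: return negate(x);
    default:
        assert(false && "index out of range");
        return 0;
    }
}
```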

Test plan: ninja check-all
2024-02-10 01:12:46 -08:00
Paul Kirth
9d476e1e1a
[clang][FatLTO] Avoid UnifiedLTO until it can support WPD/CFI (#79061)
Currently, the UnifiedLTO pipeline seems to have trouble with several
LTO features, like SplitLTO units, which means we cannot use important
optimizations like Whole Program Devirtualization or security hardening
instrumentation like CFI.

This patch reverts FatLTO to using distinct pipelines for Full LTO and
ThinLTO. It still avoids module cloning, since that was error prone.
2024-01-23 14:04:52 -08:00
Mingming Liu
5ce286849a
[CGProfile] Use callee's PGO name when caller->callee is an indirect call. (#78610)
- With PGO, indirect call edges are constructed using value profiles, and the profile address is mapped to a function's PGO name. The PGO name is computed using a function's linkage before LTO internalization or global promotion.
- With ThinLTO, local functions [could be
promoted](2663d2cb9c/llvm/lib/Transforms/Utils/FunctionImportUtils.cpp (L288)) to have external linkage; and with
[full](2663d2cb9c/llvm/lib/LTO/LTO.cpp (L1328))
or
[thin](2663d2cb9c/llvm/lib/LTO/LTO.cpp (L448))
LTO, global functions could be internalized. Edge construction should use a function's PGO name before its linkage is updated.
2024-01-22 10:36:03 -08:00
Mircea Trofin
1d608fc755
[NFC][InstrProf] Refactor InstrProfiling lowering pass (#74970)
Akin to other passes - refactored the name to `InstrProfilingLoweringPass` to better communicate what it does, and split the pass part and the transformation part to avoid needing to initialize object state during `::run`.

A subsequent PR will move `InstrLowering` to the .cpp file and rename it to `InstrLowerer`.
2023-12-10 18:03:08 -08:00
Paul Kirth
cfe1ece833
[clang][llvm][fatlto] Avoid cloning modules in FatLTO (#72180)
https://github.com/llvm/llvm-project/issues/70703 pointed out that
cloning LLVM modules could lead to miscompiles when using FatLTO.

This is due to an existing issue when cloning modules with labels (see
#55991 and #47769). Since this can lead to miscompilation, we can avoid
cloning the LLVM modules, which was desirable anyway.

This patch modifies the EmbedBitcodePass to no longer clone the module
or run an input pipeline over it. Further, it makes FatLTO always perform
UnifiedLTO, so we can still defer the Thin/Full LTO decision to
link-time. Lastly, it removes dead/obsolete code related to now defunct
options that do not work with the EmbedBitcodePass implementation any
longer.
2023-11-30 17:09:34 -08:00
Tom Stellard
2750a22745
Passes: Consolidate EnableKnowledgeRetention declarations into a header file (#71695)
2023-11-13 11:03:49 -08:00
dewen
3b82336188
Revert "[PM] Execute IndVarSimplifyPass precede ReassociatePass" (#71617)
Reverts llvm/llvm-project#71054
2023-11-08 09:22:55 +08:00
dewen
e4d27d7f32
[PM] Execute IndVarSimplifyPass precede ReassociatePass (#71054)
ReassociatePass may clear nsw/nuw flags of some instructions, which may
have side effects on optimizations in IndVarSimplifyPass.
2023-11-08 09:21:17 +08:00
Teresa Johnson
87f5e22987
[MemProf] Tolerate missing leaf debug frames (#71233)
Loosen up the matching so that a missing leaf debug frame in the profile
does not prevent matching an allocation context if we can match further
up the inlined call context. This relies on the pre-inliner, which was
already the default when performing normal PGO feedback along with the
MemProf feedback, but to ensure matching is not affected by the presence
of PGO, enable the pre-inliner for MemProf feedback as well.
2023-11-03 21:01:07 -07:00
Nikita Popov
a682a9cfd0 Revert "Port Swift's merge function pass to llvm: merging functions that differ in constants (#68235)"
This reverts commit 19b5495b653a00da7a250f48b4f739fcf2bbe82f.

PR landed without approval, with severe quality issues.
2023-11-03 21:15:46 +01:00
Manman Ren
19b5495b65
Port Swift's merge function pass to llvm: merging functions that differ in constants (#68235)
See RFC for details:
https://discourse.llvm.org/t/rfc-for-moving-swift-s-merge-function-pass-to-llvm/73778

We will need to refactor the extension to FunctionComparator/FunctionHash
into StructuralHash. This patch adds a new pass, ported from Swift, and we
will need to discuss how to migrate Swift’s pass over after we land this
in llvm.

Create this PR to get some early review on the patch.

---------

Co-authored-by: Manman Ren <mren@meta.com>
2023-11-03 11:13:58 -07:00
Amara Emerson
1a2e77cf9e Revert "Revert "Inlining: Run the legacy AlwaysInliner before the regular inliner.""
This reverts commit 86bfeb906e3a95ae428f3e97d78d3d22a7c839f3.

This is a long time coming re-application that was originally reverted due to
regressions, unrelated to the actual inlining change. These regressions have since
been fixed due to another long-in-the-making change of a66051c6 landing.

Original commit message for reference:
---
    We have several situations where it's beneficial for code size to ensure that every
    call to always-inline functions is inlined before normal inlining decisions are
    made. While the normal inliner runs in a "MandatoryOnly" mode to try to do this,
    it only does it on a per-SCC basis, rather than the whole module. Ensuring that
    all mandatory inlinings are done before any heuristic based decisions are made
    just makes sense.

    Despite being referred to as the "legacy" AlwaysInliner pass, it's already necessary
    for -O0 because the CGSCC inliner is too expensive in compile time to run at -O0.

    This also fixes an exponential compile time blow up in
    https://github.com/llvm/llvm-project/issues/59126

    Differential Revision: https://reviews.llvm.org/D143624
---
2023-10-28 23:21:11 -07:00
Alex Voicu
0ce6255a50 [HIP][LLVM][Opt] Add LLVM support for hipstdpar
This patch adds the LLVM changes needed for enabling HIP parallel algorithm offload on AMDGPU targets. What we do here is add two passes, one mandatory and one optional:

1. HipStdParAcceleratorCodeSelectionPass is mandatory, depends on CallGraphAnalysis, and implements the following transform:

    - Traverse the call-graph, and check for functions that are roots for accelerator execution (at the moment, these are GPU kernels exclusively, and would originate in the accelerator specific algorithm library the toolchain uses as an implementation detail);
    - Starting from a root, do a BFS to find all functions that are reachable (called directly or indirectly via a call chain) and record them;
    - After having done the above for all roots in the Module, we have computed the set of reachable functions, which is the union of roots and functions reachable from roots;
    - All functions that are not in the reachable set are removed; for the special case where the reachable set is empty we completely clear the module;

2. HipStdParAllocationInterpositionPass is optional, is meant as a fallback with restricted functionality for cases where on-demand paging is unavailable on a platform, and implements the following transform:
    - Iterate all functions in a Module;
    - If a function's name is in a predefined set of allocation / deallocation functions that the runtime implementation is allowed and expected to interpose, replace all its uses with the equivalent accelerator aware function, iff the latter is available;
        - If the accelerator aware equivalent is unavailable we warn, but compilation will go ahead, which means that it is possible to get issues around the accelerator trying to access inaccessible memory at run time;
    - We rely on direct name matching as opposed to using the new alloc-kind family of attributes and / or the LibCall analysis pass because some of the legacy functions that need replacing would not carry the former or be identified by the latter.
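
The selection step of the first pass (collect everything reachable from the roots with a BFS, then drop the rest) can be sketched on a toy call graph. The `CallGraph` alias, the function names, and the string-keyed representation are assumptions made for this example; the actual pass operates on an LLVM Module and its call-graph analysis.

```cpp
#include <cassert>
#include <map>
#include <queue>
#include <set>
#include <string>
#include <vector>

// Toy call graph: function name -> names of the functions it calls.
using CallGraph = std::map<std::string, std::vector<std::string>>;

// Breadth-first search from every root; the result is the union of the
// roots and everything reachable from them.
std::set<std::string> reachableFrom(const CallGraph &graph,
                                    const std::vector<std::string> &roots) {
    std::set<std::string> reachable;
    std::queue<std::string> worklist;
    for (const auto &root : roots)
        if (reachable.insert(root).second)
            worklist.push(root);
    while (!worklist.empty()) {
        std::string fn = worklist.front();
        worklist.pop();
        auto it = graph.find(fn);
        if (it == graph.end())
            continue; // callee with no recorded body
        for (const auto &callee : it->second)
            if (reachable.insert(callee).second)
                worklist.push(callee);
    }
    return reachable;
}

// Removes every function outside the reachable set. With no roots the
// reachable set is empty, so the whole graph is cleared.
void pruneUnreachable(CallGraph &graph, const std::vector<std::string> &roots) {
    std::set<std::string> keep = reachableFrom(graph, roots);
    for (auto it = graph.begin(); it != graph.end();) {
        if (keep.count(it->first))
            ++it;
        else
            it = graph.erase(it);
    }
}
```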

Reviewed by: JonChesterfield, yaxunl

Differential Revision: https://reviews.llvm.org/D155856
2023-10-12 11:26:48 +01:00
Alex Voicu
25935c384d Revert "[HIP][LLVM][Opt] Add LLVM support for hipstdpar"
This reverts commit c5bba7ea5a05f540948f76a189c880eb24a5e8c6.
2023-10-11 12:27:03 +01:00
Alex Voicu
c5bba7ea5a [HIP][LLVM][Opt] Add LLVM support for hipstdpar
This patch adds the LLVM changes needed for enabling HIP parallel algorithm offload on AMDGPU targets. What we do here is add two passes, one mandatory and one optional:

1. HipStdParAcceleratorCodeSelectionPass is mandatory, depends on CallGraphAnalysis, and implements the following transform:

    - Traverse the call-graph, and check for functions that are roots for accelerator execution (at the moment, these are GPU kernels exclusively, and would originate in the accelerator specific algorithm library the toolchain uses as an implementation detail);
    - Starting from a root, do a BFS to find all functions that are reachable (called directly or indirectly via a call chain) and record them;
    - After having done the above for all roots in the Module, we have computed the set of reachable functions, which is the union of roots and functions reachable from roots;
    - All functions that are not in the reachable set are removed; for the special case where the reachable set is empty we completely clear the module;

2. HipStdParAllocationInterpositionPass is optional, is meant as a fallback with restricted functionality for cases where on-demand paging is unavailable on a platform, and implements the following transform:
    - Iterate all functions in a Module;
    - If a function's name is in a predefined set of allocation / deallocation functions that the runtime implementation is allowed and expected to interpose, replace all its uses with the equivalent accelerator aware function, iff the latter is available;
        - If the accelerator aware equivalent is unavailable we warn, but compilation will go ahead, which means that it is possible to get issues around the accelerator trying to access inaccessible memory at run time;
    - We rely on direct name matching as opposed to using the new alloc-kind family of attributes and / or the LibCall analysis pass because some of the legacy functions that need replacing would not carry the former or be identified by the latter.

Reviewed by: JonChesterfield, yaxunl

Differential Revision: https://reviews.llvm.org/D155856
2023-10-11 12:22:00 +01:00
Fangrui Song
2d854dd3e7 Move global namespace cl::opt inside llvm:: or internalize them
2023-10-10 19:58:03 -07:00
Alex Voicu
98eda5dda7 Revert "[HIP][LLVM][Opt] Add LLVM support for hipstdpar" in order to address build breakage.
This reverts commit 9b98ebb0eb43b005921926a622177f10e13b1ac6.
2023-10-10 12:16:10 +01:00
Alex Voicu
9b98ebb0eb [HIP][LLVM][Opt] Add LLVM support for hipstdpar
This patch adds the LLVM changes needed for enabling HIP parallel algorithm offload on AMDGPU targets. What we do here is add two passes, one mandatory and one optional:

1. HipStdParAcceleratorCodeSelectionPass is mandatory, depends on CallGraphAnalysis, and implements the following transform:

    - Traverse the call-graph, and check for functions that are roots for accelerator execution (at the moment, these are GPU kernels exclusively, and would originate in the accelerator specific algorithm library the toolchain uses as an implementation detail);
    - Starting from a root, do a BFS to find all functions that are reachable (called directly or indirectly via a call chain) and record them;
    - After having done the above for all roots in the Module, we have computed the set of reachable functions, which is the union of roots and functions reachable from roots;
    - All functions that are not in the reachable set are removed; for the special case where the reachable set is empty we completely clear the module;

2. HipStdParAllocationInterpositionPass is optional, is meant as a fallback with restricted functionality for cases where on-demand paging is unavailable on a platform, and implements the following transform:
    - Iterate all functions in a Module;
    - If a function's name is in a predefined set of allocation / deallocation functions that the runtime implementation is allowed and expected to interpose, replace all its uses with the equivalent accelerator aware function, iff the latter is available;
        - If the accelerator aware equivalent is unavailable we warn, but compilation will go ahead, which means that it is possible to get issues around the accelerator trying to access inaccessible memory at run time;
    - We rely on direct name matching as opposed to using the new alloc-kind family of attributes and / or the LibCall analysis pass because some of the legacy functions that need replacing would not carry the former or be identified by the latter.

Reviewed by: JonChesterfield, yaxunl

Differential Revision: https://reviews.llvm.org/D155856
2023-10-10 12:02:05 +01:00
lcvon007
f3c417f341
[Passes] Add option for LoopVersioningLICM pass. (#67107)
Currently, the LoopVersioningLICM pass can only be tested through opt.
This PR adds the option back (it was deleted in
https://reviews.llvm.org/D137915) so that it is easy to verify whether the
pass is useful for some benchmarks.
2023-09-27 07:38:37 -05:00
Florian Hahn
04f9a8a7d6
[ConstraintElim] Move just before loop simplification pipeline.
Adjust the pipeline slightly to move ConstraintElim just before the loop
simplification pipeline. This increases the number of cases where SCEV
can be preserved in the future.

This also enables slightly more opportunities, by benefiting from
earlier CFG simplifications, which allow more conditions to be added.

Reviewed By: nikic, antoniofrighetto

Differential Revision: https://reviews.llvm.org/D158843
2023-09-22 14:31:08 +01:00
Dhruv Chawla
515a826326
[NFC][InferAlignment] Swap extern declaration and definition of EnableInferAlignmentPass
This prevents a linker issue when only InstCombine is linked without
PassBuilder, like in the case of bugpoint.
2023-09-20 13:07:13 +05:30
Dhruv Chawla
3e992d81af
[InferAlignment] Enable InferAlignment pass by default
This gives an improvement of 0.6%:
https://llvm-compile-time-tracker.com/compare.php?from=7d35fe6d08e2b9b786e1c8454cd2391463832167&to=0456c8e8a42be06b62ad4c3e3cf34b21f2633d1e&stat=instructions:u

Differential Revision: https://reviews.llvm.org/D158600
2023-09-20 12:08:52 +05:30
Dhruv Chawla
0f152a55d3
[InferAlignment] Implement InferAlignmentPass
This pass aims to infer alignment for instructions as a separate pass,
to reduce redundant work done by InstCombine running multiple times. It
runs late in the pipeline, just before the back-end passes where this
information is most useful.
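
As one illustration of the kind of fact such a pass can derive, the sketch below computes a guaranteed alignment for an address formed as a base pointer plus a constant byte offset. The function name `inferAlign` and its interface are hypothetical; this shows only the underlying arithmetic and is not the pass's implementation.

```cpp
#include <cassert>
#include <cstdint>

// Given a base pointer known to be aligned to `baseAlign` (a power of two)
// and a constant byte offset, returns an alignment that `base + offset` is
// guaranteed to have: the smaller of `baseAlign` and the largest power of
// two that divides `offset`.
uint64_t inferAlign(uint64_t baseAlign, uint64_t offset) {
    assert(baseAlign != 0 && (baseAlign & (baseAlign - 1)) == 0);
    if (offset == 0)
        return baseAlign; // same address, same alignment
    uint64_t offsetAlign = offset & (~offset + 1); // lowest set bit
    return offsetAlign < baseAlign ? offsetAlign : baseAlign;
}
```

For example, a 16-byte-aligned base with an offset of 12 yields a guaranteed alignment of 4, because 4 is the largest power of two dividing 12.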

Differential Revision: https://reviews.llvm.org/D158529
2023-09-20 12:03:36 +05:30
Nuno Lopes
281ae4903d [Pipelines] Guard a few more usages of GlobalsAA under the EnableGlobalAnalyses flag
2023-09-07 13:58:28 +01:00
Qiongsi Wu
611ce24114 [PGO] Enable -fprofile-update for -fprofile-generate
Currently, `-fprofile-update` is ignored when `-fprofile-generate` is in effect. This patch enables `-fprofile-update` for `-fprofile-generate`. This patch continues the work from https://reviews.llvm.org/D87737, which added `-fprofile-update` in the first place.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D157280
2023-08-15 10:10:03 -04:00