llvm-project

mirror of https://github.com/llvm/llvm-project.git synced 2025-04-26 13:26:08 +00:00

Author	SHA1	Message	Date
Mircea Trofin	4a2bf05980	Reapply "[ctx_prof] Fix the pre-thinlink "use" case (#102511 )" This reverts commit 967185eeb85abb77bd6b6cdd2b026d5c54b7d4f3. The problem was link dependencies; `UseCtxProfile` was moved to `Analysis`.	2024-08-08 17:04:00 -07:00
Aiden Grossman	967185eeb8	Revert "[ctx_prof] Fix the pre-thinlink "use" case (#102511 )" This reverts commit 1a6d60e0162b3ef767c87c95512dd453bf4f4746. Broke some buildbots.	2024-08-08 21:14:56 +00:00
Mircea Trofin	1a6d60e016	[ctx_prof] Fix the pre-thinlink "use" case (#102511 ) Didn't notice in #101338 that the instrumentation in `llvm/test/Transforms/PGOProfile/ctx-prof-use-prelink.ll` was actually incorrect.	2024-08-08 16:45:04 -04:00
Mircea Trofin	dbbf0762b6	[ctx_prof] CtxProfAnalysis (#102084 ) This is an immutable analysis that loads and makes the contextual profile available to other passes. This patch introduces the analysis and an analysis printer pass. Subsequent patches will introduce the APIs that IPO passes will call to modify the profile as result of their changes.	2024-08-07 14:39:48 -04:00
Mircea Trofin	ba4da5a087	[ctx_prof] "Use" support for pre-thinlink. (#101338 ) There is currently no plan to support contextual profiling use in a non-ThinLTO scenario. In the pre-link phase, we only instrument and then immediately bail out to let the linker group functions under an entrypoint in the same module as the entrypoint. We don't actually care what the profile contains, only that we want to use a contextual profile. After that, in post-thinlink, we require the profile be passed again so we can actually use it. The earlier instrumentation will be used to match counter values. While the feature is in development, we add a hidden flag for the use scenario, but we can eventually tie it to the `PGOOptions` mechanism. We will use the same flag in both pre- and post-thinlink, because it simplifies things: usually the post-thinlink args are the same as the ones for pre-. This is despite the flag being basically treated as a boolean in pre-thinlink.	2024-08-02 20:51:27 -04:00
Wei Wang	3a9ef4e69a	[Pipelines] Do not run CoroSplit and CoroCleanup in LTO pre-link pipeline (#100205 ) This is re-land of #90310 after making asan skip pre-split coroutines in #99415. Skip CoroSplit and CoroCleanup in LTO pre-link pipeline so that CoroElide can happen after callee coroutine is imported into caller's module in ThinLTO.	2024-07-29 17:42:01 -07:00
Joseph Huber	8758091a70	[LLVM] Add 'ExpandVariadicsPass' to LTO default pipeline (#100479 ) Summary: This pass expands variadic functions into non-variadic function calls according to the target ABI. Currently, this is used as the lowering for the NVPTX and AMDGPU targets. This pass is currently only run late in the target's backend. However, during LTO we want to run it before the inliner pass so that the expanded functions can be inlined using standard heuristics. This pass is a no-op for unsupported targets, so this won't apply to any code that isn't already using it.	2024-07-25 09:21:05 -05:00
Tianqing Wang	3d494bfc7f	[SimplifyCFG] Increase budget for FoldTwoEntryPHINode() if the branch is unpredictable. (#98495 ) The `!unpredictable` metadata has been present for a long time, but its usage in optimizations is still limited. This patch teaches `FoldTwoEntryPHINode()` to be more aggressive with an unpredictable branch to reduce mispredictions. A TTI interface `getBranchMispredictPenalty()` is added to distinguish between different hardware so that we don't go too far on simpler cores. For simplicity, only a naive x86 implementation is included for the time being.	2024-07-23 07:47:21 +08:00
xur-llvm	b1ca2a9546	[PGO] Sampled instrumentation in PGO to speed up instrumentation binary (#69535 ) In comparison to non-instrumented binaries, PGO instrumentation binaries can be significantly slower. For highly threaded programs, this slowdown can reach 10x due to data races or false sharing within counters. This patch incorporates sampling into the PGO instrumentation process to enhance the speed of instrumentation binaries. The fundamental concept is similar to the one proposed in https://reviews.llvm.org/D63949. Three sampling modes are introduced: 1. Simple Sampling: When '-sampled-instr-burst-duration' is set to 1. 2. Fast Burst Sampling: When not using simple sampling, and '-sampled-instr-period' is set to 65535. This is the default mode of sampling. 3. Full Burst Sampling: When neither simple nor fast burst sampling is used. Utilizing this sampled instrumentation significantly improves the binary's execution speed. Measurements show up to 5x speedup with default settings. Fast burst sampling now results in only around 20% to 30% slowdown (compared to 8 to 10x slowdown without sampling). Our tests show that profile quality remains good with sampling, with edge counts typically showing more than 90% overlap. For applications whose behavior changes due to binary speed, sampling instrumentation can enhance performance. Observations have shown some apps experiencing up to a ~2% improvement in PGO. A potential drawback of this patch is the increased binary size and compilation time. The sampling method in this patch does not improve single-threaded program instrumentation binary speed.	2024-07-22 09:19:17 -07:00
YAMAMOTO Takashi	5d79110959	[Pipelines] Perform mergefunc after constmerge (#92498 ) Constmerge can fold switch jump tables, possibly making functions identical again. It can help mergefunc. On the other hand, the opposite seems unlikely. Fixes https://github.com/llvm/llvm-project/issues/92201.	2024-07-05 12:28:03 +02:00
Egor Pasko	cab81dd038	[EntryExitInstrumenter] Move passes out of clang into LLVM default pipelines (#92171 ) Move EntryExitInstrumenter(PostInlining=true) to as late as possible and EntryExitInstrumenter(PostInlining=false) to an early pre-inlining stage (but skip for ThinLTO post-link). This should fix the issues reported in https://github.com/rust-lang/rust/issues/92109 and https://github.com/llvm/llvm-project/issues/52853. These are caused by https://reviews.llvm.org/D97608.	2024-05-31 12:48:45 -07:00
Mircea Trofin	d311a62e2f	[ctx_profile] Decouple ctx instrumentation from PGOOpt (#92445 ) We currently don't support passing files and don't need frontend involvement either.	2024-05-16 13:41:36 -07:00
Mircea Trofin	174cdeced0	[nfc] Clarify when the various PGO instrumentation passes run (#92330 ) The code seems easier to read if it's centered on what the user wants rather than combinations of whatever internal variables.	2024-05-16 12:17:22 -07:00
Reid Kleckner	aa0776de46	Revert "[Pipelines] Do not run CoroSplit and CoroCleanup in LTO pre-link pipeline (#90310 )" and related patches This change is incorrect when thinlto and asan are enabled, and this can be observed by adding `-fsanitize=address` to the provided coro-elide-thinlto.cpp test. It results in the error "Coroutines cannot handle non static allocas yet", and ASan introduces a dynamic alloca. In other words, we must preserve the invariant that CoroSplit runs before ASan. If we move CoroSplit to the post-link compile stage, ASan has to be moved to the post-link compile stage first. It would also be correct to make CoroSplit handle dynamic allocas so the pass ordering doesn't matter, but sanitizer instrumentation really ought to be last, after coroutine splitting. This reverts commit bafc5f42c0132171287d7cba7f5c14459be1f7b7. This reverts commit b1b1bfa7bea0ce489b5ea9134e17a43c695df5ec. This reverts commit 0232b77e145577ab78e3ed1fdbb7eacc5a7381ab. This reverts commit fb2d3056618e3d03ba9a695627c7b002458e59f0. This reverts commit 1cb33713910501c6352d0eb2a15b7a15e6e18695. This reverts commit cd68d7b3c0ebf6da5e235cfabd5e6381737eb7fe.	2024-05-10 21:28:13 +00:00
Mircea Trofin	96568f3539	[llvm][ctx_profile] Add instrumentation lowering (#90821 ) This adds the instrumentation lowering pass. (Tracking Issue: #89287, RFC referenced there)	2024-05-08 16:49:08 -07:00
Wei Wang	bafc5f42c0	[Pipelines][Coroutines] Tune coroutine passes only for ThinLTO pre-link pipeline (#90690 ) Follow up to #90310, limit the tune up only to ThinLTO pre-link as coroutine passes are not in MonoLTO backend	2024-04-30 21:40:04 -07:00
Wei Wang	cd68d7b3c0	[Pipelines] Do not run CoroSplit and CoroCleanup in LTO pre-link pipeline (#90310 ) Skip CoroSplit and CoroCleanup in LTO pre-link pipeline so that CoroElide can happen after callee coroutine is imported into caller's module in ThinLTO.	2024-04-29 10:24:53 -07:00
Arthur Eubanks	947b656add	[PGO] Check that PGOOpt exists before using PGOOpt->ColdOptType (#89139 ) This means that the pass is unusable without some sort of profile. We can revisit this decision later if we want to support running this pass without a profile.	2024-04-18 11:22:10 -07:00
Florian Hahn	0f82469314	[Passes] Run SimpleLoopUnswitch after introducing invariant branches. (#81271 ) IndVars may be able to replace a loop dependent condition with a loop invariant one, but loop-unswitch runs before IndVars, so the invariant check remains in the loop. For an example, consider a read-only loop with a bounds check: https://godbolt.org/z/8cdj4qhbG This patch uses an approach similar to the way extra cleanup passes are run on demand after vectorization (added in acea6e9cfa4c4a0e8678c7). It introduces a new ShouldRunExtraSimpleLoopUnswitch analysis marker, which IndVars can use to indicate that extra unswitching is beneficial. ExtraSimpleLoopUnswitchPassManager uses this analysis to determine whether to run its passes on a loop. Compile-time impact (geomean) ranges from +0.0% to +0.02% https://llvm-compile-time-tracker.com/compare.php?from=138c0beb109ffe47f75a0fe8c4dc2cdabe8a6532&to=19e6e99eeb280d426907ea73a21b139ba7225627&stat=instructions%3Au Compile-time impact (geomean) of unconditionally running SimpleLoopUnswitch ranges from +0.05% to +0.16% https://llvm-compile-time-tracker.com/compare.php?from=138c0beb109ffe47f75a0fe8c4dc2cdabe8a6532&to=2930dfd5accdce2e6f8d5146ae4d626add2065a2&stat=instructions:u Unconditionally running SimpleLoopUnswitch seems to indicate that there are multiple other scenarios where we fail to run unswitching when opportunities remain. Fixes https://github.com/llvm/llvm-project/issues/85551. PR: https://github.com/llvm/llvm-project/pull/81271	2024-04-12 22:07:29 +01:00
lifengxiang1025	e40cabfea4	[MemProf] Match function's summary and definition strictly (#83665 ) Problem description: https://github.com/llvm/llvm-project/pull/81008#issuecomment-1933468520 Solution: https://github.com/llvm/llvm-project/pull/81008#issuecomment-1934192548 (choose plan2)	2024-03-12 11:00:02 +08:00
Paul Kirth	2fef685363	[llvm][loop-rotate] Allow forcing loop-rotation (#82828 ) Many profitable optimizations cannot be performed at -Oz, due to unrotated loops. While this is worse for size (minimally), many of the optimizations significantly reduce code size, such as memcpy optimizations and other patterns found by loop idiom recognition. Related discussion can be found in issue #50308. This patch adds an experimental, backend-only flag to allow loop header duplication, regardless of the optimization level. Downstream consumers can experiment with this flag, and if it is profitable, we can adjust the compiler's defaults accordingly, and expose any useful frontend flags to opt into the new behavior.	2024-02-29 13:46:13 -08:00
Paul Kirth	777ac46ddb	[llvm] Remove pipeline checks for optsize for DFAJumpThreadingPass The pass itself checks whether to apply the optimization based on the minsize attribute, so there isn't much functional benefit to preventing the pass from being added. Gating whether the pass gets added to the pass pipeline also complicates the interaction with -enable-dfa-jump-thread. Reviewers: aeubanks Reviewed By: aeubanks Pull Request: https://github.com/llvm/llvm-project/pull/83318	2024-02-28 11:12:13 -08:00
David Spickett	9c5ca6b0ce	Revert "Enable JumpTableToSwitch pass by default (#82546 )" This reverts commit 1069823ce7d154aa8ef87ae5a0fd34b527eca2a0. This has caused second stage timeouts when building Flang on AArch64: https://lab.llvm.org/buildbot/#/builders/179/builds/9442	2024-02-26 13:35:59 +00:00
Alexander Shaposhnikov	1069823ce7	Enable JumpTableToSwitch pass by default (#82546 ) Enable JumpTableToSwitch pass by default. Test plan: ninja check-all	2024-02-22 11:02:47 -08:00
Arthur Eubanks	93cdd1b5cf	[PGO] Add ability to mark cold functions as optsize/minsize/optnone (#69030 ) The performance of cold functions shouldn't matter too much, so if we care about binary sizes, add an option to mark cold functions as optsize/minsize for binary size, or optnone for compile times [1]. Clang patch will be in a future patch. This is intended to replace `shouldOptimizeForSize(Function&, ...)`. We've seen multiple cases where calls to this expensive function, if not careful, can blow up compile times. I will clean up users of that function in a followup patch. Initial version: https://reviews.llvm.org/D149800 [1] https://discourse.llvm.org/t/rfc-new-feature-proposal-de-optimizing-cold-functions-using-pgo-info/56388	2024-02-12 14:52:08 -08:00
Alexander Shaposhnikov	d26b43ff4f	Add JumpTableToSwitch pass (#77709 ) Add a pass to convert jump tables to switches. The new pass replaces an indirect call with a switch + direct calls if all the functions in the jump table are smaller than the provided threshold. The pass is currently disabled by default and can be enabled by -enable-jump-table-to-switch. Test plan: ninja check-all	2024-02-10 01:12:46 -08:00
Paul Kirth	9d476e1e1a	[clang][FatLTO] Avoid UnifiedLTO until it can support WPD/CFI (#79061 ) Currently, the UnifiedLTO pipeline seems to have trouble with several LTO features, like SplitLTO units, which means we cannot use important optimizations like Whole Program Devirtualization or security hardening instrumentation like CFI. This patch reverts FatLTO to using distinct pipelines for Full LTO and ThinLTO. It still avoids module cloning, since that was error prone.	2024-01-23 14:04:52 -08:00
Mingming Liu	5ce286849a	[CGProfile] Use callee's PGO name when caller->callee is an indirect call. (#78610 ) - With PGO, indirect call edges are constructed using value profiles, and the profile address is mapped to a function's PGO name. The PGO name is computed using a function's linkage before LTO internalization or global promotion. - With ThinLTO, local functions [could be promoted](`2663d2cb9c/llvm/lib/Transforms/Utils/FunctionImportUtils.cpp (L288)`) to have external linkage; and with [full](`2663d2cb9c/llvm/lib/LTO/LTO.cpp (L1328)`) or [thin](`2663d2cb9c/llvm/lib/LTO/LTO.cpp (L448)`) LTO, global functions could be internalized. Edge construction should use a function's PGO name before its linkage is updated.	2024-01-22 10:36:03 -08:00
Mircea Trofin	1d608fc755	[NFC][InstrProf] Refactor InstrProfiling lowering pass (#74970 ) Like other passes, the pass is renamed to `InstrProfilingLoweringPass` to better communicate what it does, and the pass part and the transformation part are split to avoid needing to initialize object state during `::run`. A subsequent PR will move `InstrLowering` to the .cpp file and rename it to `InstrLowerer`.	2023-12-10 18:03:08 -08:00
Paul Kirth	cfe1ece833	[clang][llvm][fatlto] Avoid cloning modules in FatLTO (#72180 ) https://github.com/llvm/llvm-project/issues/70703 pointed out that cloning LLVM modules could lead to miscompiles when using FatLTO. This is due to an existing issue when cloning modules with labels (see #55991 and #47769). Since this can lead to miscompilation, we can avoid cloning the LLVM modules, which was desirable anyway. This patch modifies the EmbedBitcodePass to no longer clone the module or run an input pipeline over it. Further, it makes FatLTO always perform UnifiedLTO, so we can still defer the Thin/Full LTO decision to link-time. Lastly, it removes dead/obsolete code related to now-defunct options that no longer work with the EmbedBitcodePass implementation.	2023-11-30 17:09:34 -08:00
Tom Stellard	2750a22745	Passes: Consolidate EnableKnowledgeRetention declarations into a header file (#71695 )	2023-11-13 11:03:49 -08:00
dewen	3b82336188	Revert "[PM] Execute IndVarSimplifyPass precede ReassociatePass" (#71617 ) Reverts llvm/llvm-project#71054	2023-11-08 09:22:55 +08:00
dewen	e4d27d7f32	[PM] Execute IndVarSimplifyPass precede ReassociatePass (#71054 ) ReassociatePass may clear nsw/nuw flags of some instructions, which may have side effects on optimizations in IndVarSimplifyPass.	2023-11-08 09:21:17 +08:00
Teresa Johnson	87f5e22987	[MemProf] Tolerate missing leaf debug frames (#71233 ) Loosen up the matching so that a missing leaf debug frame in the profile does not prevent matching an allocation context if we can match further up the inlined call context. This relies on the pre-inliner, which was already the default when performing normal PGO feedback along with the MemProf feedback, but to ensure matching is not affected by the presence of PGO, enable the pre-inliner for MemProf feedback as well.	2023-11-03 21:01:07 -07:00
Nikita Popov	a682a9cfd0	Revert "Port Swift's merge function pass to llvm: merging functions that differ in constants (#68235 )" This reverts commit 19b5495b653a00da7a250f48b4f739fcf2bbe82f. PR landed without approval, with severe quality issues.	2023-11-03 21:15:46 +01:00
Manman Ren	19b5495b65	Port Swift's merge function pass to llvm: merging functions that differ in constants (#68235 ) See RFC for details: https://discourse.llvm.org/t/rfc-for-moving-swift-s-merge-function-pass-to-llvm/73778 We will need to refactor extension to FunctionComparator/FunctionHash to StructuralHash. This patch adds a new pass which is ported from Swift, and we will need to discuss how to migrate Swift’s pass over after we land this in llvm. Created this PR to get some early review on the patch. --------- Co-authored-by: Manman Ren <mren@meta.com>	2023-11-03 11:13:58 -07:00
Amara Emerson	1a2e77cf9e	Revert "Revert "Inlining: Run the legacy AlwaysInliner before the regular inliner."" This reverts commit 86bfeb906e3a95ae428f3e97d78d3d22a7c839f3. This is a long time coming re-application that was originally reverted due to regressions, unrelated to the actual inlining change. These regressions have since been fixed due to another long-in-the-making change of a66051c6 landing. Original commit message for reference: --- We have several situations where it's beneficial for code size to ensure that every call to an always-inline function is inlined before normal inlining decisions are made. While the normal inliner runs in a "MandatoryOnly" mode to try to do this, it only does it on a per-SCC basis, rather than the whole module. Ensuring that all mandatory inlinings are done before any heuristic based decisions are made just makes sense. Despite being referred to as the "legacy" AlwaysInliner pass, it's already necessary for -O0 because the CGSCC inliner is too expensive in compile time to run at -O0. This also fixes an exponential compile time blow up in https://github.com/llvm/llvm-project/issues/59126 Differential Revision: https://reviews.llvm.org/D143624 ---	2023-10-28 23:21:11 -07:00
Alex Voicu	0ce6255a50	[HIP][LLVM][Opt] Add LLVM support for `hipstdpar` This patch adds the LLVM changes needed for enabling HIP parallel algorithm offload on AMDGPU targets. What we do here is add two passes, one mandatory and one optional: 1. HipStdParAcceleratorCodeSelectionPass is mandatory, depends on CallGraphAnalysis, and implements the following transform: - Traverse the call-graph, and check for functions that are roots for accelerator execution (at the moment, these are GPU kernels exclusively, and would originate in the accelerator specific algorithm library the toolchain uses as an implementation detail); - Starting from a root, do a BFS to find all functions that are reachable (called directly or indirectly via a call-chain) and record them; - After having done the above for all roots in the Module, we have computed the set of reachable functions, which is the union of roots and functions reachable from roots; - All functions that are not in the reachable set are removed; for the special case where the reachable set is empty we completely clear the module; 2. HipStdParAllocationInterpositionPass is optional, is meant as a fallback with restricted functionality for cases where on-demand paging is unavailable on a platform, and implements the following transform: - Iterate all functions in a Module; - If a function's name is in a predefined set of allocation/deallocation functions that the runtime implementation is allowed and expected to interpose, replace all its uses with the equivalent accelerator aware function, iff the latter is available; - If the accelerator aware equivalent is unavailable we warn, but compilation will go ahead, which means that it is possible to get issues around the accelerator trying to access inaccessible memory at run time; - We rely on direct name matching as opposed to using the new alloc-kind family of attributes and / or the LibCall analysis pass because some of the legacy functions that need replacing would not carry the former or be identified by the latter. Reviewed by: JonChesterfield, yaxunl Differential Revision: https://reviews.llvm.org/D155856	2023-10-12 11:26:48 +01:00
Alex Voicu	25935c384d	Revert "[HIP][LLVM][Opt] Add LLVM support for `hipstdpar`" This reverts commit c5bba7ea5a05f540948f76a189c880eb24a5e8c6.	2023-10-11 12:27:03 +01:00
Alex Voicu	c5bba7ea5a	[HIP][LLVM][Opt] Add LLVM support for `hipstdpar` This patch adds the LLVM changes needed for enabling HIP parallel algorithm offload on AMDGPU targets. What we do here is add two passes, one mandatory and one optional: 1. HipStdParAcceleratorCodeSelectionPass is mandatory, depends on CallGraphAnalysis, and implements the following transform: - Traverse the call-graph, and check for functions that are roots for accelerator execution (at the moment, these are GPU kernels exclusively, and would originate in the accelerator specific algorithm library the toolchain uses as an implementation detail); - Starting from a root, do a BFS to find all functions that are reachable (called directly or indirectly via a call-chain) and record them; - After having done the above for all roots in the Module, we have computed the set of reachable functions, which is the union of roots and functions reachable from roots; - All functions that are not in the reachable set are removed; for the special case where the reachable set is empty we completely clear the module; 2. HipStdParAllocationInterpositionPass is optional, is meant as a fallback with restricted functionality for cases where on-demand paging is unavailable on a platform, and implements the following transform: - Iterate all functions in a Module; - If a function's name is in a predefined set of allocation/deallocation functions that the runtime implementation is allowed and expected to interpose, replace all its uses with the equivalent accelerator aware function, iff the latter is available; - If the accelerator aware equivalent is unavailable we warn, but compilation will go ahead, which means that it is possible to get issues around the accelerator trying to access inaccessible memory at run time; - We rely on direct name matching as opposed to using the new alloc-kind family of attributes and / or the LibCall analysis pass because some of the legacy functions that need replacing would not carry the former or be identified by the latter. Reviewed by: JonChesterfield, yaxunl Differential Revision: https://reviews.llvm.org/D155856	2023-10-11 12:22:00 +01:00
Fangrui Song	2d854dd3e7	Move global namespace cl::opt inside llvm:: or internalize them	2023-10-10 19:58:03 -07:00
Alex Voicu	98eda5dda7	Revert "[HIP][LLVM][Opt] Add LLVM support for `hipstdpar`" in order to address build breakage. This reverts commit 9b98ebb0eb43b005921926a622177f10e13b1ac6.	2023-10-10 12:16:10 +01:00
Alex Voicu	9b98ebb0eb	[HIP][LLVM][Opt] Add LLVM support for `hipstdpar` This patch adds the LLVM changes needed for enabling HIP parallel algorithm offload on AMDGPU targets. What we do here is add two passes, one mandatory and one optional: 1. HipStdParAcceleratorCodeSelectionPass is mandatory, depends on CallGraphAnalysis, and implements the following transform: - Traverse the call-graph, and check for functions that are roots for accelerator execution (at the moment, these are GPU kernels exclusively, and would originate in the accelerator specific algorithm library the toolchain uses as an implementation detail); - Starting from a root, do a BFS to find all functions that are reachable (called directly or indirectly via a call-chain) and record them; - After having done the above for all roots in the Module, we have computed the set of reachable functions, which is the union of roots and functions reachable from roots; - All functions that are not in the reachable set are removed; for the special case where the reachable set is empty we completely clear the module; 2. HipStdParAllocationInterpositionPass is optional, is meant as a fallback with restricted functionality for cases where on-demand paging is unavailable on a platform, and implements the following transform: - Iterate all functions in a Module; - If a function's name is in a predefined set of allocation/deallocation functions that the runtime implementation is allowed and expected to interpose, replace all its uses with the equivalent accelerator aware function, iff the latter is available; - If the accelerator aware equivalent is unavailable we warn, but compilation will go ahead, which means that it is possible to get issues around the accelerator trying to access inaccessible memory at run time; - We rely on direct name matching as opposed to using the new alloc-kind family of attributes and / or the LibCall analysis pass because some of the legacy functions that need replacing would not carry the former or be identified by the latter. Reviewed by: JonChesterfield, yaxunl Differential Revision: https://reviews.llvm.org/D155856	2023-10-10 12:02:05 +01:00
lcvon007	f3c417f341	[Passes] Add option for LoopVersioningLICM pass. (#67107 ) Users could only test the LoopVersioningLICM pass with opt. This PR adds the option back (deleted in https://reviews.llvm.org/D137915) so that it is easy to verify whether the pass is useful for some benchmarks.	2023-09-27 07:38:37 -05:00
Florian Hahn	04f9a8a7d6	[ConstraintElim] Move just before loop simplification pipeline. Adjust the pipeline slightly to move ConstraintElim just before the loop simplification pipeline. This increases the number of cases where SCEV can be preserved in the future. This also enables slightly more opportunities, by benefiting from earlier CFG simplifications, which allow more conditions to be added. Reviewed By: nikic, antoniofrighetto Differential Revision: https://reviews.llvm.org/D158843	2023-09-22 14:31:08 +01:00
Dhruv Chawla	515a826326	[NFC][InferAlignment] Swap extern declaration and definition of EnableInferAlignmentPass This prevents a linker issue when only InstCombine is linked without PassBuilder, like in the case of bugpoint.	2023-09-20 13:07:13 +05:30
Dhruv Chawla	3e992d81af	[InferAlignment] Enable InferAlignment pass by default This gives a 0.6% compile-time improvement (instructions:u): https://llvm-compile-time-tracker.com/compare.php?from=7d35fe6d08e2b9b786e1c8454cd2391463832167&to=0456c8e8a42be06b62ad4c3e3cf34b21f2633d1e&stat=instructions:u Differential Revision: https://reviews.llvm.org/D158600	2023-09-20 12:08:52 +05:30
Dhruv Chawla	0f152a55d3	[InferAlignment] Implement InferAlignmentPass This pass aims to infer alignment for instructions as a separate pass, to reduce redundant work done by InstCombine running multiple times. It runs late in the pipeline, just before the back-end passes where this information is most useful. Differential Revision: https://reviews.llvm.org/D158529	2023-09-20 12:03:36 +05:30
Nuno Lopes	281ae4903d	[Pipelines] Guard a few more usages of GlobalsAA under the EnableGlobalAnalyses flag	2023-09-07 13:58:28 +01:00
Qiongsi Wu	611ce24114	[PGO] Enable `-fprofile-update` for `-fprofile-generate` Currently, `-fprofile-update` is ignored when `-fprofile-generate` is in effect. This patch enables `-fprofile-update` for `-fprofile-generate`. This patch continues the work from https://reviews.llvm.org/D87737, which added `-fprofile-update` in the first place. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D157280	2023-08-15 10:10:03 -04:00


189 Commits
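The `HipStdParAcceleratorCodeSelectionPass` transform described in the `hipstdpar` commits (0ce6255a50 and its earlier landings) amounts to a reachability computation over the call graph. The following is a minimal Python model, not the LLVM implementation. It rests on several assumptions: the call graph is a plain dict from function name to callee names, the roots (GPU kernels in the real pass) are passed in explicitly, and `select_accelerator_code` and `prune_module` are hypothetical names.

```python
from collections import deque


def select_accelerator_code(call_graph, roots):
    """Return the roots plus every function reachable from them (BFS per root)."""
    reachable = set()
    for root in roots:
        if root in reachable:
            continue  # already covered by an earlier root's traversal
        reachable.add(root)
        queue = deque([root])
        while queue:
            fn = queue.popleft()
            for callee in call_graph.get(fn, ()):
                if callee not in reachable:
                    reachable.add(callee)
                    queue.append(callee)
    return reachable


def prune_module(call_graph, roots):
    """Drop every function outside the reachable set; no roots means an empty module."""
    keep = select_accelerator_code(call_graph, roots)
    return {fn: list(callees) for fn, callees in call_graph.items() if fn in keep}


if __name__ == "__main__":
    graph = {
        "kernel": ["helper"],
        "helper": ["leaf"],
        "leaf": [],
        "host_main": ["helper", "io"],
        "io": [],
    }
    print(sorted(prune_module(graph, ["kernel"])))  # ['helper', 'kernel', 'leaf']
    print(prune_module(graph, []))                  # {}
```

Host-only functions such as `host_main` and `io` are dropped because no kernel reaches them. With no roots the result is empty, which mirrors the "completely clear the module" case in the commit message.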
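The burst-sampling idea from the sampled PGO instrumentation commit (b1ca2a9546) can be modeled as follows: only the events that fall in the first `burst_duration` slots of every `period`-long window are counted. This simplified Python model assumes that a single global tick advances on every instrumented event. The real patch may place and update its sampling state differently. The class and method names are hypothetical. Setting `burst_duration=1` corresponds to the "simple sampling" mode.

```python
class SampledCounters:
    """Model of burst-sampled profile counters (hypothetical, not LLVM's runtime)."""

    def __init__(self, num_counters: int, period: int, burst_duration: int):
        if not 0 < burst_duration <= period:
            raise ValueError("need 0 < burst_duration <= period")
        self.counts = [0] * num_counters
        self.period = period
        self.burst_duration = burst_duration
        # Assumed stand-in for the global sampling state: position in the current window.
        self.tick = 0

    def hit(self, idx: int) -> None:
        # Only events inside the burst window write to the counter...
        if self.tick < self.burst_duration:
            self.counts[idx] += 1
        # ...but every event advances the window position.
        self.tick = (self.tick + 1) % self.period

    def estimate(self, idx: int) -> float:
        # Scale the sampled count back up by the sampling ratio.
        return self.counts[idx] * self.period / self.burst_duration
```

Skipping counter writes outside the burst is what reduces the contention on shared counters, which the commit identifies as the main slowdown for threaded programs. The scaled estimate is exact here only because events are spread evenly across windows. The commit's reported edge-count overlap is an empirical result that this model does not reproduce.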
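Commit 93cdd1b5cf lets PGO mark cold functions as optsize, minsize, or optnone. Below is a hedged Python model of that decision. The enum, the function name, and the simple entry-count threshold for "cold" are all assumptions, and the real pass's coldness criterion is not reproduced here. One detail is grounded in the IR rules: `optnone` must be paired with `noinline`, which the LLVM IR verifier enforces.

```python
from enum import Enum
from typing import Dict, Optional, Set


class ColdFuncOpt(Enum):
    """Hypothetical mirror of the choices named in the commit."""
    DEFAULT = "default"  # leave cold functions alone
    OPTSIZE = "optsize"
    MINSIZE = "minsize"
    OPTNONE = "optnone"


def mark_cold_functions(
    entry_counts: Dict[str, Optional[int]],
    cold_threshold: int,
    mode: ColdFuncOpt,
) -> Dict[str, Set[str]]:
    """Return the attributes to add to each function deemed cold."""
    added: Dict[str, Set[str]] = {}
    if mode is ColdFuncOpt.DEFAULT:
        return added
    for fn, count in entry_counts.items():
        # No profile data, or executed too often: not treated as cold.
        if count is None or count > cold_threshold:
            continue
        if mode is ColdFuncOpt.OPTNONE:
            # The IR verifier rejects optnone without noinline.
            added[fn] = {"optnone", "noinline"}
        else:
            added[fn] = {mode.value}
    return added
```

Skipping functions with no profile data reflects the follow-up commit 947b656add, which makes the pass depend on a profile being present.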