llvm-project

mirror of https://github.com/llvm/llvm-project.git synced 2025-04-29 08:06:06 +00:00

Author	SHA1	Message	Date
Rong Xu	666731660c	[Pass][CHR] Move ControlHeightReduction to module optimization pipeline This is a modified version of commit b374423304a8 by Arthur (https://reviews.llvm.org/D143424). Here we invoke the pass independently of PGOOPT. We now check whether the profile is available through the program summary. This ensures CHR is called in distributed ThinLTO BE compilation (where PGOOPT might not be created). Differential Revision: https://reviews.llvm.org/D144769	2023-02-27 11:47:54 -08:00
Arthur Eubanks	a628ca4925	Revert "[Pipeline] Move ControlHeightReduction to module optimization pipeline" This reverts commit b374423304a8d91d590d0ce5ab1b381296d6dfb2. Causes regressions on some benchmarks.	2023-02-23 10:17:12 -08:00
Arthur Eubanks	b374423304	[Pipeline] Move ControlHeightReduction to module optimization pipeline This pass isn't a simplification, it's a non-canonical optimization. This makes it only run once in a (Thin)LTO pipeline during postlink, just like all the other optimization pipeline passes. Reviewed By: xur Differential Revision: https://reviews.llvm.org/D143424	2023-02-16 15:23:38 -08:00
Arthur Eubanks	5f5cf60298	[Pipeline] Remove -enable-npm-O3-nontrivial-unswitch flag This was added to help debugging performance issues, no longer needed. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D98675	2023-02-16 11:36:03 -08:00
Arthur Eubanks	4d16ebd6c9	[Pipeline] Remove -enable-no-rerun-simplification-pipeline flag This has been on without complaint for a while. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D144130	2023-02-16 11:29:51 -08:00
David Green	86bfeb906e	Revert "Inlining: Run the legacy AlwaysInliner before the regular inliner." This seems to cause large regressions in existing code, as much as 75% slower (4x the time taken). Small always inline functions seem to be used a lot in the cmsis-dsp library. I would add a phase ordering test to show the problems, but one already exists! The llvm/test/Transforms/PhaseOrdering/ARM/arm_mult_q15.ll was just changed by removing alwaysinline to hide the problems that existed. This reverts commit cae033dcf227aeecf58fca5af6fc7fde1fd2fb4f. This reverts commit 8e33c41e72ad42e4c27f8cbc3ad2e02b169637a1.	2023-02-10 15:01:49 +00:00
Amara Emerson	8e33c41e72	Inliner: Address missed review comments for D143624	2023-02-09 21:56:40 -08:00
Amara Emerson	cae033dcf2	Inlining: Run the legacy AlwaysInliner before the regular inliner. We have several situations where it's beneficial for code size to ensure that every call to an always-inline function is inlined before normal inlining decisions are made. While the normal inliner runs in a "MandatoryOnly" mode to try to do this, it only does it on a per-SCC basis, rather than the whole module. Ensuring that all mandatory inlinings are done before any heuristic-based decisions are made just makes sense. Despite being referred to as the "legacy" AlwaysInliner pass, it's already necessary for -O0 because the CGSCC inliner is too expensive in compile time to run at -O0. This also fixes an exponential compile time blow up in https://github.com/llvm/llvm-project/issues/59126 Differential Revision: https://reviews.llvm.org/D143624	2023-02-09 16:49:29 -08:00
Florian Hahn	8028263c41	Recommit "[ConstraintElim] Enable pass by default." This reverts commit 695ce48c63ec582a46bfbda9b066f4d3bcde143f. The compile-time regression causing the revert has been fixed. Recommit the original patch. Original commit message: The pass should help to close a functional gap when it comes to reasoning about related conditions in a relatively general way. It addresses multiple existing issues (linked below) and the need for a more powerful reasoning system was also discussed recently in https://discourse.llvm.org/t/rfc-alternative-approach-of-dealing-with-implications-from-comparisons-through-pos-analysis/65601/7 On AArch64, the new pass performs ~2000 simplifications on MultiSource,SPEC2006,SPEC2017 with -O3. Compile-time impact: NewPM-O3: +0.20% NewPM-ReleaseThinLTO: +0.32% NewPM-ReleaseLTO-g: +0.28% https://llvm-compile-time-tracker.com/compare.php?from=f01a3a893c147c1594b9a3fbd817456b209dabbf&to=577688758ef64fb044215ec3e497ea901bb2db28&stat=instructions:u Fixes #49344. Fixes #47888. Fixes #48253. Fixes #49229. Fixes #58074. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D135915	2023-02-06 18:09:43 +00:00
Steven Wu	516e301752	[NFC][Profile] Access profile through VirtualFileSystem Make access to profile data go through the virtual file system so the inputs can be remapped. In the context of caching, this ensures we capture the inputs and provide an immutable input as profile data. Reviewed By: akyrtzi, benlangmuir Differential Revision: https://reviews.llvm.org/D139052	2023-02-01 09:25:02 -08:00
Arthur Eubanks	4ce34bb2a9	[CGSCC] Add pass which counts the max number of times we visit a function This will help with finding potential pathological CGSCC cases. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D142853	2023-01-30 10:06:53 -08:00
Joseph Huber	6185246f4f	[OpenMP] Run an extra 'OpenMPOpt' pass in LTO-mode The `OpenMPOpt` pass is pivotal to the performance of many OpenMP offloading programs. When we performed non-LTO builds with OpenMP, we used to link the OpenMP deviceRTL individually for each TU. This led to us getting an additional attributor run on the combined runtime and user code. When we used LTO we lost a run and suffered a large performance degradation. This patch simply adds the extra `OpenMPOpt` pass that we were missing into the LTO pipeline. This patch fixes the performance regression shown in applications that used OpenMP offloading in LTO mode. Previously, this wasn't legal to do as we could emit new runtime calls into the module. That was fixed by D142646. Depends on D142646 Fixes https://github.com/llvm/llvm-project/issues/60300 Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D142650	2023-01-26 13:23:45 -06:00
Joseph Huber	0bdde9dfb9	[OpenMP] Make OpenMPOpt aware of the OpenMP runtime's status The `OpenMPOpt` pass contains optimizations that generate new calls into the OpenMP runtime. This causes problems if we are in a state where the runtime has already been linked statically. Generating these new calls will result in them never being resolved. We should indicate if we are in a "post-link" LTO phase and prevent OpenMPOpt from generating new runtime calls. Generally, it's not desirable for passes to maintain state about the context in which they're called. But this is the only reasonable solution to static linking when we have a pass that generates new runtime calls. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D142646	2023-01-26 13:23:44 -06:00
Florian Hahn	695ce48c63	Revert "[ConstraintElim] Enable pass by default." This reverts commit fb13dcf3431cd83911fe56899d2fade808dc5b8d. A large compile-time regression for code generated by sanitizers has been reported. Revert while I investigate the issue. Details and reproducers are available here: https://reviews.llvm.org/D135915	2023-01-18 14:25:00 +00:00
Alexandros Lamprineas	572a757fa7	[IPSCCP] Enable specialization of functions. Re-enable the optimization after having fixed the compilation error found in SPEC/CINT2017rate/502.gcc_r when both LTO and PGO are in use (see https://reviews.llvm.org/D141474). Differential Revision: https://reviews.llvm.org/D140210	2023-01-13 14:04:17 +00:00
Florian Hahn	fb13dcf343	[ConstraintElim] Enable pass by default. The pass should help to close a functional gap when it comes to reasoning about related conditions in a relatively general way. It addresses multiple existing issues (linked below) and the need for a more powerful reasoning system was also discussed recently in https://discourse.llvm.org/t/rfc-alternative-approach-of-dealing-with-implications-from-comparisons-through-pos-analysis/65601/7 On AArch64, the new pass performs ~2000 simplifications on MultiSource,SPEC2006,SPEC2017 with -O3. Compile-time impact: NewPM-O3: +0.20% NewPM-ReleaseThinLTO: +0.32% NewPM-ReleaseLTO-g: +0.28% https://llvm-compile-time-tracker.com/compare.php?from=f01a3a893c147c1594b9a3fbd817456b209dabbf&to=577688758ef64fb044215ec3e497ea901bb2db28&stat=instructions:u Fixes #49344. Fixes #47888. Fixes #48253. Fixes #49229. Fixes #58074. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D135915	2023-01-04 18:00:37 +00:00
Florian Hahn	f3c1d92682	[ConstraintElim] Adjust position in LTO pipeline. This runs ConstraintElim earlier during LTO, similar to non-LTO. Discussed and split off from D135915.	2023-01-03 17:07:43 +00:00
Florian Hahn	9e6d2c82d6	[ConstraintElim] Move after first instcombine run. Running ConstraintElimination after the first InstCombine run results in slightly more simplifications on average. There is a tiny number of regressions, mostly due to CVP eliminating a condition that ConstraintElimination would use, but in most cases there's a slight improvement or no change. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D140853	2023-01-03 13:25:00 +00:00
Florian Hahn	60359f56aa	Revert "[IPSCCP] Enable specialization of functions." This reverts commit 2656572d485127cc30b8fe9752024d2a0f1c50db. It looks like CINT2017rate/502.gcc_r gets mis-compiled with LTO + PGO on AArch64 with function specialization.	2022-12-26 16:02:59 +00:00
Alexandros Lamprineas	2656572d48	[IPSCCP] Enable specialization of functions. This patch enables Function Specialization by default at all optimization levels except Os, Oz. Compilation Time Overhead: -------------------------- Measured the Instruction Count increase (Geomean) for CTMark from the llvm-testsuite as in https://llvm-compile-time-tracker.com. * {-O3, Non-LTO}: +0.136% Instruction Count * {-O3, LTO}: +0.346% Instruction Count Performance Uplift: ------------------- Measured +9.121% score increase for 505.mcf_r from SPEC Int 2017 (Tested on Neoverse N1 with -O3 + LTO) Correctness Testing: -------------------- * Passes bootstrap Clang with ASAN + LTO + FuncSpec aggressive options: { MaxClonesThreshold=10, SmallFunctionThreshold=10, AvgLoopIterationCount=30, SpecializeOnAddresses=true, EnableSpecializationForLiteralConstant=true, FuncSpecializationMaxIters=10 } * Builds Chromium and passes its unittests with the above options + ThinLTO. For more info please refer to https://discourse.llvm.org/t/rfc-should-we-enable-function-specialization/61518 Differential Revision: https://reviews.llvm.org/D140210	2022-12-25 10:05:21 +02:00
Alexandros Lamprineas	8136a0172b	[FuncSpec] Make the Function Specializer part of the IPSCCP pass. Reland 877a9f9abec61f06e39f1cd872e37b828139c2d1 since D138654 (parent) has been fixed with 9ebaf4fef4aac89d4eff08e48185d61bc893f14e and with 8f1e11c5a7d70f96943a72649daa69f152d73e90. Differential Revision: https://reviews.llvm.org/D126455	2022-12-10 14:39:49 +00:00
Roman Lebedev	4f7e5d2206	[SROA] For non-speculatable `load`s of `select`s -- split block, insert then/else blocks, form two-entry PHI node, take 2 Currently, SROA is CFG-preserving. Not doing so does not affect any pipeline test. (???) Internally, SROA requires Dominator Tree, and uses it solely for the final `-mem2reg` call. By design, we can't really SROA allocas whose address escapes somehow, but we have logic to deal with `load` of `select`/`PHI`, where at least one of the possible addresses prevents promotion, by speculating the `load`s and `select`ing between loaded values. As one would expect, that requires ensuring that the speculation is actually legal. Even ignoring complexity bailouts, that logic does not deal with everything, e.g. `isSafeToLoadUnconditionally()` does not recurse into the hands of a `select`. There can also be cases where the load is genuinely non-speculatable. So if we can't prove that the load can be speculated, unfold the select, produce a two-entry phi node, and perform a predicated load. Now, that transformation must obviously update Dominator Tree, since we require it later on. Doing so is trivial. Additionally, we don't want to do this for the final SROA invocation (D136806). In the end, this ends up having negative (!) compile-time cost: https://llvm-compile-time-tracker.com/compare.php?from=c6d7e80ec4c17a415673b1cfd25924f98ac83608&to=ddf9600365093ea50d7e278696cbfa01641c959d&stat=instructions:u Though indeed, this only deals with `select`s, `PHI`s are still using speculation. Should we update some more analysis? Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D138238 This reverts commit 739611870d3b06605afe25cc07833f6a62de9545, and recommits 03e6d9d9d1d48e43f3efc35eb75369b90d4510d5 with a fixed assertion - we should check that DTU is there, not just assert false...	2022-12-08 20:19:55 +03:00
Roman Lebedev	739611870d	Revert "[SROA] For non-speculatable `load`s of `select`s -- split block, insert then/else blocks, form two-entry PHI node" The assertion about not modifying the CFG seems to not hold, will recommit in a bit. https://lab.llvm.org/buildbot#builders/139/builds/32412 This reverts commit 03e6d9d9d1d48e43f3efc35eb75369b90d4510d5. This reverts commit 4f90f4ada33718f9025d0870a4fe3fe88276b3da.	2022-12-08 19:51:15 +03:00
Roman Lebedev	03e6d9d9d1	[SROA] For non-speculatable `load`s of `select`s -- split block, insert then/else blocks, form two-entry PHI node Currently, SROA is CFG-preserving. Not doing so does not affect any pipeline test. (???) Internally, SROA requires Dominator Tree, and uses it solely for the final `-mem2reg` call. By design, we can't really SROA allocas whose address escapes somehow, but we have logic to deal with `load` of `select`/`PHI`, where at least one of the possible addresses prevents promotion, by speculating the `load`s and `select`ing between loaded values. As one would expect, that requires ensuring that the speculation is actually legal. Even ignoring complexity bailouts, that logic does not deal with everything, e.g. `isSafeToLoadUnconditionally()` does not recurse into the hands of a `select`. There can also be cases where the load is genuinely non-speculatable. So if we can't prove that the load can be speculated, unfold the select, produce a two-entry phi node, and perform a predicated load. Now, that transformation must obviously update Dominator Tree, since we require it later on. Doing so is trivial. Additionally, we don't want to do this for the final SROA invocation (D136806). In the end, this ends up having negative (!) compile-time cost: https://llvm-compile-time-tracker.com/compare.php?from=c6d7e80ec4c17a415673b1cfd25924f98ac83608&to=ddf9600365093ea50d7e278696cbfa01641c959d&stat=instructions:u Though indeed, this only deals with `select`s, `PHI`s are still using speculation. Should we update some more analysis? Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D138238	2022-12-08 16:51:32 +03:00
Alexandros Lamprineas	0f0cb92cb2	Revert "[FuncSpec] Make the Function Specializer part of the IPSCCP pass." This reverts commit 877a9f9abec61f06e39f1cd872e37b828139c2d1. It depends on the parent revision 42c2dc401742266da3e0251b6c1ca491f4779963 which needs to be reverted as it broke some buildbots, so reverting both.	2022-12-08 12:41:43 +00:00
Alexandros Lamprineas	877a9f9abe	[FuncSpec] Make the Function Specializer part of the IPSCCP pass. The aim of this patch is to minimize the compilation time overhead of running Function Specialization. It is about 40% slower to run as a standalone pass (IPSCCP + FuncSpec vs IPSCCP with FuncSpec) according to my measurements. I compiled the llvm testsuite with NewPM-O3 + LTO and measured single threaded [user + system] time of IPSCCP and FuncSpec by passing the '-time-passes' option to lld. Then I compared the two configurations in terms of Instruction Count of the total compilation (not of the individual passes) as in https://llvm-compile-time-tracker.com. Geomean for non-LTO builds is -0.25% and LTO is -0.5% approximately. You can find more info below: https://discourse.llvm.org/t/rfc-should-we-enable-function-specialization/61518 Differential Revision: https://reviews.llvm.org/D126455	2022-12-08 12:14:27 +00:00
Sjoerd Meijer	8250180238	Revert "Recommit "[LoopFlatten] Enable it by default"" This reverts commit 3ea6a9a469fde168c527b1c34c09f6d684ec86af because of the reported miscompilation in: https://github.com/llvm/llvm-project/issues/59339	2022-12-05 15:14:12 +00:00
Sjoerd Meijer	3ea6a9a469	Recommit "[LoopFlatten] Enable it by default" The problem in 58441 that was reported after enabling this last time was fixed in 8e9e22f07bcbe2ee95478684cf31948370e4e51e.	2022-11-29 10:45:13 +00:00
Rong Xu	6327d263f5	[CHR] Add a threshold for the code duplication ControlHeightReduction (CHR) clones the code region to reduce the branches in the hot code path. The number of clones is linear in the depth of the region. Currently it does not have control over the code size increase. We are seeing one ~9000-BB function get expanded to ~250000 BBs, a 25x increase. This creates a big compile time issue for the downstream optimizations. This patch adds a cap on the number of clones for one region. Differential Revision: https://reviews.llvm.org/D138333	2022-11-22 11:36:40 -08:00
Sanjay Patel	163bb6d64e	[Passes][VectorCombine] enable early run generally and try load folds An early run of VectorCombine was added with D102496 specifically to deal with unnecessary vector ops produced with the C matrix extension. This patch is proposing to try those folds in general and add a pair of load folds to the menu. The load transform will partly solve (see PhaseOrdering diffs) a longstanding vectorization perf bug by removing redundant loads via GVN: issue #17113 The main reason for not enabling the extra pass generally in the initial patch was compile-time cost. The cost of VectorCombine was significantly (surprisingly) improved with: 87debdadaf18 https://llvm-compile-time-tracker.com/compare.php?from=ffe05b8f57d97bc4340f791cb386c8d00e0739f2&to=87debdadaf18f8a5c7e5d563889e10731dc3554d&stat=instructions:u ...so the extra run is going to cost very little now - the total cost of the 2 runs should be less than the 1 run before that micro-optimization: https://llvm-compile-time-tracker.com/compare.php?from=5e8c2026d10e8e2c93c038c776853bed0e7c8fc1&to=2c4b68eab5ae969811f422714e0eba44c5f7eefb&stat=instructions:u It may be possible to reduce the cost slightly more with a few more earlier-exits like that, but it's probably in the noise based on timing experiments. Differential Revision: https://reviews.llvm.org/D138353	2022-11-21 13:57:55 -05:00
Sanjay Patel	8f337f8ffe	[VectorCombine] generalize pass param name for early combines; NFC The option was added with https://reviews.llvm.org/D102496, and currently the name is accurate, but I am hoping to add a load transform that is not a scalarization. See issue #17113.	2022-11-21 13:57:55 -05:00
Roman Lebedev	8adfa29706	[Pipelines] Introduce SROA after (final, run-time) loop unrolling Now that we are done with loop unrolling, be it by the LoopVectorizer or LoopUnroll passes, some variable-offset GEPs into allocas could have become constant-offset, thus enabling SROA and alloca promotion, yet we don't capitalize on that, which is surprising. While it would be good to not introduce one more SROA invocation, but instead move the one from `PassBuilder::buildFunctionSimplificationPipeline()`, the existing test coverage says that is a bad idea, though it would be fine compile-time wise: https://llvm-compile-time-tracker.com/compare.php?from=b150d34c47efbd8fa09604bce805c0920360f8d7&to=5a9a5c855158b482552be8c7af3e73d67fa44805&stat=instructions So instead, I add yet another SROA run. I have checked, and it needs to be at least after said final loop unrolling. This is still fine compile-time wise: https://llvm-compile-time-tracker.com/compare.php?from=70324cd88328c0924e605fa81b696572560aa5c9&to=fb489bbef687ad821c3173a931709f9cad9aee8a&stat=instructions I've encountered this in real code; `SROA-after-final-loop-unrolling.ll` has been reduced from https://godbolt.org/z/fsdMhETh3 Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D136806	2022-11-17 21:31:30 +03:00
Arthur Eubanks	cbcf123af2	[LegacyPM] Remove cl::opts controlling optimization pass manager passes Move these to the new PM if they're used there. Part of removing the legacy pass manager for optimization pipeline. Reland with UseNewGVN usage in clang removed. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D137915	2022-11-14 09:38:17 -08:00
Arthur Eubanks	d7c1427953	Revert "[LegacyPM] Remove cl::opts controlling optimization pass manager passes" This reverts commit 7ec05fec7115a910b2e172de794adc462388c25e. Breaks bots, e.g. https://lab.llvm.org/buildbot#builders/217/builds/15008	2022-11-14 09:33:38 -08:00
Arthur Eubanks	7ec05fec71	[LegacyPM] Remove cl::opts controlling optimization pass manager passes Move these to the new PM if they're used there. Part of removing the legacy pass manager for optimization pipeline. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D137915	2022-11-14 09:23:17 -08:00
Arthur Eubanks	4fa328074e	[NewPM][Pipeline] Add PipelineTuningOption to set inliner threshold The legacy PM allowed you to set a custom inliner threshold via builder.Inliner = llvm::createFunctionInliningPass(inline_threshold); This allows the same thing to be done with the new PM optimization pipelines. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D137038	2022-11-02 10:47:51 -07:00
Paul Walker	ab8257ca0e	[NFC] Fix a few whitespace inconsistencies.	2022-10-20 14:52:25 +00:00
Pavel Samolysov	1c530500ab	[Pipelines] Introduce DAE after ArgumentPromotion The ArgumentPromotion pass uses Mem2Reg promotion at the end to cut down generated `alloca` instructions as well as meaningless `store`s, and this behavior can leave unused (dead) arguments. To eliminate the dead arguments, and therefore let DeadCodeElimination remove the inserted `GEP`s, `load`s and `cast`s in the callers that become dead, the DeadArgumentElimination pass should be run after ArgumentPromotion. Differential Revision: https://reviews.llvm.org/D128830	2022-09-22 15:33:46 -07:00
Nuno Lopes	d953d01737	Introduce -enable-global-analyses to allow users to disable inter-procedural analyses Alive2 doesn't support verification of optimizations that use inter-procedural analyses. Right now, clang uses GlobalsAA by default and there's no way to disable it. This leads to Alive2 producing false positives. The added flag allows us to skip global analyses altogether. Differential Revision: https://reviews.llvm.org/D134139	2022-09-19 11:59:35 +01:00
Vitaly Buka	181d408186	[pipelines] OptimizerEarlyEPCallbacks for ThinLTO prelink Similar to the OptimizerLastEPCallbacks workaround added in D96320. Probably NFC as-is; I don't see anything hooked into these callbacks yet, but we are looking to move sanitizers. Reviewed By: aeubanks, MaskRay Differential Revision: https://reviews.llvm.org/D133333	2022-09-06 15:54:04 -07:00
Arthur Eubanks	9599393eeb	Revert "[Pipelines] Introduce DAE after ArgumentPromotion" This reverts commit b10a341aa5b0b93b9175a8f11efc9a0955ab361e. This commit exposes the pre-existing https://github.com/llvm/llvm-project/issues/56503 in some edge cases. Will fix that and then reland this.	2022-09-01 08:52:19 -07:00
Pavel Samolysov	b10a341aa5	[Pipelines] Introduce DAE after ArgumentPromotion The ArgumentPromotion pass uses Mem2Reg promotion at the end to cut down generated `alloca` instructions as well as meaningless `store`s, and this behavior can leave unused (dead) arguments. To eliminate the dead arguments, and therefore let DeadCodeElimination remove the inserted `GEP`s, `load`s and `cast`s in the callers that become dead, the DeadArgumentElimination pass should be run after ArgumentPromotion. Differential Revision: https://reviews.llvm.org/D128830	2022-08-28 10:47:03 +03:00
Pavel Samolysov	f964417c32	Revert "[Pipelines] Introduce DAE after ArgumentPromotion" The commit breaks the compiler when a function is used as a function parameter (hm... for a function from the standard C library?): ``` static float strtof(char , char ) {} void a() { strtof(a, 0); } ``` This reverts commit 879f5118fc74657e4a5c4eff6810098e1eed75ac.	2022-08-26 13:43:09 +03:00
Pavel Samolysov	879f5118fc	[Pipelines] Introduce DAE after ArgumentPromotion The ArgumentPromotion pass uses Mem2Reg promotion at the end to cut down generated `alloca` instructions as well as meaningless `store`s, and this behavior can leave unused (dead) arguments. To eliminate the dead arguments, and therefore let DeadCodeElimination remove the inserted `GEP`s, `load`s and `cast`s in the callers that become dead, the DeadArgumentElimination pass should be run after ArgumentPromotion. Differential Revision: https://reviews.llvm.org/D128830	2022-08-25 10:55:47 +03:00
Pavel Samolysov	6703ad1e0c	Revert "[Pipelines] Introduce DAE after ArgumentPromotion" This reverts commit 3f20dcbf708cb23f79c4866d8285a8ae7bd885de.	2022-08-24 12:44:13 +03:00
Pavel Samolysov	3f20dcbf70	[Pipelines] Introduce DAE after ArgumentPromotion The ArgumentPromotion pass uses Mem2Reg promotion at the end to cut down generated `alloca` instructions as well as meaningless `store`s, and this behavior can leave unused (dead) arguments. To eliminate the dead arguments, and therefore let DeadCodeElimination remove the inserted `GEP`s, `load`s and `cast`s in the callers that become dead, the DeadArgumentElimination pass should be run after ArgumentPromotion. Differential Revision: https://reviews.llvm.org/D128830	2022-08-24 10:36:12 +03:00
Ellis Hoag	0f946a50a4	[InstrProf] Add option to disable loop opt after PGO Add the `-enable-post-pgo-loop-rotation` option to enable or disable the loop rotation transformation [1]. With some instrumentations, e.g., function entry coverage [2], loop rotation is not necessary and can lead to some surprising differences in codegen, even for functions where instrumentation is blocked with `noprofile` or `skipprofile`. The default value is `true` so the default behavior does not change. [1] https://www.llvm.org/docs/LoopTerminology.html#loop-terminology-loop-rotate [2] https://reviews.llvm.org/D116180 Reviewed By: phosek Differential Revision: https://reviews.llvm.org/D131817	2022-08-17 12:23:18 -07:00
Sanjay Patel	bfb9b8e075	[Passes] add a tail-call-elim pass near the end of the opt pipeline We call tail-call-elim near the beginning of the pipeline, but that is too early to annotate calls that get added later. In the motivating case from issue #47852, the missing 'tail' on memset leads to sub-optimal codegen. I experimented with removing the early instance of tail-call-elim instead of just adding another pass, but that appears to be slightly worse for compile-time: +0.15% vs. +0.08% time. "tailcall" shows adding the pass; "tailcall2" shows moving the pass to later, then adding the original early pass back (so 1596886802 is functionally equivalent to 180b0439dc ): https://llvm-compile-time-tracker.com/index.php?config=NewPM-O3&stat=instructions&remote=rotateright Note that there was an effort to split the tail call functionality into 2 passes - that could help reduce compile-time if we find that this change costs more in compile-time than expected based on the preliminary testing: D60031 Differential Revision: https://reviews.llvm.org/D130374	2022-07-25 15:25:47 -04:00
Alina Sbirlea	846d10f16a	Turn on flag to not re-run simplification pipeline. This patch turns on the flag `-enable-no-rerun-simplification-pipeline`, which means the simplification pipeline will not be rerun on unchanged functions in the CGSCCPass Manager. Compile time improvement: https://llvm-compile-time-tracker.com/compare.php?from=17457be1c393ff691cca032b04ea1698fedf0301&to=882301ebb893c8ef9f09fe1ea871f7995426fa07&stat=instructions No meaningful run time regressions observed in the llvm test suite and in additional internal workloads at this time. The example test in `test/Other/no-rerun-function-simplification-pipeline.ll` is a good means to understand the effect of this change: ``` define void @f1(void()* %p) alwaysinline { call void %p() ret void } define void @f2() #0 { call void @f1(void()* @f2) call void @f3() ret void } define void @f3() #0 { call void @f2() ret void } ``` There are two SCCs formed by the ModuleToPostOrderCGSCCAdaptor: (f1) and (f2, f3). The pass manager runs on the first SCC, leading to running the simplification pipeline (function and loop passes) on f1. With the flag on, after this, the output will have `Running analysis: ShouldNotRunFunctionPassesAnalysis on f1`. Next, the pass manager runs on the second SCC: (f2, f3). Since f1() was inlined, f2() now calls itself, and also calls f3(), while f3() only calls f2(). So the pass manager for the SCC first runs the Inliner on (f2, f3), then the simplification pipeline on f2. With the flag on, the output will have `Running analysis: ShouldNotRunFunctionPassesAnalysis on f2`; unless the inliner makes a change, this analysis remains preserved, which means there's no reason to rerun the simplification pipeline. With the flag off, the simplification pipeline runs a second time on f2. Next, the same flow occurs for f3. The simplification pipeline is run on f3 a single time with the flag on, along with `ShouldNotRunFunctionPassesAnalysis on f3`, and twice with the flag off. The reruns occur only on f2 and f3 due to the additional ref edges.	2022-07-14 06:23:55 -07:00
Kazu Hirata	ec9a0e36d9	[IPO] Remove addLTOOptimizationPasses and addLateLTOOptimizationPasses (NFC) The last uses were removed on Apr 15, 2022 in commit 2e6ac54cf48aa04f7b05c382c33135b16d3f01ea. Differential Revision: https://reviews.llvm.org/D129460	2022-07-11 20:15:24 -07:00


189 Commits