llvm-project

mirror of https://github.com/llvm/llvm-project.git synced 2025-04-28 02:36:05 +00:00

Author	SHA1	Message	Date
serge-sans-paille	cca01008cc	Move "auto-init" instructions to the dominator of their users As a result of -ftrivial-auto-var-init, clang generates instructions to set alloca'd memory to a given pattern, right after the allocation site. In some cases, this (somewhat costly) operation could be delayed, leading to conditional execution. This is not an uncommon situation: it happens ~500 times on the cPython code base, and much more often on the LLVM codebase. The benefit varies greatly depending on the execution path, but it should not regress performance. Differential Revision: https://reviews.llvm.org/D137707	2023-04-03 15:27:27 +02:00
ibricchi	1a36eaa552	[Pass Builder] Allow Module Inliner for full LTO Currently there is no way to enable the module inliner when linking with full lto. This patch enables that option. Differential Revision: https://reviews.llvm.org/D146805	2023-04-03 14:16:35 +02:00
Teresa Johnson	700cd99061	Restore "[MemProf] Context disambiguation cloning pass [patch 1a/3]" This restores commit d6ad4f01c3dafcab335bca66dac6e36d9eac8421, which was reverted in commit 883dbb9c86be87593a58ef10b070b3a0564c7fee, along with a fix for gcc 12.2 build errors in the original commit. Support for building, printing, and displaying CallsiteContextGraph which represents the MemProf metadata contexts. Uses CRTP to enable support for both IR (regular LTO) and summary (ThinLTO). This patch includes the support for building it in regular LTO mode (from memprof and callsite metadata), and the next patch will add the handling for building it from ThinLTO summaries. Also includes support for dumping the graph to text and to dot files. Follow-on patches will contain the support for cloning on the graph and in the IR. The graph represents the call contexts in all memprof metadata on allocation calls, with nodes for the allocations themselves, as well as for the calls in each context. The graph is initially built from the allocation memprof metadata (or summary) MIBs. It is then updated to match calls with callsite metadata onto the nodes, updating it to reflect any inlining performed on those calls. Each MIB (representing an allocation's call context with allocation behavior) is assigned a unique context id during the graph build. The edges and nodes in the graph are decorated with the context ids they carry. This is used to correctly update the graph when cloning is performed so that we can uniquify the context for a single (possibly cloned) allocation. Differential Revision: https://reviews.llvm.org/D140908	2023-03-22 10:16:06 -07:00
Nikita Popov	883dbb9c86	Revert "[MemProf] Context disambiguation cloning pass [patch 1a/3]" This reverts commit d6ad4f01c3dafcab335bca66dac6e36d9eac8421. Fails to build on at least gcc 12.2: /home/npopov/repos/llvm-project/llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp:482:1: error: no declaration matches ‘ContextNode<DerivedCCG, FuncTy, CallTy>* CallsiteContextGraph<DerivedCCG, FuncTy, CallTy>::getNodeForInst(const CallInfo&)’ 482 | CallsiteContextGraph<DerivedCCG, FuncTy, CallTy>::getNodeForInst( | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ /home/npopov/repos/llvm-project/llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp:393:16: note: candidate is: ‘CallsiteContextGraph<DerivedCCG, FuncTy, CallTy>::ContextNode* CallsiteContextGraph<DerivedCCG, FuncTy, CallTy>::getNodeForInst(const CallInfo&)’ 393 | ContextNode *getNodeForInst(const CallInfo &C); | ^~~~~~~~~~~~~~ /home/npopov/repos/llvm-project/llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp:99:7: note: ‘class CallsiteContextGraph<DerivedCCG, FuncTy, CallTy>’ defined here 99 | class CallsiteContextGraph { | ^~~~~~~~~~~~~~~~~~~~	2023-03-22 15:43:46 +01:00
Teresa Johnson	d6ad4f01c3	[MemProf] Context disambiguation cloning pass [patch 1a/3] Support for building, printing, and displaying CallsiteContextGraph which represents the MemProf metadata contexts. Uses CRTP to enable support for both IR (regular LTO) and summary (ThinLTO). This patch includes the support for building it in regular LTO mode (from memprof and callsite metadata), and the next patch will add the handling for building it from ThinLTO summaries. Also includes support for dumping the graph to text and to dot files. Follow-on patches will contain the support for cloning on the graph and in the IR. The graph represents the call contexts in all memprof metadata on allocation calls, with nodes for the allocations themselves, as well as for the calls in each context. The graph is initially built from the allocation memprof metadata (or summary) MIBs. It is then updated to match calls with callsite metadata onto the nodes, updating it to reflect any inlining performed on those calls. Each MIB (representing an allocation's call context with allocation behavior) is assigned a unique context id during the graph build. The edges and nodes in the graph are decorated with the context ids they carry. This is used to correctly update the graph when cloning is performed so that we can uniquify the context for a single (possibly cloned) allocation. Depends on D140786. Differential Revision: https://reviews.llvm.org/D140908	2023-03-22 07:05:27 -07:00
Nikita Popov	a8f6b5763e	[PassBuilder] Support O0 in default pipelines The default and pre-link pipeline builders currently require you to call a separate method for optimization level O0, even though they have perfectly well-defined O0 optimization pipelines. Accept O0 optimization level and call buildO0DefaultPipeline() internally, so all consumers don't need to repeat this. Differential Revision: https://reviews.llvm.org/D146200	2023-03-17 10:00:05 +01:00
Arthur Eubanks	20ed9cebb6	[Pipeline] Remove early InstCombine in ThinLTO post link sample profile pipeline With opaque pointers, all function pointer types are the same, meaning there should be no bitcasts. Internal benchmarks with SampleFDO look neutral. This was added in D36333. Reviewed By: tejohnson, davidxl Differential Revision: https://reviews.llvm.org/D146099	2023-03-14 19:48:31 -07:00
Alexandros Lamprineas	f242291f59	[FuncSpec] Do not run pre-link when doing LTO. Saves time. Post link will cover most cases anyway. Differential Revision: https://reviews.llvm.org/D145394	2023-03-14 18:56:26 +00:00
Arthur Eubanks	87dadf0f5b	[Pipeline] Move some GlobalOpt/GlobalDCE runs into simplification pipeline These are very clearly more simplification than optimization. Mostly NFC, except for some ordering around passes that don't really matter. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D145967	2023-03-14 09:01:14 -07:00
Nikita Popov	fb5683449e	[Pipelines] Restore old DAE position in LTO pipeline This is a partial revert of D128830, restoring the previous position of DeadArgElim in the fat LTO pipeline. The motivation for this is a major code size regression observed in Rust and illustrated in the PhaseOrdering test. This is a conservative fix restoring the previous pipeline order. The real problem is that the LTO pipeline is conceptually broken: It doesn't have a CGSCC function simplification pipeline. The inliner is just being run by itself. This wouldn't be a problem if fat LTO used a standard design where ArgPromotion and DAE are only run after functions have already been simplified by the CGSCC inliner pipeline. Differential Revision: https://reviews.llvm.org/D146051	2023-03-14 17:00:17 +01:00
Sanjay Patel	ef6f23535d	Revert "[InstCombine] use loop info when running the pass after loop vectorization" This reverts commit 43ae4b62b2671cf73e691c0b53324cd39405cd51. This was intended to be practically NFC in terms of the overall opt pipeline, but there is experimental data showing that code changes occurred here: https://llvm-compile-time-tracker.com/compare.php?from=772aa05452f8ff90a47168e6801cda2acb5a1873&to=43ae4b62b2671cf73e691c0b53324cd39405cd51&stat=size-text	2023-03-11 17:28:56 -05:00
Sanjay Patel	43ae4b62b2	[InstCombine] use loop info when running the pass after loop vectorization This is the follow-up to D144199 and a suggestion from D144045. We make the use of loop info explicit via an InstCombine pass parameter rather than semi-arbitrarily via caching. The only InstCombine transform that uses LoopInfo currently is a GEP fold in visitGEPOfGEP(), so that shows up as a failure in the dedicated test for the fold as well as several LoopVectorizer tests that run extra passes. I don't see any pass manager regression tests that actually check for pass options, but this is intended to be NFC for the pass pipeline behavior - we only try to use loop info where it would have been used before via caching. Differential Revision: https://reviews.llvm.org/D144274	2023-03-11 14:20:30 -05:00
Arthur Eubanks	0d4a709bb8	[Pipeline] Adjust PostOrderFunctionAttrs placement in simplification pipeline We can infer more attribute information once functions are fully simplified, so move the PostOrderFunctionAttrs pass after the function simplification pipeline. However, just doing this can impact simplification of recursive functions since function simplification takes advantage of function attributes of callees (some LLVM tests are actually impacted by this), so keep a copy of PostOrderFunctionAttrs before the function simplification pipeline that only runs on recursive functions. For example, this fixes the small regression noticed in https://reviews.llvm.org/D128830. This requires some restructuring of the CGSCC NoRerun feature. We need to cache the ShouldNotRunFunctionPassesAnalysis analysis after the simplification is done, which now is after the second PostOrderFunctionAttrs run, rather than after the function simplification pipeline. Compile time impact: https://llvm-compile-time-tracker.com/compare.php?from=33cf40122279342b50f92a3a53f5c185390b6018&to=1bb2a07875634e508a6bdf2ca1b130f55510f060&stat=instructions:u Compile time increase from unconditionally running the first PostOrderFunctionAttrs: https://llvm-compile-time-tracker.com/compare.php?from=1bb2a07875634e508a6bdf2ca1b130f55510f060&to=f4f87e89cc7a35c64e3a103a8036192a84ae002b&stat=instructions:u Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D145210	2023-03-06 09:01:45 -08:00
Arthur Eubanks	bd6eb1423c	[NFC][Pipeline] Move PromotePass into GlobalCleanupPM	2023-03-01 13:22:24 -08:00
Arthur Eubanks	25af6507e7	[PassBuilder] Always enable CountVisitsPass when stats are enabled Rather than having a separate flag. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D145015	2023-03-01 09:22:02 -08:00
Rong Xu	666731660c	[Pass][CHR] Move ControlHeightReduction to module optimization pipeline This is a modified version of commit b374423304a8 by Arthur (https://reviews.llvm.org/D143424). Here we invoke the pass independently of PGOOPT. We now check if the profile is available through the program summary. This ensures CHR is called in distributed ThinLTO BE compilation (where PGOOPT might not be created). Differential Revision: https://reviews.llvm.org/D144769	2023-02-27 11:47:54 -08:00
Arthur Eubanks	a628ca4925	Revert "[Pipeline] Move ControlHeightReduction to module optimization pipeline" This reverts commit b374423304a8d91d590d0ce5ab1b381296d6dfb2. Causes regressions on some benchmarks.	2023-02-23 10:17:12 -08:00
Arthur Eubanks	b374423304	[Pipeline] Move ControlHeightReduction to module optimization pipeline This pass isn't a simplification, it's a non-canonical optimization. This makes it only run once in a (Thin)LTO pipeline during postlink, just like all the other optimization pipeline passes. Reviewed By: xur Differential Revision: https://reviews.llvm.org/D143424	2023-02-16 15:23:38 -08:00
Arthur Eubanks	5f5cf60298	[Pipeline] Remove -enable-npm-O3-nontrivial-unswitch flag This was added to help debugging performance issues, no longer needed. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D98675	2023-02-16 11:36:03 -08:00
Arthur Eubanks	4d16ebd6c9	[Pipeline] Remove -enable-no-rerun-simplification-pipeline flag This has been on without complaint for a while. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D144130	2023-02-16 11:29:51 -08:00
David Green	86bfeb906e	Revert "Inlining: Run the legacy AlwaysInliner before the regular inliner." This seems to cause large regressions in existing code, as much as 75% slower (4x the time taken). Small always inline functions seem to be used a lot in the cmsis-dsp library. I would add a phase ordering test to show the problems, but one already exists! The llvm/test/Transforms/PhaseOrdering/ARM/arm_mult_q15.ll was just changed by removing alwaysinline to hide the problems that existed. This reverts commit cae033dcf227aeecf58fca5af6fc7fde1fd2fb4f. This reverts commit 8e33c41e72ad42e4c27f8cbc3ad2e02b169637a1.	2023-02-10 15:01:49 +00:00
Amara Emerson	8e33c41e72	Inliner: Address missed review comments for D143624	2023-02-09 21:56:40 -08:00
Amara Emerson	cae033dcf2	Inlining: Run the legacy AlwaysInliner before the regular inliner. We have several situations where it's beneficial for code size to ensure that every call to an always-inline function is inlined before normal inlining decisions are made. While the normal inliner runs in a "MandatoryOnly" mode to try to do this, it only does it on a per-SCC basis, rather than for the whole module. Ensuring that all mandatory inlinings are done before any heuristic-based decisions are made just makes sense. Despite being referred to as the "legacy" AlwaysInliner pass, it's already necessary for -O0 because the CGSCC inliner is too expensive in compile time to run at -O0. This also fixes an exponential compile time blow up in https://github.com/llvm/llvm-project/issues/59126 Differential Revision: https://reviews.llvm.org/D143624	2023-02-09 16:49:29 -08:00
Florian Hahn	8028263c41	Recommit "[ConstraintElim] Enable pass by default." This reverts commit 695ce48c63ec582a46bfbda9b066f4d3bcde143f. The compile-time regression causing the revert has been fixed. Recommit the original patch. Original commit message: The pass should help to close a functional gap when it comes to reasoning about related conditions in a relatively general way. It addresses multiple existing issues (linked below) and the need for a more powerful reasoning system was also discussed recently in https://discourse.llvm.org/t/rfc-alternative-approach-of-dealing-with-implications-from-comparisons-through-pos-analysis/65601/7 On AArch64, the new pass performs ~2000 simplifications on MultiSource,SPEC2006,SPEC2017 with -O3. Compile-time impact: NewPM-O3: +0.20% NewPM-ReleaseThinLTO: +0.32% NewPM-ReleaseLTO-g: +0.28% https://llvm-compile-time-tracker.com/compare.php?from=f01a3a893c147c1594b9a3fbd817456b209dabbf&to=577688758ef64fb044215ec3e497ea901bb2db28&stat=instructions:u Fixes #49344. Fixes #47888. Fixes #48253. Fixes #49229. Fixes #58074. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D135915	2023-02-06 18:09:43 +00:00
Steven Wu	516e301752	[NFC][Profile] Access profile through VirtualFileSystem Make access to profile data go through the virtual file system so the inputs can be remapped. In the context of caching, this ensures we capture the inputs and provide an immutable input as profile data. Reviewed By: akyrtzi, benlangmuir Differential Revision: https://reviews.llvm.org/D139052	2023-02-01 09:25:02 -08:00
Arthur Eubanks	4ce34bb2a9	[CGSCC] Add pass which counts the max number of times we visit a function This will help with finding potential pathological CGSCC cases. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D142853	2023-01-30 10:06:53 -08:00
Joseph Huber	6185246f4f	[OpenMP] Run an extra 'OpenMPOpt' pass in LTO-mode The `OpenMPOpt` pass is pivotal to the performance of many OpenMP offloading programs. When we performed non-LTO builds with OpenMP, we used to link the OpenMP deviceRTL individually for each TU. This led to us getting an additional attributor run on the combined runtime and user code. When we used LTO we lost a run and suffered a large performance degradation. This patch simply adds the missing extra `OpenMPOpt` pass to the LTO pipeline. This patch fixes the performance regression shown in applications that used OpenMP offloading in LTO mode. Previously, this wasn't legal to do as we could emit new runtime calls into the module. That was fixed by D142646. Depends on D142646 Fixes https://github.com/llvm/llvm-project/issues/60300 Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D142650	2023-01-26 13:23:45 -06:00
Joseph Huber	0bdde9dfb9	[OpenMP] Make OpenMPOpt aware of the OpenMP runtime's status The `OpenMPOpt` pass contains optimizations that generate new calls into the OpenMP runtime. This causes problems if we are in a state where the runtime has already been linked statically. Generating these new calls will result in them never being resolved. We should indicate if we are in a "post-link" LTO phase and prevent OpenMPOpt from generating new runtime calls. Generally, it's not desirable for passes to maintain state about the context in which they're called. But this is the only reasonable solution to static linking when we have a pass that generates new runtime calls. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D142646	2023-01-26 13:23:44 -06:00
Florian Hahn	695ce48c63	Revert "[ConstraintElim] Enable pass by default." This reverts commit fb13dcf3431cd83911fe56899d2fade808dc5b8d. A large compile-time regression for code generated by sanitizers has been reported. Revert while I investigate the issue. Details and reproducers are available here: https://reviews.llvm.org/D135915	2023-01-18 14:25:00 +00:00
Alexandros Lamprineas	572a757fa7	[IPSCCP] Enable specialization of functions. Re-enable the optimization after having fixed the compilation error found in SPEC/CINT2017rate/502.gcc_r when both LTO and PGO are in use (see https://reviews.llvm.org/D141474). Differential Revision: https://reviews.llvm.org/D140210	2023-01-13 14:04:17 +00:00
Florian Hahn	fb13dcf343	[ConstraintElim] Enable pass by default. The pass should help to close a functional gap when it comes to reasoning about related conditions in a relatively general way. It addresses multiple existing issues (linked below) and the need for a more powerful reasoning system was also discussed recently in https://discourse.llvm.org/t/rfc-alternative-approach-of-dealing-with-implications-from-comparisons-through-pos-analysis/65601/7 On AArch64, the new pass performs ~2000 simplifications on MultiSource,SPEC2006,SPEC2017 with -O3. Compile-time impact: NewPM-O3: +0.20% NewPM-ReleaseThinLTO: +0.32% NewPM-ReleaseLTO-g: +0.28% https://llvm-compile-time-tracker.com/compare.php?from=f01a3a893c147c1594b9a3fbd817456b209dabbf&to=577688758ef64fb044215ec3e497ea901bb2db28&stat=instructions:u Fixes #49344. Fixes #47888. Fixes #48253. Fixes #49229. Fixes #58074. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D135915	2023-01-04 18:00:37 +00:00
Florian Hahn	f3c1d92682	[ConstraintElim] Adjust position in LTO pipeline. This runs ConstraintElim earlier during LTO, similar to non-LTO. Discussed and split off from D135915.	2023-01-03 17:07:43 +00:00
Florian Hahn	9e6d2c82d6	[ConstraintElim] Move after first instcombine run. Running ConstraintElimination after the first InstCombine run results in slightly more simplifications on average. There is a tiny number of regressions, mostly due to CVP eliminating a condition that ConstraintElimination would use, but in most cases there's a slight improvement or no change. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D140853	2023-01-03 13:25:00 +00:00
Florian Hahn	60359f56aa	Revert "[IPSCCP] Enable specialization of functions." This reverts commit 2656572d485127cc30b8fe9752024d2a0f1c50db. It looks like CINT2017rate/502.gcc_r gets mis-compiled with LTO + PGO on AArch64 with function specialization.	2022-12-26 16:02:59 +00:00
Alexandros Lamprineas	2656572d48	[IPSCCP] Enable specialization of functions. This patch enables Function Specialization by default at all optimization levels except Os, Oz. Compilation Time Overhead: -------------------------- Measured the Instruction Count increase (Geomean) for CTMark from the llvm-testsuite as in https://llvm-compile-time-tracker.com. * {-O3, Non-LTO}: +0.136% Instruction Count * {-O3, LTO}: +0.346% Instruction Count Performance Uplift: ------------------- Measured +9.121% score increase for 505.mcf_r from SPEC Int 2017 (Tested on Neoverse N1 with -O3 + LTO) Correctness Testing: -------------------- * Passes bootstrap Clang with ASAN + LTO + FuncSpec aggressive options: { MaxClonesThreshold=10, SmallFunctionThreshold=10, AvgLoopIterationCount=30, SpecializeOnAddresses=true, EnableSpecializationForLiteralConstant=true, FuncSpecializationMaxIters=10 } * Builds Chromium and passes its unittests with the above options + ThinLTO. For more info please refer to https://discourse.llvm.org/t/rfc-should-we-enable-function-specialization/61518 Differential Revision: https://reviews.llvm.org/D140210	2022-12-25 10:05:21 +02:00
Alexandros Lamprineas	8136a0172b	[FuncSpec] Make the Function Specializer part of the IPSCCP pass. Reland 877a9f9abec61f06e39f1cd872e37b828139c2d1 since D138654 (parent) has been fixed with 9ebaf4fef4aac89d4eff08e48185d61bc893f14e and with 8f1e11c5a7d70f96943a72649daa69f152d73e90. Differential Revision: https://reviews.llvm.org/D126455	2022-12-10 14:39:49 +00:00
Roman Lebedev	4f7e5d2206	[SROA] For non-speculatable `load`s of `select`s -- split block, insert then/else blocks, form two-entry PHI node, take 2 Currently, SROA is CFG-preserving. Not doing so does not affect any pipeline test. (???) Internally, SROA requires Dominator Tree, and uses it solely for the final `-mem2reg` call. By design, we can't really SROA allocas if their address escapes somehow, but we have logic to deal with `load` of `select`/`PHI`, where at least one of the possible addresses prevents promotion, by speculating the `load`s and `select`ing between loaded values. As one would expect, that requires ensuring that the speculation is actually legal. Even ignoring complexity bailouts, that logic does not deal with everything, e.g. `isSafeToLoadUnconditionally()` does not recurse into the arms of `select`. There can also be cases where the load is genuinely non-speculatable. So if we can't prove that the load can be speculated, unfold the select, produce a two-entry phi node, and perform a predicated load. Now, that transformation must obviously update Dominator Tree, since we require it later on. Doing so is trivial. Additionally, we don't want to do this for the final SROA invocation (D136806). In the end, this ends up having negative (!) compile-time cost: https://llvm-compile-time-tracker.com/compare.php?from=c6d7e80ec4c17a415673b1cfd25924f98ac83608&to=ddf9600365093ea50d7e278696cbfa01641c959d&stat=instructions:u Though indeed, this only deals with `select`s, `PHI`s are still using speculation. Should we update some more analysis? Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D138238 This reverts commit 739611870d3b06605afe25cc07833f6a62de9545, and recommits 03e6d9d9d1d48e43f3efc35eb75369b90d4510d5 with a fixed assertion - we should check that DTU is there, not just assert false...	2022-12-08 20:19:55 +03:00
Roman Lebedev	739611870d	Revert "[SROA] For non-speculatable `load`s of `select`s -- split block, insert then/else blocks, form two-entry PHI node" The assertion about not modifying the CFG seems to not hold, will recommit in a bit. https://lab.llvm.org/buildbot#builders/139/builds/32412 This reverts commit 03e6d9d9d1d48e43f3efc35eb75369b90d4510d5. This reverts commit 4f90f4ada33718f9025d0870a4fe3fe88276b3da.	2022-12-08 19:51:15 +03:00
Roman Lebedev	03e6d9d9d1	[SROA] For non-speculatable `load`s of `select`s -- split block, insert then/else blocks, form two-entry PHI node Currently, SROA is CFG-preserving. Not doing so does not affect any pipeline test. (???) Internally, SROA requires Dominator Tree, and uses it solely for the final `-mem2reg` call. By design, we can't really SROA allocas if their address escapes somehow, but we have logic to deal with `load` of `select`/`PHI`, where at least one of the possible addresses prevents promotion, by speculating the `load`s and `select`ing between loaded values. As one would expect, that requires ensuring that the speculation is actually legal. Even ignoring complexity bailouts, that logic does not deal with everything, e.g. `isSafeToLoadUnconditionally()` does not recurse into the arms of `select`. There can also be cases where the load is genuinely non-speculatable. So if we can't prove that the load can be speculated, unfold the select, produce a two-entry phi node, and perform a predicated load. Now, that transformation must obviously update Dominator Tree, since we require it later on. Doing so is trivial. Additionally, we don't want to do this for the final SROA invocation (D136806). In the end, this ends up having negative (!) compile-time cost: https://llvm-compile-time-tracker.com/compare.php?from=c6d7e80ec4c17a415673b1cfd25924f98ac83608&to=ddf9600365093ea50d7e278696cbfa01641c959d&stat=instructions:u Though indeed, this only deals with `select`s, `PHI`s are still using speculation. Should we update some more analysis? Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D138238	2022-12-08 16:51:32 +03:00
Alexandros Lamprineas	0f0cb92cb2	Revert "[FuncSpec] Make the Function Specializer part of the IPSCCP pass." This reverts commit 877a9f9abec61f06e39f1cd872e37b828139c2d1. It depends on the parent revision 42c2dc401742266da3e0251b6c1ca491f4779963 which needs to be reverted as it broke some buildbots, so reverting both.	2022-12-08 12:41:43 +00:00
Alexandros Lamprineas	877a9f9abe	[FuncSpec] Make the Function Specializer part of the IPSCCP pass. The aim of this patch is to minimize the compilation time overhead of running Function Specialization. It is about 40% slower to run as a standalone pass (IPSCCP + FuncSpec vs IPSCCP with FuncSpec) according to my measurements. I compiled the llvm testsuite with NewPM-O3 + LTO and measured single threaded [user + system] time of IPSCCP and FuncSpec by passing the '-time-passes' option to lld. Then I compared the two configurations in terms of Instruction Count of the total compilation (not of the individual passes) as in https://llvm-compile-time-tracker.com. Geomean for non-LTO builds is -0.25% and LTO is -0.5% approximately. You can find more info below: https://discourse.llvm.org/t/rfc-should-we-enable-function-specialization/61518 Differential Revision: https://reviews.llvm.org/D126455	2022-12-08 12:14:27 +00:00
Sjoerd Meijer	8250180238	Revert "Recommit "[LoopFlatten] Enable it by default"" This reverts commit 3ea6a9a469fde168c527b1c34c09f6d684ec86af because of the reported miscompilation in: https://github.com/llvm/llvm-project/issues/59339	2022-12-05 15:14:12 +00:00
Sjoerd Meijer	3ea6a9a469	Recommit "[LoopFlatten] Enable it by default" The problem in 58441 that was reported after enabling this last time was fixed in 8e9e22f07bcbe2ee95478684cf31948370e4e51e.	2022-11-29 10:45:13 +00:00
Rong Xu	6327d263f5	[CHR] Add a threshold for the code duplication ControlHeightReduction (CHR) clones the code region to reduce the branches in the hot code path. The number of clones is linear in the depth of the region. Currently it has no control over the code size increase. We are seeing one ~9000-BB function get expanded to ~250000 BBs, a ~25x increase. This creates a big compile time issue for the downstream optimizations. This patch adds a cap on the number of clones for one region. Differential Revision: https://reviews.llvm.org/D138333	2022-11-22 11:36:40 -08:00
Sanjay Patel	163bb6d64e	[Passes][VectorCombine] enable early run generally and try load folds An early run of VectorCombine was added with D102496 specifically to deal with unnecessary vector ops produced with the C matrix extension. This patch is proposing to try those folds in general and add a pair of load folds to the menu. The load transform will partly solve (see PhaseOrdering diffs) a longstanding vectorization perf bug by removing redundant loads via GVN: issue #17113 The main reason for not enabling the extra pass generally in the initial patch was compile-time cost. The cost of VectorCombine was significantly (surprisingly) improved with: 87debdadaf18 https://llvm-compile-time-tracker.com/compare.php?from=ffe05b8f57d97bc4340f791cb386c8d00e0739f2&to=87debdadaf18f8a5c7e5d563889e10731dc3554d&stat=instructions:u ...so the extra run is going to cost very little now - the total cost of the 2 runs should be less than the 1 run before that micro-optimization: https://llvm-compile-time-tracker.com/compare.php?from=5e8c2026d10e8e2c93c038c776853bed0e7c8fc1&to=2c4b68eab5ae969811f422714e0eba44c5f7eefb&stat=instructions:u It may be possible to reduce the cost slightly more with a few more earlier-exits like that, but it's probably in the noise based on timing experiments. Differential Revision: https://reviews.llvm.org/D138353	2022-11-21 13:57:55 -05:00
Sanjay Patel	8f337f8ffe	[VectorCombine] generalize pass param name for early combines; NFC The option was added with https://reviews.llvm.org/D102496, and currently the name is accurate, but I am hoping to add a load transform that is not a scalarization. See issue #17113.	2022-11-21 13:57:55 -05:00
Roman Lebedev	8adfa29706	[Pipelines] Introduce SROA after (final, run-time) loop unrolling Now that we are done with loop unrolling, be it either by LoopVectorizer, or LoopUnroll passes, some variable-offset GEPs into allocas could have become constant-offset, thus enabling SROA and alloca promotion, yet we don't capitalize on that, which is surprising. While it would be better not to introduce one more SROA invocation and instead move the one from `PassBuilder::buildFunctionSimplificationPipeline()`, the existing test coverage says that is a bad idea, though it would be fine compile-time wise: https://llvm-compile-time-tracker.com/compare.php?from=b150d34c47efbd8fa09604bce805c0920360f8d7&to=5a9a5c855158b482552be8c7af3e73d67fa44805&stat=instructions So instead, I add yet another SROA run. I have checked, and it needs to be at least after said final loop unrolling. This is still fine compile-time wise: https://llvm-compile-time-tracker.com/compare.php?from=70324cd88328c0924e605fa81b696572560aa5c9&to=fb489bbef687ad821c3173a931709f9cad9aee8a&stat=instructions I've encountered this in real code; `SROA-after-final-loop-unrolling.ll` has been reduced from https://godbolt.org/z/fsdMhETh3 Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D136806	2022-11-17 21:31:30 +03:00
Arthur Eubanks	cbcf123af2	[LegacyPM] Remove cl::opts controlling optimization pass manager passes Move these to the new PM if they're used there. Part of removing the legacy pass manager for optimization pipeline. Reland with UseNewGVN usage in clang removed. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D137915	2022-11-14 09:38:17 -08:00
Arthur Eubanks	d7c1427953	Revert "[LegacyPM] Remove cl::opts controlling optimization pass manager passes" This reverts commit 7ec05fec7115a910b2e172de794adc462388c25e. Breaks bots, e.g. https://lab.llvm.org/buildbot#builders/217/builds/15008	2022-11-14 09:33:38 -08:00
Arthur Eubanks	7ec05fec71	[LegacyPM] Remove cl::opts controlling optimization pass manager passes Move these to the new PM if they're used there. Part of removing the legacy pass manager for optimization pipeline. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D137915	2022-11-14 09:23:17 -08:00


154 Commits