llvm-project

mirror of https://github.com/llvm/llvm-project.git synced 2025-05-06 12:36:07 +00:00

Author	SHA1	Message	Date
Ellis Hoag	b8d6659bff	[CodeLayout] Do not flip branch condition when using optsize (#114607 ) * Do not use profile data when flipping a branch condition when optimizing for size. This should improving outlining and ICF due to more uniform instruction sequences. * Refactor `optimizeBranches()` to use early `continue`s * Use the correct debug location for `insertBranch()`	2024-11-12 09:50:29 -08:00
Ellis Hoag	6ab26eab4f	Check hasOptSize() in shouldOptimizeForSize() (#112626 )	2024-10-28 09:45:03 -07:00
Ellis Hoag	cb5fbd2f60	[CodeLayout] Do not verify after assigning blocks (#111754 ) Rather than invariantly running `F->verify()` when asserts are enabled, run machine IR verification in LIT tests only. Swap `CHECK-PERF` and `CHECK-SIZE` in `code_placement_ext_tsp_large.ll`. Remove `={0,1,true,false}` from flags in tests.	2024-10-10 09:01:50 -07:00
spupyrev	9016f27c42	[CodeLayout] Size-aware machine block placement (#109711 ) This is an implementation of a new "size-aware" machine block placement. The idea is to reorder blocks so that the number of fall-through jumps is maximized. Observe that profile data is ignored for the optimization, and it is applied only for instances with hasOptSize()=true. This strategy has two benefits: (i) it eliminates jump instructions, which results in smaller text size; (ii) we avoid using profile data while reordering blocks, which yields more "uniform" functions, thus helping ICF and machine outliner/merger. For large (mobile) apps, the size benefits of (i) and (ii) are roughly the same, combined providing up to 0.5% uncompressed and up to 1% compressed savings size on top of the current solution. The optimization is turned off by default.	2024-10-02 10:48:08 -07:00
Ellis Hoag	fbec1c2a08	[NFC][CodeLayout] Remove unused parameter (#110145 ) The `NodeCounts` parameter of `calcExtTspScore()` is unused, so remove it. Use `SmallVector` since arrays are expected to be small since they represent MBBs.	2024-09-26 10:28:06 -07:00
Matt Arsenault	71ca9fcb8d	llvm-reduce: Don't print verifier failed machine functions (#109673 ) This produces far too much terminal output, particularly for the instruction reduction. Since it doesn't consider the liveness of of the instructions it's deleting, it produces quite a lot of verifier errors.	2024-09-24 22:32:53 +04:00
spupyrev	36dce5091e	[CodeLayout][NFC] Format and minor refactoring of MBP (#109729 ) This PR has two (NFC) commits: - clang-format MBP - move a part of tail duplication and block aligning into helper functions for better readability.	2024-09-24 08:36:57 -07:00
Alexis Engelke	862d822d83	[CodeGen] Don't renumber invalid domtree (#102427 ) Machine block placement might remove nodes from the function but does not update the dominator tree accordingly. Instead of renumbering (which might crash due to accessing removed blocks), set the domtree to null to make clear that it is invalid at this point. Fixup of #102107.	2024-08-08 08:53:45 +02:00
Alexis Engelke	d871b2e0d0	[CodeGen] Use optimized domtree for MachineFunction (#102107 ) The dominator tree gained an optimization to use block numbers instead of a DenseMap to store blocks. Given that machine basic blocks already have numbers, expose these via appropriate GraphTraits. For debugging, block number epochs are added to MachineFunction -- this greatly helps in finding uses of block numbers after RenumberBlocks(). In a few cases where dominator trees are preserved across renumberings, the dominator tree is updated to use the new numbers.	2024-08-06 13:46:19 +02:00
Krzysztof Pszeniczny	5bae81ba9e	[CodeGen] Add an option to skip extTSP BB placement for huge functions. (#99310 ) The extTSP-based basic block layout algorithm improves the performance of the generated code, but unfortunately it has a super-linear time complexity. This leads to extremely long compilation times for certain relatively rare kinds of autogenerated code. This patch adds an `-mllvm` flag to optionally restrict extTSP only to functions smaller than a specified threshold. While commit bcdc0477319a26fd8dcdde5ace3bdd6743599f44 added a knob to to limit the maximum chain size, it's still possible that for certain huge functions the number of chains is very large, leading to a quadratic behaviour in ExtTSPImpl::mergeChainPairs.	2024-07-24 17:43:26 +02:00
John Brawn	f0bd705c9b	[CodeGen] Restore MachineBlockPlacement block ordering (#99351 ) PR #91843 changed the algorithm used to find the next unplaced block so that it iterates through the blocks in BlockFilter instead of iterating through the blocks in the function and checking if they are in the block filter. Unfortunately this sometimes results in a different block ordering being chosen, as the order of blocks in BlockFilter comes from the order in MachineLoopInfo, and in some cases this differs from the order they are in the function. This can also give an end result that has worse performance. Fix this by making collectLoopBlockSet place blocks in its output in the order that they are in the function.	2024-07-24 10:49:50 +01:00
paperchalice	099899961c	[CodeGen][NewPM] Port `machine-block-freq` to new pass manager (#98317 ) - Add `MachineBlockFrequencyAnalysis`. - Add `MachineBlockFrequencyPrinterPass`. - Use `MachineBlockFrequencyInfoWrapperPass` in legacy pass manager. - `LazyMachineBlockFrequencyInfo::print` is empty, drop it due to new pass manager migration.	2024-07-12 15:45:01 +08:00
paperchalice	79d0de2ac3	[CodeGen][NewPM] Port `machine-loops` to new pass manager (#97793 ) - Add `MachineLoopAnalysis`. - Add `MachineLoopPrinterPass`. - Convert to `MachineLoopInfoWrapperPass` in legacy pass manager.	2024-07-09 09:11:18 +08:00
paperchalice	d38b518e04	Reapply "[CodeGen][NewPM] Port machine-branch-prob to new pass manager" (#96858 ) (#96869 ) This reverts commit ab58b6d58edf6a7c8881044fc716ca435d7a0156. In `CodeGen/Generic/MachineBranchProb.ll`, `llc` crashed with dumped MIR when targeting PowerPC. Move test to `llc/new-pm`, which is X86 specific.	2024-06-28 10:59:23 +08:00
paperchalice	ab58b6d58e	Revert "[CodeGen][NewPM] Port machine-branch-prob to new pass manager" (#96858 ) Reverts llvm/llvm-project#96389 Some ppc bots failed.	2024-06-27 15:00:17 +08:00
paperchalice	73e46c2bb4	[CodeGen][NewPM] Port machine-branch-prob to new pass manager (#96389 ) Like IR version `print<branch-prob>`, there is also a `print<machine-branch-prob>`.	2024-06-27 14:04:51 +08:00
William Junda Huang	75882ed4c7	[Codegen] (NFC) Faster algorithm for MachineBlockPlacement (#91843 ) In MachineBlockPlacement, the function getFirstUnplacedBlock is inefficient because in most cases (for usual loop CFG), this function fails to find a candidate, and its complexity becomes O(#(loops in function) * #(blocks in function)). This makes the compilation of very long functions slow. This update reduces it to O(k * #(blocks in function)) where k is the maximum loop nesting depth, by iterating through the BlockFilter instead.	2024-06-13 22:13:38 -04:00
paperchalice	4b24c2dfb5	[CodeGen][NewPM] Split `MachinePostDominators` into a concrete analysis result (#95113 ) `MachinePostDominators` version of #94571.	2024-06-12 14:29:22 +08:00
Kazu Hirata	026a29e8b3	[Analysis, CodeGen, DebugInfo] Use StringRef::operator== instead of StringRef::equals (NFC) (#91304 ) I'm planning to remove StringRef::equals in favor of StringRef::operator==. - StringRef::operator==/!= outnumber StringRef::equals by a factor of 53 under llvm/ in terms of their usage. - The elimination of StringRef::equals brings StringRef closer to std::string_view, which has operator== but not equals. - S == "foo" is more readable than S.equals("foo"), especially for !Long.Expression.equals("str") vs Long.Expression != "str".	2024-05-07 10:20:10 -07:00
Kazu Hirata	7d269a4841	[CodeGen] Use range-based for loops (NFC)	2024-02-03 21:43:01 -08:00
Freddy Ye	d102f8bda1	[MachineBlockPlacement][X86] Use max of MDAlign and TLIAlign to align Loops. (#71026 ) This patch added backend consumption on a new loop metadata: !1 = !{!"llvm.loop.align", i32 64} which is generated from clang's new loop attribute: [[clang::code_align()]] clang patch: #70762	2023-11-21 14:06:32 +08:00
Kazu Hirata	f9306f6de3	[ADT] Rename llvm::erase_value to llvm::erase (NFC) (#70156 ) C++20 comes with std::erase to erase a value from std::vector. This patch renames llvm::erase_value to llvm::erase for consistency with C++20. We could make llvm::erase more similar to std::erase by having it return the number of elements removed, but I'm not doing that for now because nobody seems to care about that in our code base. Since there are only 50 occurrences of erase_value in our code base, this patch replaces all of them with llvm::erase and deprecates llvm::erase_value.	2023-10-24 23:03:13 -07:00
Matthias Braun	2e26d09106	BlockFrequencyInfo: Add PrintBlockFreq helper (#67512 ) - Refactor the (Machine)BlockFrequencyInfo::printBlockFreq functions into a `PrintBlockFreq()` function returning a `Printable` object. This simplifies usage as it can be directly piped to a `raw_ostream` like `dbgs() << PrintBlockFreq(MBFI, Freq) << '\n';`. - Previously there was an interesting behavior where `BlockFrequencyInfoImpl` stores frequencies both as a `Scaled64` number and as an `uint64_t`. Most algorithms use the `BlockFrequency` abstraction with the integers, the print function for basic blocks printed the `Scaled64` number potentially showing higher accuracy than was used by the algorithm. This changes things to only print `BlockFrequency` values. - Replace some instances of `dbgs() << Freq.getFrequency()` with the new function.	2023-10-05 18:26:50 -07:00
Matthias Braun	5181156b37	Use BlockFrequency type in more places (NFC) (#68266 ) The `BlockFrequency` class abstracts `uint64_t` frequency values. Use it more consistently in various APIs and disable implicit conversion to make usage more consistent and explicit. - Use `BlockFrequency Freq` parameter for `setBlockFreq`, `getProfileCountFromFreq` and `setBlockFreqAndScale` functions. - Return `BlockFrequency` in `getEntryFreq()` functions. - While on it change some `const BlockFrequency& Freq` parameters to plain `BlockFreqency Freq`. - Mark `BlockFrequency(uint64_t)` constructor as explicit. - Add missing `BlockFrequency::operator!=`. - Remove `uint64_t BlockFreqency::getMaxFrequency()`. - Add `BlockFrequency BlockFrequency::max()` function.	2023-10-05 11:40:17 -07:00
Fangrui Song	6b8d04c23d	[CodeLayout] Refactor std::vector uses, namespace, and EdgeCountT. NFC * Place types and functions in the llvm::codelayout namespace * Change EdgeCountT from pair<pair<uint64_t, uint64_t>, uint64_t> to a struct and utilize structured bindings. It is not conventional to use the "T" suffix for structure types. * Remove a redundant copy in ChainT::merge. * Change {ExtTSPImpl,CDSortImpl}::run to use return value instead of an output parameter * Rename applyCDSLayout to computeCacheDirectedLayout: (a) avoid rare abbreviation "CDS" (cache-directed sort) (b) "compute" is more conventional for the specific use case * Change the parameter types from std::vector to ArrayRef so that SmallVector arguments can be used. * Similarly, rename applyExtTspLayout to computeExtTspLayout. Reviewed By: Amir Differential Revision: https://reviews.llvm.org/D159526	2023-09-21 13:13:03 -07:00
Arthur Eubanks	0a1aa6cda2	[NFC][CodeGen] Change CodeGenOpt::Level/CodeGenFileType into enum classes (#66295 ) This will make it easy for callers to see issues with and fix up calls to createTargetMachine after a future change to the params of TargetMachine. This matches other nearby enums. For downstream users, this should be a fairly straightforward replacement, e.g. s/CodeGenOpt::Aggressive/CodeGenOptLevel::Aggressive or s/CGFT_/CodeGenFileType::	2023-09-14 14:10:14 -07:00
Guozhi Wei	1bcb6a3da2	[MBP] Enable duplicating return block to remove jump to return Sometimes LLVM generates branch to return instruction, like PR63227. It is because in function MachineBlockPlacement::canTailDuplicateUnplacedPreds we avoid duplicating a BB into another already placed BB to prevent destroying computed layout. But if the successor BB is a return block, duplicating it will only reduce taken branches without hurt to any other branches. Differential Revision: https://reviews.llvm.org/D153093	2023-06-21 18:54:31 +00:00
Akshay Khadse	43b38696aa	Fix uninitialized class members Reviewed By: LuoYuanke Differential Revision: https://reviews.llvm.org/D148692	2023-04-20 11:18:34 +08:00
Akshay Khadse	8bf7f86d79	Fix uninitialized pointer members in CodeGen This change initializes the members TSI, LI, DT, PSI, and ORE pointer feilds of the SelectOptimize class to nullptr. Reviewed By: LuoYuanke Differential Revision: https://reviews.llvm.org/D148303	2023-04-17 16:32:46 +08:00
Kazu Hirata	a585fa2637	[CodeGen] Use *{Set,Map}::contains (NFC)	2023-03-14 08:07:42 -07:00
Fangrui Song	1e6921131a	Move global namespace cl::opt inside llvm::	2023-02-14 00:09:44 -08:00
Mingming Liu	36e8e19337	[NFC][BlockPlacement]Add an option to renumber blocks based on function layout order. Use case: - When block layout is visualized after MBP pass, the basic blocks are labeled in layout order; meanwhile blocks could be numbered in a different order. - As a result, it's hard to map between the graph and pass output. With this option on, the basic blocks are renumbered in function layout order. This option is only useful when a function is to be visualized (i.e., when view options are on) to make it debugging only. Use https://godbolt.org/z/5WTW36bMr as an example: - As MBP pass output (shown in godbolt output window), `func2` is in a basic block numbered `2` (`bb.2`), and `func1` is in a basic block numbered `3` (`bb.3`); `bb.3` is a block with higher block frequency than `bb.2`, and `bb.3` is placed before `bb.2` in the functin layout. - Use [1] to get the dot graph (graph uploaded in [2]), the blocks are re-numbered. - `func1` is in 'if.end' block, and labeled `1` in visualized dot; `func2` is in 'if.then' blocks, and labeled `3` --> the labeled number and bb number won't map. - [[ `b5626ae975/llvm/lib/CodeGen/MachineBlockFrequencyInfo.cpp (L127)` \| DOTGraphTraits<MachineBlockFrequencyInfo *>::getNodeLabel ]] is where labeled numbers are based on function layout number, and [[ `a8d93783f3/llvm/include/llvm/Support/GraphWriter.h (L209)` \| called by graph writer ]]. So call 'MachineFunction::RenumberBlocks' would make labeled number (in dot graph) and block number (in pass output) consistent with each other. [1] `./bin/clang++ -O3 -S -mllvm -view-block-layout-with-bfi=count -mllvm -view-bfi-func-name=_Z9func_loopv -mllvm -print-after=block-placement -mllvm -filter-print-funcs=_Z9func_loopv test.c` [2] {F25201785} Reviewed By: davidxl Differential Revision: https://reviews.llvm.org/D137467	2022-11-07 07:52:45 -08:00
spupyrev	8d5b694da1	extending code layout alg The diff modifies ext-tsp code layout algorithm in the following ways: (i) fixes merging of cold block chains (this is a port of D129397); (ii) adjusts the cost model utilized for optimization; (iii) adjusts some APIs so that the implementation can be used in BOLT; this is a prerequisite for D129895. The only non-trivial change is (ii). Here we introduce different weights for conditional and unconditional branches in the cost model. Based on the new model it is slightly more important to increase the number of "fall-through unconditional" jumps, which makes sense, as placing two blocks with an unconditional jump next to each other reduces the number of jump instructions in the generated code. Experimentally, this makes a mild impact on the performance; I've seen up to 0.2%-0.3% perf win on some benchmarks. Reviewed By: hoy Differential Revision: https://reviews.llvm.org/D129893	2022-08-24 09:40:25 -07:00
Kazu Hirata	9e6d1f4b5d	[CodeGen] Qualify auto variables in for loops (NFC)	2022-07-17 01:33:28 -07:00
Mingming Liu	1e67385d28	[MachineBlockPlacementStats] Added check for "-filter-print-funcs" option to the machine-block-placement-stats. Differential Revision: https://reviews.llvm.org/D128019	2022-06-16 21:59:54 -07:00
Mingming Liu	b7d09557f6	Revert "[MachineBlockPlacementStats] Add check for `-filter-print-funcs` option to machine-block-placement stats." This reverts commit 46d45df4516e9a5bc43460429cd02cd04a85db1a. Going to add differential revision link to commit message and re-commit.	2022-06-16 21:56:08 -07:00
Mingming Liu	46d45df451	[MachineBlockPlacementStats] Add check for `-filter-print-funcs` option to machine-block-placement stats.	2022-06-16 21:48:08 -07:00
serge-sans-paille	989f1c72e0	Cleanup codegen includes This is a (fixed) recommit of https://reviews.llvm.org/D121169 after: 1061034926 before: 1063332844 Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup Differential Revision: https://reviews.llvm.org/D121681	2022-03-16 08:43:00 +01:00
Nico Weber	a278250b0f	Revert "Cleanup codegen includes" This reverts commit 7f230feeeac8a67b335f52bd2e900a05c6098f20. Breaks CodeGenCUDA/link-device-bitcode.cu in check-clang, and many LLVM tests, see comments on https://reviews.llvm.org/D121169	2022-03-10 07:59:22 -05:00
serge-sans-paille	7f230feeea	Cleanup codegen includes after: 1061034926 before: 1063332844 Differential Revision: https://reviews.llvm.org/D121169	2022-03-10 10:00:30 +01:00
spupyrev	bcdc047731	speeding up ext-tsp for huge instances Differential Revision: https://reviews.llvm.org/D120780	2022-03-02 07:17:48 -08:00
Hongtao Yu	dee058c670	[CSSPGO] Turn on ext-tsp by default for CSSPGO. I'm seeing ext-tsp helps CSSPGO for our intern large benchmarks so I'm turning on it for CSSPGO. For non-CS AutoFDO, ext-tsp doesn't seem to help, probably because of lower profile counts quality. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D119048	2022-02-04 19:46:44 -08:00
Nicholas Guy	73d92faa2f	[CodeGen] Emit alignment "Max Skip" operand The current AsmPrinter has support to emit the "Max Skip" operand (the 3rd of .p2align), however has no support for it to actually be specified. Adding MaxBytesForAlignment to MachineBasicBlock provides this capability on a per-block basis. Leaving the value as default (0) causes no observable differences in behaviour. Differential Revision: https://reviews.llvm.org/D114590	2022-01-05 12:54:30 +00:00
spupyrev	f573f6866e	ext-tsp basic block layout A new basic block ordering improving existing MachineBlockPlacement. The algorithm tries to find a layout of nodes (basic blocks) of a given CFG optimizing jump locality and thus processor I-cache utilization. This is achieved via increasing the number of fall-through jumps and co-locating frequently executed nodes together. The name follows the underlying optimization problem, Extended-TSP, which is a generalization of classical (maximum) Traveling Salesmen Problem. The algorithm is a greedy heuristic that works with chains (ordered lists) of basic blocks. Initially all chains are isolated basic blocks. On every iteration, we pick a pair of chains whose merging yields the biggest increase in the ExtTSP value, which models how i-cache "friendly" a specific chain is. A pair of chains giving the maximum gain is merged into a new chain. The procedure stops when there is only one chain left, or when merging does not increase ExtTSP. In the latter case, the remaining chains are sorted by density in decreasing order. An important aspect is the way two chains are merged. Unlike earlier algorithms (e.g., based on the approach of Pettis-Hansen), two chains, X and Y, are first split into three, X1, X2, and Y. Then we consider all possible ways of gluing the three chains (e.g., X1YX2, X1X2Y, X2X1Y, X2YX1, YX1X2, YX2X1) and choose the one producing the largest score. This improves the quality of the final result (the search space is larger) while keeping the implementation sufficiently fast. Differential Revision: https://reviews.llvm.org/D113424	2021-12-07 07:31:10 -08:00
Nico Weber	3678326d28	Revert "ext-tsp basic block layout" This reverts commit c68f71eb37c2b6ffcf29e865d443a910e73083bd. Breaks tests on arm hosts, see comments on https://reviews.llvm.org/D113424	2021-12-06 19:08:20 -05:00
spupyrev	c68f71eb37	ext-tsp basic block layout A new basic block ordering improving existing MachineBlockPlacement. The algorithm tries to find a layout of nodes (basic blocks) of a given CFG optimizing jump locality and thus processor I-cache utilization. This is achieved via increasing the number of fall-through jumps and co-locating frequently executed nodes together. The name follows the underlying optimization problem, Extended-TSP, which is a generalization of classical (maximum) Traveling Salesmen Problem. The algorithm is a greedy heuristic that works with chains (ordered lists) of basic blocks. Initially all chains are isolated basic blocks. On every iteration, we pick a pair of chains whose merging yields the biggest increase in the ExtTSP value, which models how i-cache "friendly" a specific chain is. A pair of chains giving the maximum gain is merged into a new chain. The procedure stops when there is only one chain left, or when merging does not increase ExtTSP. In the latter case, the remaining chains are sorted by density in decreasing order. An important aspect is the way two chains are merged. Unlike earlier algorithms (e.g., based on the approach of Pettis-Hansen), two chains, X and Y, are first split into three, X1, X2, and Y. Then we consider all possible ways of gluing the three chains (e.g., X1YX2, X1X2Y, X2X1Y, X2YX1, YX1X2, YX2X1) and choose the one producing the largest score. This improves the quality of the final result (the search space is larger) while keeping the implementation sufficiently fast. Differential Revision: https://reviews.llvm.org/D113424	2021-12-06 08:56:39 -08:00
Kazu Hirata	c9fca53af1	[CodeGen, Target] Use pred_empty and succ_empty (NFC)	2021-09-10 11:11:31 -07:00
Guozhi Wei	50b6273145	[MBP] findBestLoopTopHelper should exit if OldTop is not a chain header Function findBestLoopTopHelper tries to find a new loop top block which can also fall through to OldTop, but it's impossible if OldTop is not a chain header, so it should exit immediately. Differential Revision: https://reviews.llvm.org/D106329	2021-07-28 19:00:45 -07:00
Fangrui Song	d8aba75a76	Internalize some cl::opt global variables or move them under namespace llvm	2021-05-07 11:15:43 -07:00
Nicholas Guy	cd880442ae	[CodeGen][AArch64] Add TargetInstrInfo hook to modify the TailDuplicateSize default threshold Different targets might handle branch performance differently, so this patch allows for targets to specify the TailDuplicateSize threshold. Said threshold defines how small a branch can be and still be duplicated to generate straight-line code instead. This patch also specifies said override values for the AArch64 subtarget. Differential Revision: https://reviews.llvm.org/D95631	2021-02-08 13:28:00 +00:00

1 2 3 4 5 ...

328 Commits