llvm-project

mirror of https://github.com/llvm/llvm-project.git synced 2025-05-06 11:06:07 +00:00

Author	SHA1	Message	Date
Kazu Hirata	026a29e8b3	[Analysis, CodeGen, DebugInfo] Use StringRef::operator== instead of StringRef::equals (NFC) (#91304 ) I'm planning to remove StringRef::equals in favor of StringRef::operator==. - StringRef::operator==/!= outnumber StringRef::equals by a factor of 53 under llvm/ in terms of their usage. - The elimination of StringRef::equals brings StringRef closer to std::string_view, which has operator== but not equals. - S == "foo" is more readable than S.equals("foo"), especially for !Long.Expression.equals("str") vs Long.Expression != "str".	2024-05-07 10:20:10 -07:00
Kazu Hirata	7d269a4841	[CodeGen] Use range-based for loops (NFC)	2024-02-03 21:43:01 -08:00
Freddy Ye	d102f8bda1	[MachineBlockPlacement][X86] Use max of MDAlign and TLIAlign to align Loops. (#71026 ) This patch added backend consumption on a new loop metadata: !1 = !{!"llvm.loop.align", i32 64} which is generated from clang's new loop attribute: [[clang::code_align()]] clang patch: #70762	2023-11-21 14:06:32 +08:00
Kazu Hirata	f9306f6de3	[ADT] Rename llvm::erase_value to llvm::erase (NFC) (#70156 ) C++20 comes with std::erase to erase a value from std::vector. This patch renames llvm::erase_value to llvm::erase for consistency with C++20. We could make llvm::erase more similar to std::erase by having it return the number of elements removed, but I'm not doing that for now because nobody seems to care about that in our code base. Since there are only 50 occurrences of erase_value in our code base, this patch replaces all of them with llvm::erase and deprecates llvm::erase_value.	2023-10-24 23:03:13 -07:00
Matthias Braun	2e26d09106	BlockFrequencyInfo: Add PrintBlockFreq helper (#67512 ) - Refactor the (Machine)BlockFrequencyInfo::printBlockFreq functions into a `PrintBlockFreq()` function returning a `Printable` object. This simplifies usage as it can be directly piped to a `raw_ostream` like `dbgs() << PrintBlockFreq(MBFI, Freq) << '\n';`. - Previously there was an interesting behavior where `BlockFrequencyInfoImpl` stores frequencies both as a `Scaled64` number and as an `uint64_t`. Most algorithms use the `BlockFrequency` abstraction with the integers, the print function for basic blocks printed the `Scaled64` number potentially showing higher accuracy than was used by the algorithm. This changes things to only print `BlockFrequency` values. - Replace some instances of `dbgs() << Freq.getFrequency()` with the new function.	2023-10-05 18:26:50 -07:00
Matthias Braun	5181156b37	Use BlockFrequency type in more places (NFC) (#68266 ) The `BlockFrequency` class abstracts `uint64_t` frequency values. Use it more consistently in various APIs and disable implicit conversion to make usage more consistent and explicit. - Use `BlockFrequency Freq` parameter for `setBlockFreq`, `getProfileCountFromFreq` and `setBlockFreqAndScale` functions. - Return `BlockFrequency` in `getEntryFreq()` functions. - While on it change some `const BlockFrequency& Freq` parameters to plain `BlockFreqency Freq`. - Mark `BlockFrequency(uint64_t)` constructor as explicit. - Add missing `BlockFrequency::operator!=`. - Remove `uint64_t BlockFreqency::getMaxFrequency()`. - Add `BlockFrequency BlockFrequency::max()` function.	2023-10-05 11:40:17 -07:00
Fangrui Song	6b8d04c23d	[CodeLayout] Refactor std::vector uses, namespace, and EdgeCountT. NFC * Place types and functions in the llvm::codelayout namespace * Change EdgeCountT from pair<pair<uint64_t, uint64_t>, uint64_t> to a struct and utilize structured bindings. It is not conventional to use the "T" suffix for structure types. * Remove a redundant copy in ChainT::merge. * Change {ExtTSPImpl,CDSortImpl}::run to use return value instead of an output parameter * Rename applyCDSLayout to computeCacheDirectedLayout: (a) avoid rare abbreviation "CDS" (cache-directed sort) (b) "compute" is more conventional for the specific use case * Change the parameter types from std::vector to ArrayRef so that SmallVector arguments can be used. * Similarly, rename applyExtTspLayout to computeExtTspLayout. Reviewed By: Amir Differential Revision: https://reviews.llvm.org/D159526	2023-09-21 13:13:03 -07:00
Arthur Eubanks	0a1aa6cda2	[NFC][CodeGen] Change CodeGenOpt::Level/CodeGenFileType into enum classes (#66295 ) This will make it easy for callers to see issues with and fix up calls to createTargetMachine after a future change to the params of TargetMachine. This matches other nearby enums. For downstream users, this should be a fairly straightforward replacement, e.g. s/CodeGenOpt::Aggressive/CodeGenOptLevel::Aggressive or s/CGFT_/CodeGenFileType::	2023-09-14 14:10:14 -07:00
Guozhi Wei	1bcb6a3da2	[MBP] Enable duplicating return block to remove jump to return Sometimes LLVM generates branch to return instruction, like PR63227. It is because in function MachineBlockPlacement::canTailDuplicateUnplacedPreds we avoid duplicating a BB into another already placed BB to prevent destroying computed layout. But if the successor BB is a return block, duplicating it will only reduce taken branches without hurt to any other branches. Differential Revision: https://reviews.llvm.org/D153093	2023-06-21 18:54:31 +00:00
Akshay Khadse	43b38696aa	Fix uninitialized class members Reviewed By: LuoYuanke Differential Revision: https://reviews.llvm.org/D148692	2023-04-20 11:18:34 +08:00
Akshay Khadse	8bf7f86d79	Fix uninitialized pointer members in CodeGen This change initializes the members TSI, LI, DT, PSI, and ORE pointer feilds of the SelectOptimize class to nullptr. Reviewed By: LuoYuanke Differential Revision: https://reviews.llvm.org/D148303	2023-04-17 16:32:46 +08:00
Kazu Hirata	a585fa2637	[CodeGen] Use *{Set,Map}::contains (NFC)	2023-03-14 08:07:42 -07:00
Fangrui Song	1e6921131a	Move global namespace cl::opt inside llvm::	2023-02-14 00:09:44 -08:00
Mingming Liu	36e8e19337	[NFC][BlockPlacement]Add an option to renumber blocks based on function layout order. Use case: - When block layout is visualized after MBP pass, the basic blocks are labeled in layout order; meanwhile blocks could be numbered in a different order. - As a result, it's hard to map between the graph and pass output. With this option on, the basic blocks are renumbered in function layout order. This option is only useful when a function is to be visualized (i.e., when view options are on) to make it debugging only. Use https://godbolt.org/z/5WTW36bMr as an example: - As MBP pass output (shown in godbolt output window), `func2` is in a basic block numbered `2` (`bb.2`), and `func1` is in a basic block numbered `3` (`bb.3`); `bb.3` is a block with higher block frequency than `bb.2`, and `bb.3` is placed before `bb.2` in the functin layout. - Use [1] to get the dot graph (graph uploaded in [2]), the blocks are re-numbered. - `func1` is in 'if.end' block, and labeled `1` in visualized dot; `func2` is in 'if.then' blocks, and labeled `3` --> the labeled number and bb number won't map. - [[ `b5626ae975/llvm/lib/CodeGen/MachineBlockFrequencyInfo.cpp (L127)` \| DOTGraphTraits<MachineBlockFrequencyInfo *>::getNodeLabel ]] is where labeled numbers are based on function layout number, and [[ `a8d93783f3/llvm/include/llvm/Support/GraphWriter.h (L209)` \| called by graph writer ]]. So call 'MachineFunction::RenumberBlocks' would make labeled number (in dot graph) and block number (in pass output) consistent with each other. [1] `./bin/clang++ -O3 -S -mllvm -view-block-layout-with-bfi=count -mllvm -view-bfi-func-name=_Z9func_loopv -mllvm -print-after=block-placement -mllvm -filter-print-funcs=_Z9func_loopv test.c` [2] {F25201785} Reviewed By: davidxl Differential Revision: https://reviews.llvm.org/D137467	2022-11-07 07:52:45 -08:00
spupyrev	8d5b694da1	extending code layout alg The diff modifies ext-tsp code layout algorithm in the following ways: (i) fixes merging of cold block chains (this is a port of D129397); (ii) adjusts the cost model utilized for optimization; (iii) adjusts some APIs so that the implementation can be used in BOLT; this is a prerequisite for D129895. The only non-trivial change is (ii). Here we introduce different weights for conditional and unconditional branches in the cost model. Based on the new model it is slightly more important to increase the number of "fall-through unconditional" jumps, which makes sense, as placing two blocks with an unconditional jump next to each other reduces the number of jump instructions in the generated code. Experimentally, this makes a mild impact on the performance; I've seen up to 0.2%-0.3% perf win on some benchmarks. Reviewed By: hoy Differential Revision: https://reviews.llvm.org/D129893	2022-08-24 09:40:25 -07:00
Kazu Hirata	9e6d1f4b5d	[CodeGen] Qualify auto variables in for loops (NFC)	2022-07-17 01:33:28 -07:00
Mingming Liu	1e67385d28	[MachineBlockPlacementStats] Added check for "-filter-print-funcs" option to the machine-block-placement-stats. Differential Revision: https://reviews.llvm.org/D128019	2022-06-16 21:59:54 -07:00
Mingming Liu	b7d09557f6	Revert "[MachineBlockPlacementStats] Add check for `-filter-print-funcs` option to machine-block-placement stats." This reverts commit 46d45df4516e9a5bc43460429cd02cd04a85db1a. Going to add differential revision link to commit message and re-commit.	2022-06-16 21:56:08 -07:00
Mingming Liu	46d45df451	[MachineBlockPlacementStats] Add check for `-filter-print-funcs` option to machine-block-placement stats.	2022-06-16 21:48:08 -07:00
serge-sans-paille	989f1c72e0	Cleanup codegen includes This is a (fixed) recommit of https://reviews.llvm.org/D121169 after: 1061034926 before: 1063332844 Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup Differential Revision: https://reviews.llvm.org/D121681	2022-03-16 08:43:00 +01:00
Nico Weber	a278250b0f	Revert "Cleanup codegen includes" This reverts commit 7f230feeeac8a67b335f52bd2e900a05c6098f20. Breaks CodeGenCUDA/link-device-bitcode.cu in check-clang, and many LLVM tests, see comments on https://reviews.llvm.org/D121169	2022-03-10 07:59:22 -05:00
serge-sans-paille	7f230feeea	Cleanup codegen includes after: 1061034926 before: 1063332844 Differential Revision: https://reviews.llvm.org/D121169	2022-03-10 10:00:30 +01:00
spupyrev	bcdc047731	speeding up ext-tsp for huge instances Differential Revision: https://reviews.llvm.org/D120780	2022-03-02 07:17:48 -08:00
Hongtao Yu	dee058c670	[CSSPGO] Turn on ext-tsp by default for CSSPGO. I'm seeing ext-tsp helps CSSPGO for our intern large benchmarks so I'm turning on it for CSSPGO. For non-CS AutoFDO, ext-tsp doesn't seem to help, probably because of lower profile counts quality. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D119048	2022-02-04 19:46:44 -08:00
Nicholas Guy	73d92faa2f	[CodeGen] Emit alignment "Max Skip" operand The current AsmPrinter has support to emit the "Max Skip" operand (the 3rd of .p2align), however has no support for it to actually be specified. Adding MaxBytesForAlignment to MachineBasicBlock provides this capability on a per-block basis. Leaving the value as default (0) causes no observable differences in behaviour. Differential Revision: https://reviews.llvm.org/D114590	2022-01-05 12:54:30 +00:00
spupyrev	f573f6866e	ext-tsp basic block layout A new basic block ordering improving existing MachineBlockPlacement. The algorithm tries to find a layout of nodes (basic blocks) of a given CFG optimizing jump locality and thus processor I-cache utilization. This is achieved via increasing the number of fall-through jumps and co-locating frequently executed nodes together. The name follows the underlying optimization problem, Extended-TSP, which is a generalization of classical (maximum) Traveling Salesmen Problem. The algorithm is a greedy heuristic that works with chains (ordered lists) of basic blocks. Initially all chains are isolated basic blocks. On every iteration, we pick a pair of chains whose merging yields the biggest increase in the ExtTSP value, which models how i-cache "friendly" a specific chain is. A pair of chains giving the maximum gain is merged into a new chain. The procedure stops when there is only one chain left, or when merging does not increase ExtTSP. In the latter case, the remaining chains are sorted by density in decreasing order. An important aspect is the way two chains are merged. Unlike earlier algorithms (e.g., based on the approach of Pettis-Hansen), two chains, X and Y, are first split into three, X1, X2, and Y. Then we consider all possible ways of gluing the three chains (e.g., X1YX2, X1X2Y, X2X1Y, X2YX1, YX1X2, YX2X1) and choose the one producing the largest score. This improves the quality of the final result (the search space is larger) while keeping the implementation sufficiently fast. Differential Revision: https://reviews.llvm.org/D113424	2021-12-07 07:31:10 -08:00
Nico Weber	3678326d28	Revert "ext-tsp basic block layout" This reverts commit c68f71eb37c2b6ffcf29e865d443a910e73083bd. Breaks tests on arm hosts, see comments on https://reviews.llvm.org/D113424	2021-12-06 19:08:20 -05:00
spupyrev	c68f71eb37	ext-tsp basic block layout A new basic block ordering improving existing MachineBlockPlacement. The algorithm tries to find a layout of nodes (basic blocks) of a given CFG optimizing jump locality and thus processor I-cache utilization. This is achieved via increasing the number of fall-through jumps and co-locating frequently executed nodes together. The name follows the underlying optimization problem, Extended-TSP, which is a generalization of classical (maximum) Traveling Salesmen Problem. The algorithm is a greedy heuristic that works with chains (ordered lists) of basic blocks. Initially all chains are isolated basic blocks. On every iteration, we pick a pair of chains whose merging yields the biggest increase in the ExtTSP value, which models how i-cache "friendly" a specific chain is. A pair of chains giving the maximum gain is merged into a new chain. The procedure stops when there is only one chain left, or when merging does not increase ExtTSP. In the latter case, the remaining chains are sorted by density in decreasing order. An important aspect is the way two chains are merged. Unlike earlier algorithms (e.g., based on the approach of Pettis-Hansen), two chains, X and Y, are first split into three, X1, X2, and Y. Then we consider all possible ways of gluing the three chains (e.g., X1YX2, X1X2Y, X2X1Y, X2YX1, YX1X2, YX2X1) and choose the one producing the largest score. This improves the quality of the final result (the search space is larger) while keeping the implementation sufficiently fast. Differential Revision: https://reviews.llvm.org/D113424	2021-12-06 08:56:39 -08:00
Kazu Hirata	c9fca53af1	[CodeGen, Target] Use pred_empty and succ_empty (NFC)	2021-09-10 11:11:31 -07:00
Guozhi Wei	50b6273145	[MBP] findBestLoopTopHelper should exit if OldTop is not a chain header Function findBestLoopTopHelper tries to find a new loop top block which can also fall through to OldTop, but it's impossible if OldTop is not a chain header, so it should exit immediately. Differential Revision: https://reviews.llvm.org/D106329	2021-07-28 19:00:45 -07:00
Fangrui Song	d8aba75a76	Internalize some cl::opt global variables or move them under namespace llvm	2021-05-07 11:15:43 -07:00
Nicholas Guy	cd880442ae	[CodeGen][AArch64] Add TargetInstrInfo hook to modify the TailDuplicateSize default threshold Different targets might handle branch performance differently, so this patch allows for targets to specify the TailDuplicateSize threshold. Said threshold defines how small a branch can be and still be duplicated to generate straight-line code instead. This patch also specifies said override values for the AArch64 subtarget. Differential Revision: https://reviews.llvm.org/D95631	2021-02-08 13:28:00 +00:00
Kazu Hirata	7bc76fd0ec	[CodeGen] Construct SmallVector with iterator ranges (NFC)	2020-12-31 09:39:11 -08:00
Guozhi Wei	687e80be7f	[MBP] Add whole chain to BlockFilterSet instead of individual BB Currently we add individual BB to BlockFilterSet if its frequency satisfies LoopFreq / Freq <= LoopToColdBlockRatio LoopFreq is edge frequency from outside to loop header. LoopToColdBlockRatio is a command line parameter. It doesn't make sense since we always layout whole chain, not individual BBs. It may also cause a tricky problem. Sometimes it is possible that the LoopFreq of an inner loop is smaller than LoopFreq of outer loop. So a BB can be in BlockFilterSet of inner loop, but not in BlockFilterSet of outer loop, like .cold in the test case. So it is added to the chain of inner loop. When work on the outer loop, .cold is not added to BlockFilterSet, so the edge to successor .problem is not counted in UnscheduledPredecessors of .problem chain. But other blocks in the inner loop are added BlockFilterSet, so the whole inner loop chain can be layout, and markChainSuccessors is called to decrease UnscheduledPredecessors of following chains. markChainSuccessors calls markBlockSuccessors for every BB, even it is not in BlockFilterSet, like .cold, so .problem chain's UnscheduledPredecessors is decreased, but this edge was not counted on in fillWorkLists, so .problem chain's UnscheduledPredecessors becomes 0 when it still has an unscheduled predecessor .pred! And it causes problems in following various successor BB selection algorithms. Differential Revision: https://reviews.llvm.org/D89088	2020-12-16 15:45:47 -08:00
Guozhi Wei	d50d7c37a1	[MBP] Prevent rotating a chain contains entry block The entry block should always be the first BB in a function. So we should not rotate a chain contains the entry block. Differential Revision: https://reviews.llvm.org/D92882	2020-12-14 12:48:55 -08:00
Kazu Hirata	ee5b5b7a35	[CodeGen] Use llvm::erase_value (NFC)	2020-12-13 20:05:48 -08:00
Kazu Hirata	a553ac9791	[CodeGen] llvm::erase_if (NFC)	2020-12-05 15:44:40 -08:00
Kazu Hirata	68403af007	[MBP] Remove unused declaration shouldPredBlockBeOutlined (NFC) The function was introduced on Jun 12, 2016 in commit 071d0f180794f7819c44026815614ce8fa00a3bd. Its definition was removed on Mar 2, 2017 in commit 1393761e0ca3fe8271245762f78daf4d5208cd77.	2020-11-21 23:35:02 -08:00
Han Shen	e42f6c0ac0	Revert "[MBP] Add whole chain to BlockFilterSet instead of individual BB" This reverts commit adfb5415010fbbc009a4a6298cfda7a6ed4fa6d4. This is reverted because it caused an chrome error: https://crbug.com/1140168	2020-10-22 17:31:01 -07:00
Guozhi Wei	adfb541501	[MBP] Add whole chain to BlockFilterSet instead of individual BB Currently we add individual BB to BlockFilterSet if its frequency satisfies LoopFreq / Freq <= LoopToColdBlockRatio LoopFreq is edge frequency from outside to loop header. LoopToColdBlockRatio is a command line parameter. It doesn't make sense since we always layout whole chain, not individual BBs. It may also cause a tricky problem. Sometimes it is possible that the LoopFreq of an inner loop is smaller than LoopFreq of outer loop. So a BB can be in BlockFilterSet of inner loop, but not in BlockFilterSet of outer loop, like .cold in the test case. So it is added to the chain of inner loop. When work on the outer loop, .cold is not added to BlockFilterSet, so the edge to successor .problem is not counted in UnscheduledPredecessors of .problem chain. But other blocks in the inner loop are added BlockFilterSet, so the whole inner loop chain can be layout, and markChainSuccessors is called to decrease UnscheduledPredecessors of following chains. markChainSuccessors calls markBlockSuccessors for every BB, even it is not in BlockFilterSet, like .cold, so .problem chain's UnscheduledPredecessors is decreased, but this edge was not counted on in fillWorkLists, so .problem chain's UnscheduledPredecessors becomes 0 when it still has an unscheduled predecessor .pred! And it causes problems in following various successor BB selection algorithms. Differential Revision: https://reviews.llvm.org/D89088	2020-10-14 11:55:10 -07:00
Guozhi Wei	fd75ad8662	[MBFIWrapper] Add a new function getBlockProfileCount MBFIWrapper keeps track of block frequencies of newly created blocks and modified blocks, modified block frequencies should also impact block profile count. This class doesn't provide interface getBlockProfileCount, users can only use the underlying MBFI to query profile count, the underlying MBFI doesn't know the modifications made in MBFIWrapper, so it either provides stale profile count for modified block or simply crashes on new blocks. So this patch add function getBlockProfileCount to class MBFIWrapper to handle new blocks or modified blocks. Differential Revision: https://reviews.llvm.org/D87802	2020-09-23 09:31:45 -07:00
Fangrui Song	6913812abc	Fix some clang-tidy bugprone-argument-comment issues	2020-09-19 20:41:25 -07:00
Guozhi Wei	28759e9fcc	[MBP] Use profile count to compute tail dup cost if it is available Current tail duplication in machine block placement pass uses block frequency information in cost model. But frequency number has only relative meaning compared to other basic blocks in the same function. A large frequency number doesn't mean it is hot and a small frequency number doesn't mean it is cold. To overcome this problem, this patch uses profile count in cost model if it's available. So we can tail duplicate real hot basic blocks. Differential Revision: https://reviews.llvm.org/D83265	2020-07-21 11:18:06 -07:00
Yuanfang Chen	78c69a00a4	[NFC] Clean up uses of MachineModuleInfoWrapperPass	2020-07-01 09:45:05 -07:00
James Y Knight	1978309db1	MachineBasicBlock::updateTerminator now requires an explicit layout successor. Previously, it tried to infer the correct destination block from the successor list, but this is a rather tricky propspect, given the existence of successors that occur mid-block, such as invoke, and potentially in the future, callbr/INLINEASM_BR. (INLINEASM_BR, in particular would be problematic, because its successor blocks are not distinct from "normal" successors, as EHPads are.) Instead, require the caller to pass in the expected fallthrough successor explicitly. In most callers, the correct block is immediately clear. But, in MachineBlockPlacement, we do need to record the original ordering, before starting to reorder blocks. Unfortunately, the goal of decoupling the behavior of end-of-block jumps from the successor list has not been fully accomplished in this patch, as there is currently no other way to determine whether a block is intended to fall-through, or end as unreachable. Further work is needed there. Differential Revision: https://reviews.llvm.org/D79605	2020-06-06 22:30:51 -04:00
Benjamin Kramer	97f92261df	[MBP] tuple->pair. NFC. std::pair has a trivial copy ctor, std::tuple doesn't.	2020-05-02 20:23:34 +02:00
Hongtao Yu	11455a7905	[CodeGen] Allow partial tail duplication in Machine Block Placement. Summary: A count profile may affect tail duplication's heuristic causing a block to be duplicated in only a part of its predecessors. This is not allowed in the Machine Block Placement pass where an assert will go off. I'm removing the assert and making the optimization bail out when such case happens. Reviewers: wenlei, davidxl, Carrot Reviewed By: Carrot Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77748	2020-04-11 12:20:31 -07:00
Hiroshi Yamauchi	c3417592c8	Revert "Include static prof data when collecting loop BBs" This reverts commit 129c911efaa492790c251b3eb18e4db36b55cbc5. Due to an internal benchmark regression.	2020-03-24 09:41:16 -07:00
Bill Wendling	129c911efa	Include static prof data when collecting loop BBs Summary: If the programmer adds static profile data to a branch---i.e. uses "__builtin_expect()" or similar---then we should honor it. Otherwise, "__builtin_expect()" is ignored in crucial situations. So we trust that the programmer knows what they're doing until proven wrong. Subscribers: hiraditya, JDevlieghere, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D74809	2020-02-19 11:33:48 -08:00
Guozhi Wei	369d086d78	[MBP] Partial tail duplication into hot predecessors Current tail duplication embedded in MBP duplicates a BB into all or none of its predecessors without too much cost analysis. So sometimes it is duplicated into cold predecessors, and in other cases it may miss the duplication into hot predecessors. This patch improves tail duplication in 3 aspects: A successor can be duplicated into part of its predecessors. A more fine-grained benefit analysis, combined with 1, now a successor is duplicated into hot predecessors only. If a successor can't be duplicated into one predecessor, it doesn't impact the duplication into other predecessors. Differential Revision: https://reviews.llvm.org/D73387	2020-02-12 15:22:33 -08:00

1 2 3 4 5 ...

310 Commits