llvm-project

mirror of https://github.com/llvm/llvm-project.git synced 2025-05-07 22:56:05 +00:00

Author	SHA1	Message	Date
Kazu Hirata	141574bacb	[llvm] Remove redundant calls to std::unique_ptr<T>::get (NFC) (#113415)	2024-10-23 10:44:09 -07:00
Jay Foad	e03f427196	[LLVM] Use {} instead of std::nullopt to initialize empty ArrayRef (#109133) It is almost always simpler to use {} instead of std::nullopt to initialize an empty ArrayRef. This patch changes all occurrences I could find in LLVM itself. In future the ArrayRef(std::nullopt_t) constructor could be deprecated or removed.	2024-09-19 16:16:38 +01:00
Kyungwoo Lee	bf68403484	Attempt to fix [CGData][MachineOutliner] Global Outlining (#90074) (#108037)	2024-09-10 08:21:25 -07:00
Kyungwoo Lee	0f52545289	[CGData][MachineOutliner] Global Outlining (#90074 ) This commit introduces support for outlining functions across modules using codegen data generated from previous codegen. The codegen data currently manages the outlined hash tree, which records outlining instances that occurred locally in the past. The machine outliner now operates in one of three modes: 1. CGDataMode::None: This is the default outliner mode that uses the suffix tree to identify (local) outlining candidates within a module. This mode is also used by (full)LTO to maintain optimal behavior with the combined module. 2. CGDataMode::Write (`-codegen-data-generate`): This mode is identical to the default mode, but it also publishes the stable hash sequences of instructions in the outlined functions into a local outlined hash tree. It then encodes this into the `__llvm_outline` section, which will be dead-stripped at link time. 3. CGDataMode::Read (`-codegen-data-use-path={.cgdata}`): This mode reads a codegen data file (.cgdata) and initializes a global outlined hash tree. This tree is used to generate global outlining candidates. Note that the codegen data file has been post-processed with the raw `__llvm_outline` sections from all native objects using the `llvm-cgdata` tool (or a linker, `LLD`, or a new ThinLTO pipeline later). This depends on https://github.com/llvm/llvm-project/pull/105398. After this PR, LLD (https://github.com/llvm/llvm-project/pull/90166) and Clang (https://github.com/llvm/llvm-project/pull/90304) will follow for each client side support. This is a patch for https://discourse.llvm.org/t/rfc-enhanced-machine-outliner-part-2-thinlto-nolto/78753.	2024-09-10 06:56:31 -07:00
Simon Tatham	de37da8e37	[MachineOutliner] Preserve instruction bundles (#106402 ) When the machine outliner copies instructions from a source function into an outlined function, it was doing it using `CloneMachineInstr`, which is documented as not preserving the interior of any instruction bundle. So outlining code that includes an instruction bundle would fail, because in the outlined version, the bundle would be empty, so instructions would go missing in the move. This occurs when any bundled instruction appears in the outlined code, so there was no need to construct an unusual test case: I've just copied a function from the existing `stp-opt-with-renaming.mir`, which happens to contain an SVE instruction bundle. Including two identical copies of that function makes the outliner merge them, and then we check that it didn't destroy the interior of the bundle in the process.	2024-09-04 09:06:48 +01:00
Kyungwoo Lee	93b8d07a75	[MachineOutliner][NFC] Refactor (#105398 ) This patch prepares the NFC groundwork for global outlining using CGData, which will follow https://github.com/llvm/llvm-project/pull/90074. - The `MinRepeats` parameter is now explicitly passed to the `getOutliningCandidateInfo` function, rather than relying on a default value of 2. For local outlining, the minimum number of repetitions is typically 2, but for the global outlining (mentioned above), we will optimistically create a single `Candidate` for each `OutlinedFunction` if stable hashes match a specific code sequence. This parameter is adjusted accordingly in global outlining scenarios. - I have also implemented `unique_ptr` for `OutlinedFunction` to ensure safe and efficient memory management within `FunctionList`, avoiding unnecessary implicit copies. This depends on https://github.com/llvm/llvm-project/pull/101461. This is a patch for https://discourse.llvm.org/t/rfc-enhanced-machine-outliner-part-2-thinlto-nolto/78753.	2024-08-27 14:38:36 -07:00
Matt Arsenault	3cb5604d2c	MachineOutliner: Use PM to query MachineModuleInfo (#99688) Avoid getting this from the MachineFunction	2024-07-24 13:22:56 +04:00
Youngsuk Kim	a95c85fba5	[llvm][CodeGen] Avoid 'raw_string_ostream::str' (NFC) (#97318) Since `raw_string_ostream` doesn't own the string buffer, it is desirable (in terms of memory safety) for users to directly reference the string buffer rather than use `raw_string_ostream::str()`. Work towards TODO comment to remove `raw_string_ostream::str()`.	2024-07-01 21:52:37 -04:00
Nikita Popov	74deadf196	[IRBuilder] Don't include Module.h (NFC) (#97159) This used to be necessary to fetch the DataLayout, but isn't anymore.	2024-06-29 15:05:04 +02:00
Xuan Zhang	d9a00ed366	[MachineOutliner] Leaf Descendants (#90275) This PR depends on https://github.com/llvm/llvm-project/pull/90264 In the current implementation, only leaf children of each internal node in the suffix tree are included as candidates for outlining. But all leaf descendants are outlining candidates, which we include in the new implementation. This is enabled by the flag `outliner-leaf-descendants`, which defaults to true. The reason for _putting this behind a flag_ is that the machine outliner is not the only pass that uses the suffix tree. The reason for _having this default to true_ is that including all leaf descendants shows a consistent size win. * For Clang/LLD, it shows around 3% reduction in text segment size when compared to the baseline `-Oz` linker binary. * For selected benchmark tests in LLVM test suite \| run (CTMark/) \| only leaf children \| all leaf descendants \| reduction % \| \|------------------\|--------------------\|----------------------\|-------------\| \| lencod \| 349624 \| 348564 \| -0.2004% \| \| SPASS \| 219672 \| 218440 \| -0.4738% \| \| kc \| 271956 \| 250068 \| -0.4506% \| \| sqlite3 \| 223920 \| 222484 \| -0.5471% \| \| 7zip-benchmark \| 405364 \| 401244 \| -0.3428% \| \| bullet \| 139820 \| 138340 \| -0.8315% \| \| consumer-typeset \| 295684 \| 286628 \| -1.2295% \| \| pairlocalalign \| 72236 \| 71936 \| -0.2164% \| \| tramp3d-v4 \| 189572 \| 183676 \| -2.9668% \| This is part of an enhanced version of machine outliner -- see [RFC](https://discourse.llvm.org/t/rfc-enhanced-machine-outliner-part-1-fulllto-part-2-thinlto-nolto-to-come/78732).	2024-06-18 07:13:05 -07:00
Xuan Zhang	3b16630c26	[MachineOutliner] Sort by Benefit to Cost Ratio (#90264 ) This PR depends on https://github.com/llvm/llvm-project/pull/90260 We changed the order in which functions are outlined in Machine Outliner. The formula for priority is found via a black-box Bayesian optimization toolbox. Using this formula for sorting consistently reduces the uncompressed size of large real-world mobile apps. We also ran a few benchmarks using LLVM test suites, and showed that sorting by priority consistently reduces the text segment size. \|run (CTMark/) \|baseline (1)\|priority (2)\|diff (1 -> 2)\| \|----------------\|------------\|------------\|-------------\| \|lencod \|349624 \|349264 \|-0.1030% \| \|SPASS \|219672 \|219480 \|-0.0874% \| \|kc \|271956 \|251200 \|-7.6321% \| \|sqlite3 \|223920 \|223708 \|-0.0947% \| \|7zip-benchmark \|405364 \|402624 \|-0.6759% \| \|bullet \|139820 \|139500 \|-0.2289% \| \|consumer-typeset\|295684 \|290196 \|-1.8560% \| \|pairlocalalign \|72236 \|72092 \|-0.1993% \| \|tramp3d-v4 \|189572 \|189292 \|-0.1477% \| This is part of an enhanced version of machine outliner -- see [RFC](https://discourse.llvm.org/t/rfc-enhanced-machine-outliner-part-1-fulllto-part-2-thinlto-nolto-to-come/78732).	2024-06-07 06:50:13 -07:00
Xuan Zhang	16c925ab5f	[MachineOutliner] Efficient Implementation of MachineOutliner::findCandidates() (#90260) This reduces the time complexity of the main loop of the `findCandidates()` method from $O(n^2)$ to $O(n \log n)$. For small $n$, the modification does not regress the build time, but it helps significantly when $n$ is large. For one application, this reduces the runtime of the main loop from 120 seconds to 28 seconds. This is the first commit for an enhanced version of machine outliner -- see [RFC](https://discourse.llvm.org/t/rfc-enhanced-machine-outliner-part-1-fulllto-part-2-thinlto-nolto-to-come/78732).	2024-06-03 07:41:49 -07:00
Joshua Cao	ab08df2292	[IR] Do not set `none` for function uwtable (#93387) This avoids the pitfall where we set the uwtable to none: ``` func.setUWTableKind(llvm::UWTableKind::None) ``` `Attribute::getAsString()` would see an unknown attribute and fail an assertion. In this patch, we assert that we do not see a None uwtable kind. This also skips the check of `UWTableKind::Async`. It is dominated by the check of `UWTableKind::Default`, which has the same enum value (nfc).	2024-06-02 15:02:11 -07:00
Jay Foad	63a5dc4aed	[CodeGen] Do not pass MF into MachineRegisterInfo methods. NFC. (#84770) MachineRegisterInfo already knows the MF so there is no need to pass it in as an argument.	2024-03-11 15:35:05 +00:00
Anatoly Trosinenko	10bd69a4f7	[MachineOutliner] Refactor iterating over Candidate's instructions (#78972) Make Candidate's front() and back() functions return references to MachineInstr and introduce begin() and end() returning iterators, the same way it is usually done in other container-like classes. This makes it possible to iterate over the instructions contained in Candidate the same way one can iterate over MachineBasicBlock (note that begin() and end() return bundled iterators, just like MachineBasicBlock does, but no instr_begin() and instr_end() are defined yet).	2024-01-23 17:21:40 +03:00
Jessica Paquette	407b4648b8	[MachineOutliner] NFC: Add debug output to MachineOutliner::outline Add some debug output to `outline` to assist in debugging + understanding the code. This will say - How many things we found worth turning into outlined functions - Whether or not candidates were pruned via the outlining algorithm - The function created (if it was created) - Where the calls were inserted - What instruction was used to create the call Sample output below: ``` NUMBER OF POTENTIAL FUNCTIONS: 5 WALKING FUNCTION LIST PRUNED: 0/2 candidates OUTLINE: Expected benefit (12 B) > threshold (1 B) NEW FUNCTION: OUTLINED_FUNCTION_0 CREATE OUTLINED CALLS CALL: OUTLINED_FUNCTION_0 in bar:<unknown> .. BL @OUTLINED_FUNCTION_0, implicit-def $lr, implicit $sp CALL: OUTLINED_FUNCTION_0 in bar:<unknown> .. BL @OUTLINED_FUNCTION_0, implicit-def $lr, implicit $sp PRUNED: 2/2 candidates SKIP: Expected benefit (0 B) < threshold (1 B) PRUNED: 0/2 candidates OUTLINE: Expected benefit (8 B) > threshold (1 B) NEW FUNCTION: OUTLINED_FUNCTION_1 CREATE OUTLINED CALLS CALL: OUTLINED_FUNCTION_1 in bar:<unknown> .. BL @OUTLINED_FUNCTION_1, implicit-def $lr, implicit $sp CALL: OUTLINED_FUNCTION_1 in bar:<unknown> .. BL @OUTLINED_FUNCTION_1, implicit-def $lr, implicit $sp PRUNED: 2/2 candidates SKIP: Expected benefit (0 B) < threshold (1 B) PRUNED: 2/2 candidates SKIP: Expected benefit (0 B) < threshold (1 B) ```	2023-05-15 15:29:26 -07:00
wangpc	267708f9d5	[MachineOutliner] Add IsOutlined to MachineFunction We add a field `IsOutlined` to indicate whether a MachineFunction is outlined and set it true for outlined functions in MachineOutliner. Reviewed By: paquette Differential Revision: https://reviews.llvm.org/D146191	2023-04-10 10:57:29 +08:00
Nathan Lanza	87c0f67739	[Outliner] Add an option to only enable outlining of patterns above a certain threshold Outlining isn't always a win when the saved instruction count is >= 1. The overhead of representing a new function in the binary depends on exception metadata and alignment. So parameterize this for local tuning. Reviewed By: paquette Differential Revision: https://reviews.llvm.org/D136774	2023-04-08 02:12:40 -04:00
Amara Emerson	41e9c4b88c	[NFC][Outliner] Delete default ctors for Candidate & OutlinedFunction. I think it's good practice to avoid having default ctors unless they're really valid/useful. For OutlinedFunction the default ctor was used to represent a bail-out value for getOutliningCandidateInfo(), so I changed the API to return an optional<OutlinedFunction> instead which seems a tad cleaner. Differential Revision: https://reviews.llvm.org/D146375	2023-03-20 11:17:10 -07:00
Jessica Paquette	92d3672452	[MachineOutliner] Improve mapper statistics Add a test for statistics as well. The mapper size stats were nested in a loop unnecessarily. Move them out. Give existing stats better names, and add one which also tracks the number of sentinels added.	2023-02-03 22:27:21 -08:00
Jessica Paquette	d1359acb9a	[MachineOutliner] NFC: Add debug output to populateMapper Adding debug output to improve outliner debuggability + testability. Move `nooutline` attribute test into the new debug output test.	2023-02-03 22:00:45 -08:00
Jessica Paquette	51fa03200f	[MachineOutliner] NFC: Add debug output to overlap pruning code This had no debug output. Since it was committed as NFC, it had no testcase. The me of today was nerdsniped by the me of 6 years ago and decided that this ought to have a testcase and some debug output.	2023-02-03 17:43:11 -08:00
Jessica Paquette	fe35e142df	[MachineOutliner] NFC: Pull variable out from erase_if `Mapper.UnsignedVec.begin()` never changes throughout the call to `erase_if`, so no need to recalculate it. Also drop some redundant braces.	2023-02-03 16:41:02 -08:00
Jessica Paquette	443c5b9fd5	[NFC] Remove redundant check for MBB being empty in outliner If the size is < 2, then we just break anyway.	2023-02-03 16:41:02 -08:00
Jessica Paquette	7bb9d70bbb	[NFC] Remove unnecessary `llvm::` in MachineOutliner/SuffixTree We have `using llvm`, we don't need to say `llvm::`.	2023-02-03 16:41:02 -08:00
Jessica Paquette	ec37ebf59b	[NFC] Use SmallVector/ArrayRef in MachineOutliner/SuffixTree for small types The MachineOutliner + SuffixTree both used `std::vector` everywhere because I didn't know any better at the time. At least for small types, such as `unsigned` and iterators, I can't see any particular reason to use std::vector over `SmallVector` here.	2023-02-03 16:41:02 -08:00
Jessica Paquette	4de8521bc5	[MachineOutliner][AArch64] NFC: Split MBBs into "outlinable ranges" Recommit with bug fixes + added testcases to the outliner. Also adds some debug output. We found a case in the Swift benchmarks where the MachineOutliner introduces about a 20% compile time overhead in comparison to building without the MachineOutliner. The origin of this slowdown is that the benchmark has long blocks which incur lots of LRU checks for lots of candidates. Imagine a case like this: ``` bb: i1 i2 i3 ... i123456 ``` Now imagine that all of the outlining candidates appear early in the block, and that something like, say, NZCV is defined at the end of the block. The outliner has to check liveness for certain registers across all candidates, because outlining from areas where those registers are used is unsafe at call boundaries. This is fairly wasteful because in the previously-described case, the outlining candidates will never appear in an area where those registers are live. To avoid this, precalculate areas where we will consider outlining from. Anything outside of these areas is mapped to illegal and not included in the outlining search space. This allows us to reduce the size of the outliner's suffix tree as well, giving us a potential memory win. By precalculating areas, we can also optimize other checks too, like whether or not LR is live across an outlining candidate. Doing all of this is about a 16% compile time improvement on the case. This is likely useful for other targets (e.g. ARM + RISCV) as well, but for now, this only implements the AArch64 path. The original "is the MBB safe" method still works as before.	2023-02-03 15:33:37 -08:00
Jessica Paquette	7ef8f9c972	[IR/MachineOutliner] Add a "nooutline" function attr and respect it Add `nooutline` + update LangRef to say it exists. This makes it possible to say "don't outline from this function ever." We want to be able to toggle whether or not a function should be in the search set regardless of default behaviour. Add testcases for the IR Outliner + Machine Outliner. Also remove an unnecessary check for an empty function in the Machine Outliner. Differential Revision: https://reviews.llvm.org/D140438	2022-12-22 10:22:08 -08:00
Kazu Hirata	998960ee1f	[CodeGen] Use std::nullopt instead of None (NFC) This patch mechanically replaces None with std::nullopt where the compiler would warn if None were deprecated. The intent is to reduce the amount of manual work required in migrating from Optional to std::optional. This is part of an effort to migrate from llvm::Optional to std::optional: https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716	2022-12-02 20:36:08 -08:00
Eli Friedman	0ff51d5dde	Fix interaction of CFI instructions with MachineOutliner. 1. When checking if a candidate contains a CFI instruction, actually iterate over all of the instructions, instead of stopping halfway through. 2. Make sure copied CFI directives refer to the correct instruction. Fixes https://github.com/llvm/llvm-project/issues/55842 Differential Revision: https://reviews.llvm.org/D126930	2022-06-10 13:37:49 -07:00
Kazu Hirata	3b9707dbc0	[llvm] Convert for_each to range-based for loops (NFC)	2022-06-05 12:07:14 -07:00
serge-sans-paille	989f1c72e0	Cleanup codegen includes This is a (fixed) recommit of https://reviews.llvm.org/D121169 after: 1061034926 before: 1063332844 Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup Differential Revision: https://reviews.llvm.org/D121681	2022-03-16 08:43:00 +01:00
Nico Weber	a278250b0f	Revert "Cleanup codegen includes" This reverts commit 7f230feeeac8a67b335f52bd2e900a05c6098f20. Breaks CodeGenCUDA/link-device-bitcode.cu in check-clang, and many LLVM tests, see comments on https://reviews.llvm.org/D121169	2022-03-10 07:59:22 -05:00
serge-sans-paille	7f230feeea	Cleanup codegen includes after: 1061034926 before: 1063332844 Differential Revision: https://reviews.llvm.org/D121169	2022-03-10 10:00:30 +01:00
Jessica Paquette	68c718c8f4	Revert "[MachineOutliner][AArch64] NFC: Split MBBs into "outlinable ranges"" This reverts commit d97f997eb79d91b2872ac13619f49cb3a7120781. This commit was not NFC. (See: https://reviews.llvm.org/rGd97f997eb79d91b2872ac13619f49cb3a7120781)	2022-02-23 10:35:52 -08:00
Jessica Paquette	d97f997eb7	[MachineOutliner][AArch64] NFC: Split MBBs into "outlinable ranges" We found a case in the Swift benchmarks where the MachineOutliner introduces about a 20% compile time overhead in comparison to building without the MachineOutliner. The origin of this slowdown is that the benchmark has long blocks which incur lots of LRU checks for lots of candidates. Imagine a case like this: ``` bb: i1 i2 i3 ... i123456 ``` Now imagine that all of the outlining candidates appear early in the block, and that something like, say, NZCV is defined at the end of the block. The outliner has to check liveness for certain registers across all candidates, because outlining from areas where those registers are used is unsafe at call boundaries. This is fairly wasteful because in the previously-described case, the outlining candidates will never appear in an area where those registers are live. To avoid this, precalculate areas where we will consider outlining from. Anything outside of these areas is mapped to illegal and not included in the outlining search space. This allows us to reduce the size of the outliner's suffix tree as well, giving us a potential memory win. By precalculating areas, we can also optimize other checks too, like whether or not LR is live across an outlining candidate. Doing all of this is about a 16% compile time improvement on the case. This is likely useful for other targets (e.g. ARM + RISCV) as well, but for now, this only implements the AArch64 path. The original "is the MBB safe" method still works as before.	2022-02-21 15:29:16 -08:00
Jessica Paquette	12389e3758	[MachineOutliner] Add statistics for unsigned vector size Useful for debugging + evaluating improvements to the outliner. Stats are the number of illegal, legal, and invisible instructions in the unsigned vector, and its total length.	2022-02-17 18:25:51 -08:00
Momchil Velikov	6398903ac8	Extend the `uwtable` attribute with unwind table kind We have the `clang -cc1` command-line option `-funwind-tables=1\|2` and the codegen option `VALUE_CODEGENOPT(UnwindTables, 2, 0) ///< Unwind tables (1) or asynchronous unwind tables (2)`. However, this is encoded in LLVM IR by the presence or the absence of the `uwtable` attribute, i.e. we lose the information whether we want to generate just some unwind tables or asynchronous unwind tables. Asynchronous unwind tables take more space in the runtime image, I'd estimate something like 80-90% more, as the difference is adding roughly the same number of CFI directives as for prologues, only a bit simpler (e.g. `.cfi_offset reg, off` vs. `.cfi_restore reg`). Or even more, if you consider tail duplication of epilogue blocks. Asynchronous unwind tables could also restrict code generation to having only a finite number of frame pointer adjustments (an example of not having a finite number of `SP` adjustments is on AArch64 when untagging the stack (MTE) in some cases the compiler can modify `SP` in a loop). Having the CFI precise up to an instruction generally also means one cannot bundle together CFI instructions once the prologue is done, they need to be interspersed with ordinary instructions, which means extra `DW_CFA_advance_loc` commands, further increasing the unwind tables size. That is to say, async unwind tables impose a non-negligible overhead, yet for the most common use cases (like C++ exceptions), they are not even needed. This patch extends the `uwtable` attribute with an optional value: - `uwtable` (default to `async`) - `uwtable(sync)`, synchronous unwind tables - `uwtable(async)`, asynchronous (instruction precise) unwind tables Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D114543	2022-02-14 14:35:02 +00:00
Ties Stuij	f5f28d5b0c	[ARM] Implement BTI placement pass for PACBTI-M This patch implements a new MachineFunction pass in the ARM backend for placing BTI instructions. It is similar to the existing AArch64 aarch64-branch-targets pass. BTI instructions are inserted into basic blocks that: - Have their address taken - Are the entry block of a function, if the function has external linkage or has its address taken - Are mentioned in jump tables - Are exception/cleanup landing pads Each BTI instruction is placed at the beginning of a BB after the so-called meta instructions (e.g. exception handler labels). Each outlining candidate and the outlined function need to be in agreement about whether BTI placement is enabled or not. If branch target enforcement is disabled for a function, the outliner should not covertly enable it by emitting a call to an outlined function, which begins with BTI. The cost model of the outliner is adjusted to account for the extra BTI instructions in the outlined function. The ARM Constant Islands pass will maintain the count of the jump tables, which reference a block. A `BTI` instruction is removed from a block only if the reference count reaches zero. PAC instructions in entry blocks are replaced with PACBTI instructions (tests for this case will be added in a later patch because the compiler currently does not generate PAC instructions). The ARM Constant Island pass is adjusted to handle BTI instructions correctly. Functions with static linkage that don't have their address taken can still be called indirectly by linker-generated veneers and thus their entry points need to be marked with BTI or PACBTI. The changes are tested using "LLVM IR -> assembly" tests, jump tables also have a MIR test. Unfortunately it is not possible to add MIR tests for exception handling and computed gotos because of MIR parser limitations. 
This patch is part of a series that adds support for the PACBTI-M extension of the Armv8.1-M architecture, as detailed here: https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/armv8-1-m-pointer-authentication-and-branch-target-identification-extension The PACBTI-M specification can be found in the Armv8-M Architecture Reference Manual: https://developer.arm.com/documentation/ddi0553/latest The following people contributed to this patch: - Mikhail Maltsev - Momchil Velikov - Ties Stuij Reviewed By: ostannard Differential Revision: https://reviews.llvm.org/D112426	2021-12-01 12:54:05 +00:00
DianQK	1e9fa0b12a	Fix the side effect of outlined function when the register is implicit use and implicit-def in the same instruction. This is the diff associated with {D95267}, and we need to mark $x0 as live whether or not $x0 is dead. The compiler also needs to mark register $x0 as live in for the following case. ``` $x1 = ADDXri $sp, 16, 0 BL @spam, csr_darwin_aarch64_aapcs, implicit-def dead $lr, implicit $sp, implicit $x0, implicit killed $x1, implicit-def $sp, implicit-def $x0 ``` This change fixes an issue where the wrong registers were used when -machine-outliner-reruns>0. As an example: ``` lang=c typedef struct { double v1; double v2; } D16; typedef struct { D16 v1; D16 v2; } D32; typedef long long LL8; typedef struct { long long v1; long long v2; } LL16; typedef struct { LL16 v1; LL16 v2; } LL32; typedef struct { LL32 v1; LL32 v2; } LL64; LL8 needx0(LL8 v0, LL8 v1); void bar(LL64 v1, LL32 v2, LL16 v3, LL32 v4, LL8 v5, D16 v6, D16 v7, D16 v8); LL8 foo(LL8 v0, LL64 v1, LL32 v2, LL16 v3, LL32 v4, LL8 v5, D16 v6, D16 v7, D16 v8) { LL8 result = needx0(v0, 0); bar(v1, v2, v3, v4, v5, v6, v7, v8); return result + 1; } ``` As you can see from the `foo` function, we should not modify the value of `x0` until we call `needx0`. This code is compiled to give the following instruction MIR code. ``` $sp = frame-setup SUBXri $sp, 256, 0 frame-setup STPDi killed $d13, killed $d12, $sp, 16 frame-setup STPDi killed $d11, killed $d10, $sp, 18 frame-setup STPDi killed $d9, killed $d8, $sp, 20 frame-setup STPXi killed $x26, killed $x25, $sp, 22 frame-setup STPXi killed $x24, killed $x23, $sp, 24 frame-setup STPXi killed $x22, killed $x21, $sp, 26 frame-setup STPXi killed $x20, killed $x19, $sp, 28 ... $x1 = MOVZXi 0, 0 BL @needx0, csr_darwin_aarch64_aapcs, implicit-def dead $lr, implicit $sp, implicit $x0, implicit $x1, implicit-def $sp, implicit-def $x0 ... 
``` Since there are some other instruction sequences that duplicate `foo`, after the first execution of Machine Outliner you will get: ``` $sp = frame-setup SUBXri $sp, 256, 0 frame-setup STPDi killed $d13, killed $d12, $sp, 16 frame-setup STPDi killed $d11, killed $d10, $sp, 18 frame-setup STPDi killed $d9, killed $d8, $sp, 20 $x7 = ORRXrs $xzr, $lr, 0 BL @OUTLINED_FUNCTION_0, implicit-def $lr, implicit $sp, implicit-def $lr, implicit $sp, implicit $xzr, implicit $x7, implicit $x19, implicit $x20, implicit $x21, implicit $x22, implicit $x23, implicit $x24, implicit $x25, implicit $x26 $lr = ORRXrs $xzr, $x7, 0 ... BL @OUTLINED_FUNCTION_1, implicit-def $lr, implicit $sp, implicit-def $lr, implicit-def $sp, implicit-def $x0, implicit-def $x1, implicit $sp ... ``` For the first time we outlined the following sequence: ``` frame-setup STPXi killed $x26, killed $x25, $sp, 22 frame-setup STPXi killed $x24, killed $x23, $sp, 24 frame-setup STPXi killed $x22, killed $x21, $sp, 26 frame-setup STPXi killed $x20, killed $x19, $sp, 28 ``` and ``` $x1 = MOVZXi 0, 0 BL @needx0, csr_darwin_aarch64_aapcs, implicit-def dead $lr, implicit $sp, implicit $x0, implicit $x1, implicit-def $sp, implicit-def $x0 ``` When we execute the outline again, we will get: ``` $x0 = ORRXrs $xzr, $lr, 0 <---- here BL @OUTLINED_FUNCTION_2_0, implicit-def $lr, implicit $sp, implicit-def $sp, implicit-def $lr, implicit $sp, implicit $xzr, implicit $d8, implicit $d9, implicit $d10, implicit $d11, implicit $d12, implicit $d13, implicit $x0 $lr = ORRXrs $xzr, $x0, 0 $x7 = ORRXrs $xzr, $lr, 0 BL @OUTLINED_FUNCTION_0, implicit-def $lr, implicit $sp, implicit-def $lr, implicit $sp, implicit $xzr, implicit $x7, implicit $x19, implicit $x20, implicit $x21, implicit $x22, implicit $x23, implicit $x24, implicit $x25, implicit $x26 $lr = ORRXrs $xzr, $x7, 0 ... 
BL @OUTLINED_FUNCTION_1, implicit-def $lr, implicit $sp, implicit-def $lr, implicit-def $sp, implicit-def $x0, implicit-def $x1, implicit $sp ``` When calling `OUTLINED_FUNCTION_2_0`, we used `x0` to save the `lr` register. The reason for the above error appears to be that: ``` BL @OUTLINED_FUNCTION_1, implicit-def $lr, implicit $sp, implicit-def $lr, implicit-def $sp, implicit-def $x0, implicit-def $x1, implicit $sp ``` should be: ``` BL @OUTLINED_FUNCTION_1, implicit-def $lr, implicit $sp, implicit-def $lr, implicit-def $sp, implicit-def $x0, implicit-def $x1, implicit $sp, implicit $x0 ``` When processing the same instruction with both `implicit-def $x0` and `implicit $x0` we should keep `implicit $x0`. A reproducible demo is available at: [https://github.com/DianQK/reproduce_outlined_function_use_live_x0](https://github.com/DianQK/reproduce_outlined_function_use_live_x0). Reviewed By: jinlin Differential Revision: https://reviews.llvm.org/D112911	2021-11-17 09:44:10 -08:00
Jin Lin	7c2192b277	Add the use of register r for outlined function when register r is live in and defined later. The compiler needs to mark register $x0 as live in for the following case. $x1 = ADDXri $sp, 16, 0 BL @spam, csr_darwin_aarch64_aapcs, implicit-def dead $lr, implicit $sp, implicit $x0, implicit killed $x1, implicit-def $sp, implicit-def dead $x0 Reviewed By: paquette Differential Revision: https://reviews.llvm.org/D95267	2021-03-03 15:14:11 -08:00
Kazu Hirata	22f00f61dd	[CodeGen] Use range-based for loops (NFC)	2021-02-15 14:46:11 -08:00
Kazu Hirata	6a6e382161	[llvm] Drop unnecessary make_range (NFC)	2021-01-09 09:25:00 -08:00
Kazu Hirata	cfeecdf7b6	[llvm] Use llvm::all_of (NFC)	2021-01-06 18:27:36 -08:00
Kazu Hirata	1e3ed09165	[CodeGen] Use llvm::append_range (NFC)	2020-12-28 19:55:16 -08:00
Momchil Velikov	5b30d9adc0	[MachineOutliner] Do not outline debug instructions The debug location is removed from any outlined instruction. This causes the MachineVerifier to crash on outlined DBG_VALUE instructions. Then, debug instructions are "invisible" to the outliner, that is, two ranges of instructions from different functions are considered identical if the only difference is debug instructions. Since a debug instruction from one function is unlikely to provide sensible debug information about all functions, sharing an outlined sequence, this patch just removes debug instructions from the outlined functions. Differential Revision: https://reviews.llvm.org/D89485	2020-11-05 19:26:51 +00:00
Simon Pilgrim	3c83b967cf	LiveRegUnits.h - reduce MachineRegisterInfo.h include. NFC. We only need to include MachineInstrBundle.h, but this exposes an implicit dependency in MachineOutliner.h. Also, remove duplicate includes from LiveRegUnits.cpp + MachineOutliner.cpp.	2020-09-08 17:27:00 +01:00
David Green	ca4c1ad854	[Outliner] Set nounwind for outlined functions This prevents the outlined functions from pulling in a lot of unnecessary code in our downstream libraries/linker, which stops outlining from making code size worse in C++ code built with no-exceptions. Differential Revision: https://reviews.llvm.org/D57254	2020-07-01 17:18:34 +01:00
Andrew Litteken	bb677cacc8	[SuffixTree][MachOpt] Factoring out Suffix Tree and adding Unit Tests This factors out the SuffixTree used in the Machine Outliner and moves it into Support for use in other outliners elsewhere in the compilation pipeline. Differential Revision: https://reviews.llvm.org/D80586	2020-06-08 12:44:18 -07:00
Puyan Lotfi	0c4aab27b3	[NFC] Outliner label name clean up. Just simplifying how the label name is generated while using std::to_string instead of Twine. Differential Revision: https://reviews.llvm.org/D79464	2020-05-05 23:27:46 -04:00
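
Several of the commits above (the benefit-threshold option and the `outline` debug output that prints "Expected benefit (N B)" against a threshold) revolve around one decision: outline a repeated sequence only when the bytes saved reach a threshold. The following is a minimal sketch of that decision under a simplified cost model. The struct, its field names, and both functions are hypothetical and are not LLVM's actual API; the real outliner asks each target for its call and frame overheads.

```cpp
#include <cassert>

// Hypothetical, simplified description of one group of identical sequences.
// None of these names come from LLVM; they exist only for illustration.
struct OutlineCandidateGroup {
  unsigned Occurrences;   // how many times the sequence appears
  unsigned SequenceBytes; // size of one copy of the sequence
  unsigned CallBytes;     // assumed cost of each call that replaces a copy
  unsigned FrameBytes;    // assumed fixed cost of the outlined function (e.g. its return)
};

// Bytes saved by outlining, clamped to zero when outlining would not shrink the code.
unsigned expectedBenefit(const OutlineCandidateGroup &G) {
  assert(G.Occurrences > 0 && "a candidate group needs at least one occurrence");
  // Cost if every copy stays inline.
  unsigned NotOutlined = G.Occurrences * G.SequenceBytes;
  // Cost after outlining: one call per occurrence plus a single shared body and frame.
  unsigned Outlined = G.Occurrences * G.CallBytes + G.SequenceBytes + G.FrameBytes;
  return NotOutlined > Outlined ? NotOutlined - Outlined : 0;
}

// Assumed decision rule: outline only if the benefit reaches the threshold.
bool shouldOutline(const OutlineCandidateGroup &G, unsigned ThresholdBytes) {
  return expectedBenefit(G) >= ThresholdBytes;
}
```

Under these assumptions, two 20-byte occurrences with 4-byte calls and a 4-byte frame cost 40 bytes inline and 32 bytes outlined, so the benefit is 8 bytes and a 1-byte threshold lets the group through. Two 4-byte occurrences yield a benefit of 0 and are skipped.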

188 Commits