llvm-project

mirror of https://github.com/llvm/llvm-project.git synced 2025-04-24 19:06:04 +00:00

Author	SHA1	Message	Date
YongKang Zhu	19bad2ac4a	[BOLT][NFC] Fix an incorrect address used in a BOLT-INFO message (#127902 )	2025-02-19 16:57:18 -08:00
Amir Ayupov	61acfb07e8	[BOLT] Add pre-aggregated trace support (#127125 ) Traces are triplets of branch source, target, and fall-through end (next branch). Traces simplify differentiation of fall-throughs into local- and external-origin, which improves performance over profile with undifferentiated fall-throughs by eliminating profile discontinuity in call to continuation fall-throughs. This makes it possible to avoid converting return profile into call to continuation profile which may introduce statistical biases. The existing format makes provisions for local- (F) and external- (f) origin fall-throughs, but the profile producer needs to know function boundaries. BOLT has that information readily available, so providing the origin branch of a fall-through is a functional replacement of the fall-through kind (f or F). This also has an effect of combining branches and fall-throughs into a single record. As traces subsume other pre-aggregated profile kinds, BOLT may drop support for them soon. Users of pre-aggregated profile format are advised to migrate to the trace format. Test Plan: Updated callcont-fallthru.s	2025-02-13 15:14:56 -08:00
YongKang Zhu	1e0a489671	[BOLT] Resolve symlink for library lookup (#126386 )	2025-02-08 14:02:46 -08:00
Amir Ayupov	b884be8640	[BOLT] Exit with error code on missing DWO CU (#125976 ) If BOLT fails to locate DWO CU when using split DWARF, this signifies an issue with the input (missing .dwo) rather than an internal assertion.	2025-02-06 10:01:12 -08:00
Maksim Panchenko	3115278c4e	[BOLT] Fixup for commit 137c378/#125961	2025-02-06 00:26:20 -08:00
Maksim Panchenko	137c3781e6	[BOLT][AArch64] Include constant islands in disassembly (#125961 ) When printing disassembly of a function with constant islands, include the island info in the dump. At the moment, only print islands in pre-CFG state. Include islands that are interleaved with instructions.	2025-02-05 22:41:40 -08:00
Maksim Panchenko	ef232a7e34	[BOLT][AArch64] Remove nops in functions with defined control flow (#124705 ) When a function has an indirect branch with unknown control flow, we preserve nops in order to keep all instruction offsets (from the start of the function) the same in case the indirect branch is used by a PC-relative jump table. However, when we know the control flow of the function, we should be able to safely remove nops.	2025-01-28 11:03:49 -08:00
Maksim Panchenko	1b4bd4e1a5	[BOLT][AArch64] Remove assertions from jump table heuristic (#124372 ) The code for jump table detection on AArch64 asserts liberally whenever the input instruction sequence does not match the expected pattern. As a result, BOLT fails to process binaries with such sequences instead of ignoring functions with unknown control flow. Remove asserts in analyzeIndirectBranchFragment() and mark indirect jumps as instructions with unknown control flow instead.	2025-01-24 16:43:02 -08:00
Maksim Panchenko	34c6c5e72f	[BOLT][AArch64] Fix PLT optimization (#124192 ) Preserve C++ exception metadata while running PLT optimization on AArch64.	2025-01-24 14:20:24 -08:00
Amir Ayupov	e6c9cd9c06	[BOLT] Drop parsing sample PC when processing LBR perf data (#123420 ) Remove options to generate autofdo data (unused) and `use-event-pc` (not beneficial). Cuts down perf2bolt time for 11GB perf.data by 40s (11:10->10:30).	2025-01-21 09:04:49 -08:00
Alexey Moksyakov	ad599c25d9	[BOLT][AArch64] Add isPush & isPop (#120713 ) This functionality is needed for inliner pass and also for correct dyno stats. Needed for [PR](https://github.com/llvm/llvm-project/pull/120187)	2025-01-20 10:42:48 +08:00
Nicholas	ee4282259d	[BOLT][AArch64]support `inline-small-functions` for AArch64 (#120187 ) Add some functions in `AArch64MCPlusBuilder.cpp` to support inline for AArch64.	2025-01-17 17:55:55 +08:00
Nicholas	1fa02b9684	[BOLT][AArch64] Speedup `computeInstructionSize` (#121106 ) AArch64 instructions have a fixed size 4 bytes, no need to compute.	2025-01-17 09:48:17 +08:00
Nikita Popov	e7244d8659	[BOLT][CMake] Don't export bolt libraries in LLVMExports.cmake (#121936 ) Bolt makes use of add_llvm_library and as such ends up exporting its libraries from LLVMExports.cmake, which is not correct. Bolt doesn't have its own exports file, and I assume that there is no desire to have one either -- Bolt libraries are not intended to be consumed as a cmake module, right? As such, this PR adds a NO_EXPORT option to simplify exclude these libraries from the exports file.	2025-01-08 09:41:09 +01:00
Peter Waller	aa9cc721e5	Reapply "[BOLT] Add --pad-funcs-before=func:n (#117924 )" (#121918 ) - Reapply "[BOLT] Add --pad-funcs-before=func:n (#117924)" - [BOLT] Fix --pad-funcs{,-before} state misinteraction When --pad-funcs-before was introduced, it introduced a bug whereby the first one to get parsed could influence the other. Ensure that each has its own state and test that they don't interact in this manner by testing how the `_subsequent` symbol moves when both arguments are supplied with different padding values. Fixed by having a function (and static state) for each of before/after.	2025-01-07 17:25:04 +00:00
Amir Ayupov	be21bd9bbf	Revert "[BOLT] Add --pad-funcs-before=func:n (#117924 )" 14dcf8214f9c66172d17c1cfaec6aec0030748e0 introduced a subtle bug with the static `FunctionPadding` map. If either `opts::FunctionPadSpec` or `opts::FunctionPadBeforeSpec` are set, the map is going to be populated with the respective spec in the first invocation of `BinaryEmitter::emitFunction`. The subsequent invocations will pick up the padding from the map irrespective of whether `opts::FunctionPadSpec` or `opts::FunctionPadBeforeSpec` is passed as a parameter. This breaks an internal test, hence reverting the patch.	2025-01-06 12:57:43 -08:00
Franklin	6e8a1a45a7	[BOLT] Detect Linux kernel version if the binary is a Linux kernel (#119088 ) This makes it easier to handle differences (e.g. of exception table entry size) between versions of Linux kernel	2024-12-26 09:54:23 -08:00
Alexey Moksyakov	e11d49cbf5	[BOLT][AArch64] Adds tls relocations support (#117465 ) Co-authored-by: yavtuk <yavtuk@ya.ru>	2024-12-20 15:54:36 +03:00
Kristof Beyls	4111841f88	[BOLT] Correctly print preferred disassembly for annotated instructions (#120564 ) This patch makes sure that `BinaryContext::printInstruction` prints the preferred disassembly. Preferred disassembly only gets printed when there are no annotations on the MCInst. Therefore, this patch temporarily removes the annotations before printing it. A few examples of before and after on AArch64 instructions are as follows: ``` BEFORE AFTER (preferred disassembly) ret x30 ret orr x30, xzr, x0 mov x30, x0 hint #29 autiasp hint #12 autia1716 ``` Clearly, the preferred disassembly is easier for developers to read, and is the disassembly that tools should be printing. This patch is motivated as part of future work on the llvm-bolt-binary-analysis tool, making sure that the reports it prints do use preferred disassembly. This patch was cherry-picked from https://github.com/kbeyls/llvm-project/tree/bolt-gadget-scanner-prototype. In this current patch, this only affects existing RISCV test cases. This patch also does improve test cases in future patches that will introduce a binary analysis for llvm-bolt-binary-analysis that checks for correct application of pac-ret (pointer authentication on return addresses).	2024-12-20 08:54:07 +00:00
Maksim Panchenko	21684e38ee	[BOLT][Linux] Refactor reading of PC-relative addresses. NFCI (#120491 ) Fix evaluation order problem identified in https://github.com/llvm/llvm-project/pull/119088.	2024-12-19 10:40:25 -08:00
Alexander Yermolovich	3c357a49d6	[BOLT] Add support for safe-icf (#116275 ) Identical Code Folding (ICF) folds functions that are identical into one function, and updates symbol addresses to the new address. This reduces the size of a binary, but can lead to problems. For example when function pointers are compared. This can be done either explicitly in the code or generated IR by optimization passes like Indirect Call Promotion (ICP). After ICF what used to be two different addresses become the same address. This can lead to a different code path being taken. This is where safe ICF comes in. Linker (LLD) does it using address significant section generated by clang. If symbol is in it, or an object doesn't have this section symbols are not folded. BOLT does not have the information regarding which objects do not have this section, so can't re-use this mechanism. This implementation scans code section and conservatively marks functions symbols as unsafe. It treats symbols as unsafe if they are used in non-control flow instruction. It also scans through the data relocation sections and does the same for relocations that reference a function symbol. The latter handles the case when function pointer is stored in a local or global variable, etc. If a relocation address points within a vtable these symbols are skipped.	2024-12-16 21:49:53 -08:00
Nicholas	671095b452	[BOLT][AArch64] Check Last Element Instead of Returning `nullptr` in `lookupStubFromGroup` (#114015 ) The current implementation of `lookupStubFromGroup` is incorrect. The function is intended to find and return the closest stub using `lower_bound`, which identifies the first element in a sorted range that is not less than a specified value. However, if such an element is not found within `Candidates` and the list is not empty, the function returns `nullptr`. Instead, it should check whether the last element satisfies the condition.	2024-12-16 12:14:11 +00:00
Paschalis Mpeis	2df48fa78b	[BOLT][AArch64] Enable function print after ADRRelaxation (#119869 ) Introduce `--print-adr-relaxation` to print after ADR Relaxation pass.	2024-12-16 12:06:56 +00:00
Alexander Yermolovich	331c2dd8b4	[BOLT][DWARF] Add support for DW_OP_GNU_push_tls_address to .debug_names (#119939 ) Added support to BOLT for DW_OP_GNU_push_tls_address. So now DW_TAG_variable with this OP in DW_AT_location will appear in debug names acceleration table. Although not in the DWARF 5 spec it is similar to DW_OP_form_tls_address. Without this support llvm-dwarfdump --verify --debug-names will report errors.	2024-12-14 09:30:25 -08:00
Maksim Panchenko	b560b87ba1	[BOLT] Clean up jump table handling in non-reloc mode. NFCI (#119614 ) This change affects non-relocation mode only. Prior to having CheckLargeFunctions pass, we could have emitted code for functions that was discarded at the end due to size limitations. Since we didn't know at the time of emission if the code would be discarded or not, we had to emit jump tables in separate sections and handle them separately. However, now we always run CheckLargeFunctions and make sure all emitted code is used. Thus, we can get rid of the special jump table handling.	2024-12-13 13:14:02 -08:00
Kristof Beyls	ceb7214be0	[BOLT] Introduce binary analysis tool based on BOLT (#115330 ) This initial commit does not add any specific binary analyses yet, it merely contains the boilerplate to introduce a new BOLT-based tool. This basically combines the 4 first patches from the prototype pac-ret and stack-clash binary analyzer discussed in RFC https://discourse.llvm.org/t/rfc-bolt-based-binary-analysis-tool-to-verify-correctness-of-security-hardening/78148 and published at https://github.com/llvm/llvm-project/compare/main...kbeyls:llvm-project:bolt-gadget-scanner-prototype The introduction of such a BOLT-based binary analysis tool was proposed and discussed in at least the following places: - The RFC pointed to above - EuroLLVM 2024 round table https://discourse.llvm.org/t/summary-of-bolt-as-a-binary-analysis-tool-round-table-at-eurollvm/78441 The round table showed quite a few people interested in being able to build a custom binary analysis quickly with a tool like this. - Also at the US LLVM dev meeting a few weeks ago, I heard interest from a few people, asking when the tool would be available upstream. - The presentation "Adding Pointer Authentication ABI support for your ELF platform" (https://llvm.swoogo.com/2024devmtg/session/2512720/adding-pointer-authentication-abi-support-for-your-elf-platform) explicitly mentioned interest to extend the prototype tool to verify correct implementation of pauthabi.	2024-12-12 10:06:27 +00:00
Alexander Yermolovich	4b825c7417	[BOLT][DWARF] Add support for transitive DW_AT_name/DW_AT_linkage_name resolution for DW_AT_name/DW_AT_linkage_name. (#119493 ) This fix handles a case where a DIE that does not have DW_AT_name/DW_AT_linkage_name, but has a reference to another DIE using DW_AT_abstract_origin/DW_AT_specification. It also fixes a bug where there are cross CU references for those attributes. Previously it would use a DWARF Unit of a DIE which was being processed The warf5-debug-names-cross-cu.s test just happened to work because how it was constructed where string section was shared by both DWARF Units. To resolve DW_AT_name/DW_AT_linkage_name this patch iterates over references until it either reaches the final DIE or finds both of those names.	2024-12-11 14:27:56 -08:00
Peter Waller	14dcf8214f	[BOLT] Add --pad-funcs-before=func:n (#117924 ) This complements --pad-funcs, and by using both simultaneously, enables moving a specific function through the address space without modifying any code other than the targeted function (and references to it) by doing (before+after=constant). See also: proposed functionality to enable inserting random padding in https://discourse.llvm.org/t/rfc-lld-feature-for-controlling-for-code-size-dependent-measurement-bias and https://github.com/llvm/llvm-project/pull/117653	2024-12-11 09:58:52 +00:00
Alexander Yermolovich	50c0e679b9	[BOLT][DWARF] Add support for DW_TAG_union_type to DebugNames. (#119023 ) Adding support for DW_TAG_union_type for DebugNames acceleration tables.	2024-12-06 15:45:52 -08:00
Jared Wyles	2ccf7ed277	[JITLink] Switch to SymbolStringPtr for Symbol names (#115796 ) Use SymbolStringPtr for Symbol names in LinkGraph. This reduces string interning on the boundary between JITLink and ORC, and allows pointer comparisons (rather than string comparisons) between Symbol names. This should improve the performance and readability of code that bridges between JITLink and ORC (e.g. ObjectLinkingLayer and ObjectLinkingLayer::Plugins). To enable use of SymbolStringPtr a std::shared_ptr<SymbolStringPool> is added to LinkGraph and threaded through to its construction sites in LLVM and Bolt. All LinkGraphs that are to have symbol names compared by pointer equality must point to the same SymbolStringPool instance, which in ORC sessions should be the pool attached to the ExecutionSession. --------- Co-authored-by: Lang Hames <lhames@gmail.com>	2024-12-06 10:22:09 +11:00
Maksim Panchenko	d5956fb8f9	[BOLT][AArch64] Add support for short LLD thunks/veneers (#118422 ) When a callee function is closer than 256MB from its call site, LLD linker can strategically create a short thunk for the function with a single branch instruction (that covers +/-128MB). Detect and convert such thunks into direct calls in BOLT.	2024-12-03 13:44:51 -08:00
Paschalis Mpeis	51003076eb	Reapply [BOLT] DataAggregator support for binaries with multiple text segments (#118023 ) When a binary has multiple text segments, the Size is computed as the difference of the last address of these segments from the BaseAddress. The base addresses of all text segments must be the same. Introduces flag 'perf-script-events' for testing, which allows passing perf events without BOLT having to parse them by invoking 'perf script'. The flag is used to pass a mock perf profile that has two memory mappings for a mock binary that has two text segments. The mapping size is updated as `parseMMapEvents` now processes all text segments.	2024-12-02 09:20:40 +00:00
Peter Waller	b5ed375f9d	[BOLT] Skip _init; avoiding GOT breakage for static binaries (#117751 ) _init is used during startup of binaires. Unfortunately, its address can be shared (at least on AArch64 glibc static binaries) with a data reference that lives in the GOT. The GOT rewriting is currently unable to distinguish between data addresses and function addresses. This leads to the data address being incorrectly rewritten, causing a crash on startup of the binary: Unexpected reloc type in static binary. To avoid this, don't consider _init for being moved, by skipping it. ~We could add further conditions to narrow the skipped case for known crashes, but as a straw man I thought it'd be best to keep the condition as simple as possible and see if there any objections to this.~ (Edit: this broke the test bolt/test/runtime/X86/retpoline-synthetic.test, because _init was skipped from the retpoline pass and it has an indirect call in it, so I include a check for static binaries now, which avoids the test failure, but perhaps this could/should be narrowed further?) For now, skip _init for static binaries on any architecture; we could add further conditions to narrow the skipped case for known crashes, but as a straw man I thought it'd be best to keep the condition as simple as possible and see if there any objections to this. Updates #100096.	2024-11-28 14:59:07 +00:00
Enna1	4d2bc0adc6	[BOLT] Extract comparator for sorting functions by index into helper function (#116217 ) This change extracts the comparator for sorting functions by index into a helper function `compareBinaryFunctionByIndex()` Not sure why the comparator used in `BinaryContext::getSortedFunctions()` is not same as the other two places. I think they should use the same comparator, so I also change `BinaryContext::getSortedFunctions()` to use `compareBinaryFunctionByIndex()` for sorting functions.	2024-11-27 09:01:12 +08:00
Hans Wennborg	537343dea4	Revert "[BOLT] DataAggregator support for binaries with multiple text segments (#92815 )" This caused test failures, see comment on the PR: Failed Tests (2): BOLT-Unit :: Core/./CoreTests/AArch64/MemoryMapsTester/MultipleSegmentsMismatchedBaseAddress/0 BOLT-Unit :: Core/./CoreTests/X86/MemoryMapsTester/MultipleSegmentsMismatchedBaseAddress/0 > When a binary has multiple text segments, the Size is computed as the > difference of the last address of these segments from the BaseAddress. > The base addresses of all text segments must be the same. > > Introduces flag 'perf-script-events' for testing. It allows passing perf events > without BOLT having to parse them using 'perf script'. The flag is used to > pass a mock perf profile that has two memory mappings for a mock binary > that has two text segments. The size of the mapping is updated as this > change `parseMMapEvents` processes all text segments. This reverts commit 4b71b3782d217db0138b701c4514bd2168ca1659.	2024-11-26 14:59:30 +01:00
Paschalis Mpeis	4b71b3782d	[BOLT] DataAggregator support for binaries with multiple text segments (#92815 ) When a binary has multiple text segments, the Size is computed as the difference of the last address of these segments from the BaseAddress. The base addresses of all text segments must be the same. Introduces flag 'perf-script-events' for testing. It allows passing perf events without BOLT having to parse them using 'perf script'. The flag is used to pass a mock perf profile that has two memory mappings for a mock binary that has two text segments. The size of the mapping is updated as this change `parseMMapEvents` processes all text segments.	2024-11-25 13:12:43 +00:00
Maksim Panchenko	92301180f7	[BOLT] Use compact EH format for fixed-address executables (#117274 ) Use ULEB128 format for emitting LSDAs for fixed-address executables, similar to what we use for PIEs/DSOs. Main difference is that we don't use landing pad trampolines when landing pads are not contained in a single fragment. Instead, we fallback to emitting larger fixed-address LSDAs, which is still better than adding trampoline instructions.	2024-11-22 00:28:55 -08:00
Maksim Panchenko	105ecd8bb2	[BOLT] Avoid EH trampolines for PIEs/DSOs (#117106 ) We used to emit EH trampolines for PIE/DSO whenever a function fragment contained a landing pad outside of it. However, it is common to have all landing pads in a cold fragment even when their throwers are in a hot one. To reduce the number of trampolines, analyze landing pads for any given function fragment, and if they all belong to the same (possibly different) fragment, designate that fragment as a landing pad fragment for the "thrower" fragment. Later, emit landing pad fragment symbol as an LPStart for the thrower LSDA.	2024-11-21 18:18:30 -08:00
Maksim Panchenko	3282be1f8d	[BOLT] Use ULEB128 encoding for PIE/DSO exception tables (#116911 ) Use ULEB128 encoding for call sites in PIE/DSO binaries. The encoding reduces the size of the tables compared to sdata4 and is the default format used by Clang. Note that for fixed-address executables we still use absolute addressing to cover cases where landing pads can reside in different function fragments. For testing, we rely on runtime EH tests.	2024-11-20 12:29:23 -08:00
Maksim Panchenko	066dd91ad8	[BOLT] Offset LPStart to avoid unnecessary instructions (#116713 ) For C++ exception handling, when we write a call site table, we must avoid emitting 0-value offsets for landing pads unless the call site has no landing pad. However, 0 can be a real offset from the start of the FDE if the FDE corresponds to a function fragment that starts with a landing pad. In such cases, we used to emit a trap instruction at the start of the fragment to guarantee non-zero LP offset. To avoid emitting unnecessary trap instructions, we can instead set LPStart to an offset from the FDE. If we emit it as [FDEStart - 1], then all real offsets from LPStart in FDE become non-negative.	2024-11-19 16:45:03 -08:00
Maksim Panchenko	996553228f	[BOLT] Overwrite .eh_frame and .gcc_except_table (#116755 ) Under --use-old-text or --strict, we completely rewrite contents of EH frames and exception tables sections. If new contents of either section do not exceed the size of the original section, rewrite the section in-place.	2024-11-19 12:59:05 -08:00
Maksim Panchenko	08ef939637	[BOLT] Overwrite .eh_frame_hdr in-place (#116730 ) If the new EH frame header can fit into the original .eh_frame_hdr section, overwrite it in-place and pad with zeroes.	2024-11-18 20:42:38 -08:00
Maksim Panchenko	93a4244523	[BOLT] Use new assembler directives for EH table emission (#116294 ) When emitting C++ exception tables (LSDAs), BOLT used to estimate the size of the tables beforehand. This implementation was necessary as the assembler/streamer lacked the emitULEB128IntValue() functionality. As I plan to introduce [u\|s]uleb128-encoded exception tables in BOLT, now is a perfect time to switch to the new API and eliminate the need to pre-compute the size of the tables.	2024-11-17 12:40:07 -08:00
Maksim Panchenko	1b8e0cf090	[BOLT] Never emit "large" functions (#115974 ) "Large" functions are functions that are too big to fit into their original slots after code modifications. CheckLargeFunctions pass is designed to prevent such functions from emission. Extend this pass to work with functions with constant islands. Now that CheckLargeFunctions covers all functions, it guarantees that we will never see such functions after code emission on all platforms (previously it was guaranteed on x86 only). Hence, we can get rid of RewriteInstance extensions that were meant to support "large" functions.	2024-11-13 09:58:44 -08:00
Maksim Panchenko	d922045381	[BOLT] Use AsmInfo for address size. NFCI (#115932 ) Use AsmInfo instead of DWARFObj interface for extracting address size and format.	2024-11-12 11:53:34 -08:00
Maksim Panchenko	be89e794f7	[BOLT][AArch64] Add support for long absolute LLD thunks/veneers (#113408 ) Absolute thunks generated by LLD reference function addresses recorded as data in code. Since they are generated by the linker, they don't have relocations associated with them and thus the addresses are left undetected. Use pattern matching to detect such thunks and handle them in VeneerElimination pass.	2024-11-12 11:27:08 -08:00
Kazu Hirata	06e0869624	[BOLT] Fix warnings This patch fixes: bolt/lib/Profile/StaleProfileMatching.cpp:694:24: error: unused variable 'BinHash' [-Werror,-Wunused-variable] bolt/lib/Profile/YAMLProfileWriter.cpp:206:61: error: missing field 'GUID' initializer [-Werror,-Wmissing-field-initializers] bolt/lib/Profile/YAMLProfileReader.cpp:840:16: error: unused variable 'MatchedWithPseudoProbes' [-Werror,-Wunused-variable]	2024-11-12 09:39:57 -08:00
Shaw Young	9a9af0a23f	[BOLT] Match blocks with pseudo probes (#99891 ) Match inline trees first between profile and the binary: by GUID, checksum, parent, and inline site for inlined functions. Map profile probes to binary probes via matched inline tree nodes. Each binary probe has an associated binary basic block. If all probes from one profile basic block map to the same binary basic block, it’s an exact match, otherwise the block is determined by majority vote and reported as loose match. Pseudo probe matching happens between exact hash matching and call/loose matching. Introduce ProbeMatchSpec - a mechanism to match probes belonging to another binary function. For example, given functions foo and bar: ``` void foo() { bar(); } ``` profiled binary: bar is not inlined => have top-level function bar new binary where the profile is applied to: bar is inlined into foo. Currently, BOLT does 1:1 matching between profile functions and binary functions based on the name. #100446 will extend this to N:M where multiple profiles can be matched to one binary function (as in the example above where binary function foo would use profiles for foo and bar), and one profile can be matched to multiple binary functions (e.g. if bar was inlined into multiple functions). In this diff, ProbeMatchSpecs would only have one BinaryFunctionProfile (existing name-based matching). Test Plan: Added match-blocks-with-pseudo-probes.test Performance test: - Setup: - Baseline no-BOLT: Clang with pseudo probes, ThinLTO + CSSPGO (#79942) - BOLT fresh: BOLTed Clang using fresh profile, - BOLT stale (hash): BOLTed Clang using stale profile (collected on Clang 10K commits back), `-infer-stale-profile` (hash+call block matching) - BOLT stale (+probe): BOLTed Clang using stale profile, `-infer-stale-profile` with `-stale-matching-with-pseudo-probes` (hash+call+pseudo probe block matching) - 2S Intel SKX Xeon 6138 with 40C/80T and 256GB RAM, using 20C/40T for build, - BOLT profiles are collected on Clang compiling large preprocessed C++ file. - Benchmark: building Clang (average of 5 runs), see driver in aaupov/llvm-devmtg-2022 - Results, wall time, lower is better: - Baseline no-BOLT: 429.52 +- 2.61s, - BOLT stale (hash): 413.21 +- 2.19s, - BOLT stale (+probe): 409.69 +- 1.41s, - BOLT fresh: 384.50 +- 1.80s. --------- Co-authored-by: Amir Ayupov <aaupov@fb.com>	2024-11-12 07:21:03 -08:00
Daniel Sanders	74003f11b3	[mc] Add CFI directive to emit val_offset() rules (#113971 ) These specify that the value of the given register in the previous frame is the CFA plus some offset. This isn't very common but can be necessary if the original value is normally reconstructed from the stack/frame pointer instead of being saved on the stack and reloaded from there.	2024-11-11 11:38:36 -08:00
Amir Ayupov	7ec682b16b	[MC] Use StringRefs from pseudo_probe_desc section if it's mapped Add `IsMMapped` flag to `buildGUID2FuncDescMap` controlling whether to allocate a string in `FuncNameAllocator` or use StringRef directly. Keep it false by default, only set it for BOLT use case because BOLT keeps file sections in memory while processing them. llvm-profgen constructs GUID2FuncDescMap and then releases the binary. For medium sized binary with 0.8 GiB .pseudo_probe_desc section, this saves 0.7 GiB peak RSS in perf2bolt. Test Plan: no-op for llvm-profgen, NFC for perf2bolt Reviewers: maksfb, dcci, wlei-llvm, rafaelauler, ayermolo Reviewed By: wlei-llvm Pull Request: https://github.com/llvm/llvm-project/pull/112996	2024-11-08 16:39:33 -08:00

1 2 3 4 5 ...

1066 Commits