llvm-project

mirror of https://github.com/llvm/llvm-project.git synced 2025-04-18 05:26:36 +00:00

Author	SHA1	Message	Date
Maksim Panchenko	d5956fb8f9	[BOLT][AArch64] Add support for short LLD thunks/veneers (#118422 ) When a callee function is closer than 256MB from its call site, LLD linker can strategically create a short thunk for the function with a single branch instruction (that covers +/-128MB). Detect and convert such thunks into direct calls in BOLT.	2024-12-03 13:44:51 -08:00
Paschalis Mpeis	51003076eb	Reapply [BOLT] DataAggregator support for binaries with multiple text segments (#118023 ) When a binary has multiple text segments, the Size is computed as the difference of the last address of these segments from the BaseAddress. The base addresses of all text segments must be the same. Introduces flag 'perf-script-events' for testing, which allows passing perf events without BOLT having to parse them by invoking 'perf script'. The flag is used to pass a mock perf profile that has two memory mappings for a mock binary that has two text segments. The mapping size is updated as `parseMMapEvents` now processes all text segments.	2024-12-02 09:20:40 +00:00
David Spickett	085e7d2b22	[bolt] Move CODE_OWNERS.txt to Maintainers.txt (#118082 ) To align with: https://llvm.org/docs/DeveloperPolicy.html#maintainers I have not changed the format of the file, my only goal here is that the project have a `bolt/Maintainers.*` so it is easy to find.	2024-12-02 09:12:57 +00:00
Peter Waller	b5ed375f9d	[BOLT] Skip _init; avoiding GOT breakage for static binaries (#117751 ) _init is used during startup of binaires. Unfortunately, its address can be shared (at least on AArch64 glibc static binaries) with a data reference that lives in the GOT. The GOT rewriting is currently unable to distinguish between data addresses and function addresses. This leads to the data address being incorrectly rewritten, causing a crash on startup of the binary: Unexpected reloc type in static binary. To avoid this, don't consider _init for being moved, by skipping it. ~We could add further conditions to narrow the skipped case for known crashes, but as a straw man I thought it'd be best to keep the condition as simple as possible and see if there any objections to this.~ (Edit: this broke the test bolt/test/runtime/X86/retpoline-synthetic.test, because _init was skipped from the retpoline pass and it has an indirect call in it, so I include a check for static binaries now, which avoids the test failure, but perhaps this could/should be narrowed further?) For now, skip _init for static binaries on any architecture; we could add further conditions to narrow the skipped case for known crashes, but as a straw man I thought it'd be best to keep the condition as simple as possible and see if there any objections to this. Updates #100096.	2024-11-28 14:59:07 +00:00
Sander de Smalen	318c69de52	Reland "[AArch64] Define high bits of FPR and GPR registers (take 2) (#114827 )" The issue with slow compile-time was caused by an assert in AArch64RegisterInfo.cpp. The assert invokes 'checkAllSuperRegsMarked' after adding all the reserved registers. This call gets very expensive after adding the _HI registers due to the way the function searches in the 'Exception' list, which is expected to be a small list but isn't (the patch added 190 _HI regs). It was possible to rewrite the code in such a way that the _HI registers are marked as reserved after the check. This makes the problem go away entirely and restores compile-time to what it was before (tested for `check-runtimes`, which previously showed a ~5x slowdown). This reverts commits: 1434d2ab215e3ea9c5f34689d056edd3d4423a78 2704647fb7986673b89cef1def729e3b022e2607	2024-11-27 13:31:59 +00:00
Enna1	4d2bc0adc6	[BOLT] Extract comparator for sorting functions by index into helper function (#116217 ) This change extracts the comparator for sorting functions by index into a helper function `compareBinaryFunctionByIndex()` Not sure why the comparator used in `BinaryContext::getSortedFunctions()` is not same as the other two places. I think they should use the same comparator, so I also change `BinaryContext::getSortedFunctions()` to use `compareBinaryFunctionByIndex()` for sorting functions.	2024-11-27 09:01:12 +08:00
Raul Tambre	003b48e0cb	[BOLT][test] enable GNU extensions, use C++ compiler, remove unnecessary target (#117043 ) 1. With a Clang that doesn't default to GNU extensions they need to be enabled explicitly. 2. The X86 directory lit config sets it already, there's no reason for this test to do it by itself. 3. The C frontend executable will fail if there's for example a Clang resource file for the C++ mode that sets C++-specific options: ``` + /home/tambre/dev/llvm/build/bin/clang --target=x86_64-unknown-linux-gnu -fPIE -fuse-ld=lld -Wl,--unresolved-symbols=ignore-all -pie -fPIC -shared /home/tambre/dev/llvm/bolt/test/R_ABS.pic.lld.cpp -o /home/tambre/dev/llvm/build/tools/bolt/test/Output/R_ABS.pic.lld.cpp.tmp.so -Wl,-q -fuse-ld=lld clang: warning: argument unused during compilation: '-pie' [-Wunused-command-line-argument] error: invalid argument '-std=c23' not allowed with 'C++' ```	2024-11-27 00:14:00 +02:00
Hans Wennborg	537343dea4	Revert "[BOLT] DataAggregator support for binaries with multiple text segments (#92815 )" This caused test failures, see comment on the PR: Failed Tests (2): BOLT-Unit :: Core/./CoreTests/AArch64/MemoryMapsTester/MultipleSegmentsMismatchedBaseAddress/0 BOLT-Unit :: Core/./CoreTests/X86/MemoryMapsTester/MultipleSegmentsMismatchedBaseAddress/0 > When a binary has multiple text segments, the Size is computed as the > difference of the last address of these segments from the BaseAddress. > The base addresses of all text segments must be the same. > > Introduces flag 'perf-script-events' for testing. It allows passing perf events > without BOLT having to parse them using 'perf script'. The flag is used to > pass a mock perf profile that has two memory mappings for a mock binary > that has two text segments. The size of the mapping is updated as this > change `parseMMapEvents` processes all text segments. This reverts commit 4b71b3782d217db0138b701c4514bd2168ca1659.	2024-11-26 14:59:30 +01:00
Paschalis Mpeis	957c2ac4f1	[BOLT] Fix for bughunter.sh in offline mode (#116649 ) In offline mode, the script sets 'PASS' variable and does not use it. Surrounding code suggests using 'FAIL' variable instead.	2024-11-25 13:13:10 +00:00
Paschalis Mpeis	4b71b3782d	[BOLT] DataAggregator support for binaries with multiple text segments (#92815 ) When a binary has multiple text segments, the Size is computed as the difference of the last address of these segments from the BaseAddress. The base addresses of all text segments must be the same. Introduces flag 'perf-script-events' for testing. It allows passing perf events without BOLT having to parse them using 'perf script'. The flag is used to pass a mock perf profile that has two memory mappings for a mock binary that has two text segments. The size of the mapping is updated as this change `parseMMapEvents` processes all text segments.	2024-11-25 13:12:43 +00:00
Maksim Panchenko	2704647fb7	Revert "Fix up MCPlusBuilder.cpp to account for W0_HI on AArch64" This reverts commit 576865a50e6ccb74196c9491fa79575d6d7f0b0b. Depends on #114827 that was reverted.	2024-11-22 13:57:30 -08:00
Maksim Panchenko	92301180f7	[BOLT] Use compact EH format for fixed-address executables (#117274 ) Use ULEB128 format for emitting LSDAs for fixed-address executables, similar to what we use for PIEs/DSOs. Main difference is that we don't use landing pad trampolines when landing pads are not contained in a single fragment. Instead, we fallback to emitting larger fixed-address LSDAs, which is still better than adding trampoline instructions.	2024-11-22 00:28:55 -08:00
Maksim Panchenko	105ecd8bb2	[BOLT] Avoid EH trampolines for PIEs/DSOs (#117106 ) We used to emit EH trampolines for PIE/DSO whenever a function fragment contained a landing pad outside of it. However, it is common to have all landing pads in a cold fragment even when their throwers are in a hot one. To reduce the number of trampolines, analyze landing pads for any given function fragment, and if they all belong to the same (possibly different) fragment, designate that fragment as a landing pad fragment for the "thrower" fragment. Later, emit landing pad fragment symbol as an LPStart for the thrower LSDA.	2024-11-21 18:18:30 -08:00
Maksim Panchenko	3282be1f8d	[BOLT] Use ULEB128 encoding for PIE/DSO exception tables (#116911 ) Use ULEB128 encoding for call sites in PIE/DSO binaries. The encoding reduces the size of the tables compared to sdata4 and is the default format used by Clang. Note that for fixed-address executables we still use absolute addressing to cover cases where landing pads can reside in different function fragments. For testing, we rely on runtime EH tests.	2024-11-20 12:29:23 -08:00
Maksim Panchenko	066dd91ad8	[BOLT] Offset LPStart to avoid unnecessary instructions (#116713 ) For C++ exception handling, when we write a call site table, we must avoid emitting 0-value offsets for landing pads unless the call site has no landing pad. However, 0 can be a real offset from the start of the FDE if the FDE corresponds to a function fragment that starts with a landing pad. In such cases, we used to emit a trap instruction at the start of the fragment to guarantee non-zero LP offset. To avoid emitting unnecessary trap instructions, we can instead set LPStart to an offset from the FDE. If we emit it as [FDEStart - 1], then all real offsets from LPStart in FDE become non-negative.	2024-11-19 16:45:03 -08:00
Maksim Panchenko	996553228f	[BOLT] Overwrite .eh_frame and .gcc_except_table (#116755 ) Under --use-old-text or --strict, we completely rewrite contents of EH frames and exception tables sections. If new contents of either section do not exceed the size of the original section, rewrite the section in-place.	2024-11-19 12:59:05 -08:00
Maksim Panchenko	08ef939637	[BOLT] Overwrite .eh_frame_hdr in-place (#116730 ) If the new EH frame header can fit into the original .eh_frame_hdr section, overwrite it in-place and pad with zeroes.	2024-11-18 20:42:38 -08:00
Maksim Panchenko	93a4244523	[BOLT] Use new assembler directives for EH table emission (#116294 ) When emitting C++ exception tables (LSDAs), BOLT used to estimate the size of the tables beforehand. This implementation was necessary as the assembler/streamer lacked the emitULEB128IntValue() functionality. As I plan to introduce [u\|s]uleb128-encoded exception tables in BOLT, now is a perfect time to switch to the new API and eliminate the need to pre-compute the size of the tables.	2024-11-17 12:40:07 -08:00
Sander de Smalen	576865a50e	Fix up MCPlusBuilder.cpp to account for W0_HI on AArch64 Landing #114827 broke these tests, because they did not account for the new artificial registers.	2024-11-14 12:02:14 +00:00
Maksim Panchenko	1b8e0cf090	[BOLT] Never emit "large" functions (#115974 ) "Large" functions are functions that are too big to fit into their original slots after code modifications. CheckLargeFunctions pass is designed to prevent such functions from emission. Extend this pass to work with functions with constant islands. Now that CheckLargeFunctions covers all functions, it guarantees that we will never see such functions after code emission on all platforms (previously it was guaranteed on x86 only). Hence, we can get rid of RewriteInstance extensions that were meant to support "large" functions.	2024-11-13 09:58:44 -08:00
Maksim Panchenko	d922045381	[BOLT] Use AsmInfo for address size. NFCI (#115932 ) Use AsmInfo instead of DWARFObj interface for extracting address size and format.	2024-11-12 11:53:34 -08:00
Maksim Panchenko	be89e794f7	[BOLT][AArch64] Add support for long absolute LLD thunks/veneers (#113408 ) Absolute thunks generated by LLD reference function addresses recorded as data in code. Since they are generated by the linker, they don't have relocations associated with them and thus the addresses are left undetected. Use pattern matching to detect such thunks and handle them in VeneerElimination pass.	2024-11-12 11:27:08 -08:00
Kazu Hirata	06e0869624	[BOLT] Fix warnings This patch fixes: bolt/lib/Profile/StaleProfileMatching.cpp:694:24: error: unused variable 'BinHash' [-Werror,-Wunused-variable] bolt/lib/Profile/YAMLProfileWriter.cpp:206:61: error: missing field 'GUID' initializer [-Werror,-Wmissing-field-initializers] bolt/lib/Profile/YAMLProfileReader.cpp:840:16: error: unused variable 'MatchedWithPseudoProbes' [-Werror,-Wunused-variable]	2024-11-12 09:39:57 -08:00
Shaw Young	9a9af0a23f	[BOLT] Match blocks with pseudo probes (#99891 ) Match inline trees first between profile and the binary: by GUID, checksum, parent, and inline site for inlined functions. Map profile probes to binary probes via matched inline tree nodes. Each binary probe has an associated binary basic block. If all probes from one profile basic block map to the same binary basic block, it’s an exact match, otherwise the block is determined by majority vote and reported as loose match. Pseudo probe matching happens between exact hash matching and call/loose matching. Introduce ProbeMatchSpec - a mechanism to match probes belonging to another binary function. For example, given functions foo and bar: ``` void foo() { bar(); } ``` profiled binary: bar is not inlined => have top-level function bar new binary where the profile is applied to: bar is inlined into foo. Currently, BOLT does 1:1 matching between profile functions and binary functions based on the name. #100446 will extend this to N:M where multiple profiles can be matched to one binary function (as in the example above where binary function foo would use profiles for foo and bar), and one profile can be matched to multiple binary functions (e.g. if bar was inlined into multiple functions). In this diff, ProbeMatchSpecs would only have one BinaryFunctionProfile (existing name-based matching). Test Plan: Added match-blocks-with-pseudo-probes.test Performance test: - Setup: - Baseline no-BOLT: Clang with pseudo probes, ThinLTO + CSSPGO (#79942) - BOLT fresh: BOLTed Clang using fresh profile, - BOLT stale (hash): BOLTed Clang using stale profile (collected on Clang 10K commits back), `-infer-stale-profile` (hash+call block matching) - BOLT stale (+probe): BOLTed Clang using stale profile, `-infer-stale-profile` with `-stale-matching-with-pseudo-probes` (hash+call+pseudo probe block matching) - 2S Intel SKX Xeon 6138 with 40C/80T and 256GB RAM, using 20C/40T for build, - BOLT profiles are collected on Clang compiling large preprocessed C++ file. - Benchmark: building Clang (average of 5 runs), see driver in aaupov/llvm-devmtg-2022 - Results, wall time, lower is better: - Baseline no-BOLT: 429.52 +- 2.61s, - BOLT stale (hash): 413.21 +- 2.19s, - BOLT stale (+probe): 409.69 +- 1.41s, - BOLT fresh: 384.50 +- 1.80s. --------- Co-authored-by: Amir Ayupov <aaupov@fb.com>	2024-11-12 07:21:03 -08:00
Daniel Sanders	74003f11b3	[mc] Add CFI directive to emit val_offset() rules (#113971 ) These specify that the value of the given register in the previous frame is the CFA plus some offset. This isn't very common but can be necessary if the original value is normally reconstructed from the stack/frame pointer instead of being saved on the stack and reloaded from there.	2024-11-11 11:38:36 -08:00
Amir Ayupov	7ec682b16b	[MC] Use StringRefs from pseudo_probe_desc section if it's mapped Add `IsMMapped` flag to `buildGUID2FuncDescMap` controlling whether to allocate a string in `FuncNameAllocator` or use StringRef directly. Keep it false by default, only set it for BOLT use case because BOLT keeps file sections in memory while processing them. llvm-profgen constructs GUID2FuncDescMap and then releases the binary. For medium sized binary with 0.8 GiB .pseudo_probe_desc section, this saves 0.7 GiB peak RSS in perf2bolt. Test Plan: no-op for llvm-profgen, NFC for perf2bolt Reviewers: maksfb, dcci, wlei-llvm, rafaelauler, ayermolo Reviewed By: wlei-llvm Pull Request: https://github.com/llvm/llvm-project/pull/112996	2024-11-08 16:39:33 -08:00
Amir Ayupov	d936924f5e	[BOLT][NFC] Make YamlProfileToFunction a DenseMap (#108712 ) YAML function profiles have sparse function IDs, assigned from sequential function IDs from profiled binary. For example, for one large binary, YAML profile has 15K functions, but the highest ID is ~600K, close to number of functions in the profiled binary. In `matchProfileToFunction`, `YamlProfileToFunction` vector was resized to match function ID, which entails a 40X overcommit. Change the type of `YamlProfileToFunction` to DenseMap to reduce memory utilization. #99891 makes use of it for profile lookup associated with a given binary function.	2024-11-08 15:24:48 -08:00
Amir Ayupov	74e6478f81	[BOLT] Set call to continuation count in pre-aggregated profile #109683 identified an issue with pre-aggregated profile where a call to continuation fallthrough edge count is missing (profile discontinuity). This issue only affects pre-aggregated profile but not perf data since LBR stack has the necessary information to determine if the trace (fall- through) starts at call continuation, whereas pre-aggregated fallthrough lacks this information. The solution is to look at branch records in pre-aggregated profiles that correspond to returns and assign counts to call to continuation fallthrough: - BranchFrom is in another function or DSO, - BranchTo may be a call continuation site: - not an entry point/landing pad. Note that we can't directly check if BranchFrom corresponds to a return instruction if it's in external DSO. Keep call continuation handling for perf data (`getFallthroughsInTrace`) [1] as-is due to marginally better performance. The difference is that return-converted call to continuation fallthrough is slightly more frequent than other fallthroughs since the former only requires one LBR address while the latter need two that belong to the profiled binary. Hence return-converted fallthroughs have larger "weight" which affects code layout. [1] `DataAggregator::getFallthroughsInTrace` `fea18afeed/bolt/lib/Profile/DataAggregator.cpp (L906-L915)` Test Plan: added callcont-fallthru.s Reviewers: maksfb, ayermolo, ShatianWang, dcci Reviewed By: maksfb, ShatianWang Pull Request: https://github.com/llvm/llvm-project/pull/109486	2024-11-07 16:20:19 -08:00
Kazu Hirata	accd8f98be	[BOLT] Fix a warning This patch: bolt/lib/Passes/LongJmp.cpp:830:14: error: variable 'NumIterations' set but not used [-Werror,-Wunused-but-set-variable]	2024-11-07 15:09:52 -08:00
Maksim Panchenko	49ee6069db	[BOLT][AArch64] Add support for compact code model (#112110 ) Add `--compact-code-model` option that executes alternative branch relaxation with an assumption that the resulting binary has less than 128MB of code. The relaxation is done in `relaxLocalBranches()`, which operates on a function level and executes on multiple functions in parallel. Running the new option on AArch64 Clang binary produces slightly smaller code and the relaxation finishes in about 1/10th of the time. Note that the new `.text` has to be smaller than 128MB, and `.plt` has to be closer than 128MB to `.text`.	2024-11-07 14:51:12 -08:00
Jacob Bramley	16cd5cdf4d	[BOLT] Ignore AArch64 markers outside their sections. (#74106 ) AArch64 uses $d and $x symbols to delimit data embedded in code. However, sometimes we see $d symbols, typically in .eh_frame, with addresses that belong to different sections. These occasionally fall inside .text functions and cause BOLT to stop disassembling, which in turn causes DWARF CFA processing to fail. As a workaround, we just ignore symbols with addresses outside the section they belong to. This behaviour is consistent with objdump and similar tools.	2024-11-07 15:16:14 +03:00
Sergei Barannikov	eeb987f6f3	[MC] Make generated `MCInstPrinter::getMnemonic` const (NFC) (#114682 ) The value returned from the function depends only on the instruction opcode. As a drive-by, change the type of the argument to const-reference.	2024-11-03 20:37:26 +03:00
Kazu Hirata	41baa69a7e	[BOLT] Fix warnings (#114116 ) This patch fixes: bolt/lib/Core/BinaryFunction.cpp:2537:13: error: enumeration value 'OpNegateRAStateWithPC' not handled in switch [-Werror,-Wswitch] bolt/lib/Core/BinaryFunction.cpp:2661:13: error: enumeration value 'OpNegateRAStateWithPC' not handled in switch [-Werror,-Wswitch] bolt/lib/Core/BinaryFunction.cpp:2805:13: error: enumeration value 'OpNegateRAStateWithPC' not handled in switch [-Werror,-Wswitch]	2024-10-29 13:52:22 -07:00
Amir Ayupov	cafd3e10c3	[BOLT][test] Fix NFC check with pre-aggregated-perf.test (#113944 ) NFC checks have been failing starting with https://lab.llvm.org/buildbot/#/builders/92/builds/8567. NFC testing wrapper (llvm-bolt-wrapper) replaces the call of `perf2bolt` with `llvm-bolt --aggregate-only --ignore-build-id`. `show-density` is automatically enabled for perf2bolt only but not for `llvm-bolt --aggregate-only`. Add the flag to the test to work around the issue. Test Plan: ``` cd build ../llvm-project/bolt/utils/nfc-check-setup.py --switch-back --verbose bin/llvm-lit -a tools/bolt/test/X86/pre-aggregated-perf.test ```	2024-10-28 11:30:30 -07:00
Amir Ayupov	6ee5ff95ab	[BOLT] Add profile density computation Reuse the definition of profile density from llvm-profgen (#92144): - the density is computed in perf2bolt using raw samples (perf.data or pre-aggregated data), - function density is the ratio of dynamically executed function bytes to the static function size in bytes, - profile density: - functions are sorted by density in decreasing order, accumulating their respective sample counts, - profile density is the smallest density covering 99% of total sample count. In other words, BOLT binary profile density is the minimum amount of profile information per function (excluding functions in tail 1% sample count) which is sufficient to optimize the binary well. The density threshold of 60 was determined through experiments with large binaries by reducing the sample count and checking resulting profile density and performance. The threshold is conservative. perf2bolt would print the warning if the density is below the threshold and suggest to increase the sampling duration and/or frequency to reach a given density, e.g.: ``` BOLT-WARNING: BOLT is estimated to optimize better with 2.8x more samples. ``` Test Plan: updated pre-aggregated-perf.test Reviewers: maksfb, wlei-llvm, rafaelauler, ayermolo, dcci, WenleiHe Reviewed By: WenleiHe, wlei-llvm Pull Request: https://github.com/llvm/llvm-project/pull/101094	2024-10-24 18:30:59 -07:00
Amir Ayupov	08916cef7e	[BOLT] Set RawBranchCount in DataAggregator Align DataAggregator (Linux perf and pre-aggregated profile reader) to DataReader (fdata profile reader) behavior: set BF->RawBranchCount which is used in profile density computation (#101094). Reviewers: ayermolo, maksfb, dcci, rafaelauler, WenleiHe Reviewed By: WenleiHe Pull Request: https://github.com/llvm/llvm-project/pull/101093	2024-10-24 18:28:44 -07:00
Kazu Hirata	6803062eb7	[BOLT] Fix a build failure This patch fixes: bolt/lib/Core/DIEBuilder.cpp:285:40: error: too many arguments to function call, expected 2, have 3	2024-10-22 10:20:20 -07:00
Kazu Hirata	9f264e4d2f	[BOLT] Avoid repeated hash lookups (NFC) (#112822 )	2024-10-18 08:39:31 -07:00
sinan	c3bbc3a57d	[BOLT] Fix logs with no hex convension (#112650 ) Add `utohexstr` to ensure that offsets/addresses are correctly formatted as hexadecimal values.	2024-10-18 09:46:41 +08:00
Paschalis Mpeis	cb9bacf57d	[AArch64][BOLT] Ensure tentative code layout for cold BBs runs. (#96609 ) When split functions is used, BOLT may skip tentative code layout estimation in some cases, like: - when there is no profile data for some blocks (ie cold blocks) - when there are cold functions in lite mode - when skip functions is used However, when rewriting the binary we still need to compute PC-relative distances between hot and cold basic blocks. Without cold layout estimation, BOLT uses '0x0' as the address of the first cold block, leading to incorrect estimations of any PC-relative addresses. This affects large binaries as the relaxStub method expands more branches than necessary using the short-jump sequence, at it wrongly believes it has exceeded the branch distance boundary. This increases code size with both a larger and slower sequence; however, performance regression is expected to be minimal since this only affects any called cold code. Example of such an unnecessary relaxation: from: ```armasm b .Ltmp1234 ``` to: ```armasm adrp x16, .Ltmp1234 add x16, x16, :lo12:.Ltmp1234 br x16 ```	2024-10-17 08:59:05 +01:00
Amir Ayupov	3c4f00905e	[BOLT] Support perf2bolt-N in the driver Check invoked tool with `starts_with`. Addresses the issue where `perf2bolt` invoked using a distro symlink `perf2bolt-16` fails to run in perf2bolt mode and runs in llvm-bolt mode instead. The issue is mentioned in https://vondra.me/posts/playing-with-bolt-and-postgres/ Test Plan: ``` ln -sf perf2bolt perf2bolt-20 perf2bolt-20 clang -p perf.data -o fdata.clang -w yaml.clang ... PERF2BOLT: wrote 188593 objects and 0 memory objects to fdata.clang ``` Reviewers: ayermolo, rafaelauler, dcci, maksfb Reviewed By: maksfb Pull Request: https://github.com/llvm/llvm-project/pull/111072	2024-10-14 10:17:31 -07:00
Kazu Hirata	23c834092e	[BOLT] Avoid repeated set lookups (NFC) (#112157 )	2024-10-14 06:55:04 -07:00
Kazu Hirata	7928e14f5e	[BOLT] Avoid repeated map lookups (NFC) (#112118 )	2024-10-12 22:06:49 -07:00
Kazu Hirata	b192f208d6	[BOLT] Avoid repeated hash lookups (NFC) (#112073 )	2024-10-12 08:03:39 -07:00
Amir Ayupov	79d695f049	[BOLT][NFCI] Speedup BAT::writeMaps For a large binary with BAT section of size 38 MB with ~170k maps, reduces writeMaps time from 70s down to 1s. The inefficiency was in the use of std::distance with std::map::iterator which doesn't provide random access. Use sorted vector for lookups. Test Plan: NFC Reviewers: maksfb, rafaelauler, dcci, ayermolo Reviewed By: maksfb Pull Request: https://github.com/llvm/llvm-project/pull/112061	2024-10-11 21:40:53 -07:00
Kazu Hirata	1be849c529	[BOLT] Avoid repeated hash lookups (NFC) (#111782 )	2024-10-09 20:19:58 -07:00
Maksim Panchenko	0e86e5214c	[BOLT][AArch64] Reduce the number of ADR relaxations (#111577 ) If ADR instruction references the same function, we can skip relaxation even if the function is split but ADR is in the main fragment.	2024-10-08 16:15:00 -07:00
ShatianWang	4cab01f072	[BOLT] Profile quality stats -- CFG discontinuity (#109683 ) In a perfect profile, each positive-execution-count block in the function’s CFG should be reachable from a positive-execution-count function entry block through a positive-execution-count path. This new pass checks how well the BOLT input profile satisfies this “CFG continuity” property. More specifically, for each of the hottest 1000 functions, the pass calculates the function’s fraction of basic block execution counts that is “unreachable”. It then reports the 95th percentile of the distribution of the 1000 unreachable fractions in a single BOLT-INFO line. The smaller the reported value is, the better the BOLT profile satisfies the CFG continuity property. The default value of 1000 above can be changed via the hidden BOLT option `-num-functions-for-continuity-check=[N]`. If more detailed stats are needed, `-v=1` can be added to the BOLT invocation: the hottest N functions will be grouped into 5 equally-sized buckets, from the hottest to the coldest; for each bucket, various summary statistics of the distribution of the fractions and the raw unreachable execution counts will be reported.	2024-10-08 19:07:43 -04:00
Tex Riddell	e237d8aac8	[BOLT] Fix tests broken by abe0dd1 (#110071 ) abe0dd195a3b2630afdc5c1c233eb2a068b2d72f (#109553) changed default llvm-objdump output for consecutive zeros. This broke two tests: BOLT :: AArch64/constant_island_pie_update.s BOLT :: AArch64/update-weak-reference-symbol.s This fixes the test failures by adding -z to llvm-objdump in RUN line.	2024-09-25 19:34:57 -07:00
Maksim Panchenko	4db0cc4c55	[BOLT] Allow sections in --print-only flag (#109622 ) While printing functions, expand --print-only flag to accept section names. E.g., "--print-only=\.init" will only print functions from ".init" section.	2024-09-25 23:44:06 +02:00

1 2 3 4 5 ...

2461 Commits