We used to report PLT traces as invalid (as mismatching the disassembled
function contents) because PLT functions are marked as pseudo and
ignored, and thus lack a CFG. However, such traces do not actually
mismatch the function contents. Accept them without attaching the profile.
Test Plan: updated callcont-fallthru.s
In 96e5ee2, I inadvertently broke the way non-trivial symbol references
got updated from non-optimized code. The breakage was a consequence of
`getTargetSymbol(MCExpr *)` not returning a symbol when the parameter
was a binary expression. Fix `getTargetSymbol()` to cover such cases.
Create entry points for addresses referenced by dynamic relocations and
allow getNewFunctionOrDataAddress to map addresses inside functions.
Adding the addresses referenced by dynamic relocations as entry points
fixes an issue where BOLT fails on code using computed gotos.
This also fixes a mapping issue with the bugfix from this PR:
https://github.com/llvm/llvm-project/pull/117766.
These tests have been failing since:
commit 1cfca53b9f2eadbf864b85995ec7f819d7f29b5e
Author: Arthur Eubanks <aeubanks@google.com>
Date: Wed Mar 12 16:20:13 2025 -0700
This patch works around the failures by removing some FileCheck
directives. Hopefully, BOLT folks can chime in and commit a proper
fix.
Add two additional profile quality stats for CG (call graph) and CFG
(control flow graph) flow conservations besides the CFG discontinuity
stats introduced in #109683. The two new stats quantify how different
"in-flow" is from "out-flow" in the following cases where they should be
equal. The smaller the reported stats, the better the flow conservation.
CG flow conservation: for each function that is not a program entry, the
number of times the function is called according to CG ("in-flow")
should be equal to the number of times the transition from an entry
basic block of the function to another basic block within the function
is recorded ("out-flow").
CFG flow conservation: for each basic block that is not a function entry
or exit, the number of times the transition into this basic block from
another basic block within the function is recorded ("in-flow") should
be equal to the number of times the transition from this basic block to
another basic block within the function is recorded ("out-flow").
Use `-v=1` for more detailed bucketed stats, and use `-v=2` to dump
functions / basic blocks with bad flow conservations.
When processing BOLTed binaries with a BAT section, we used to
indiscriminately use `BAT->getFallthroughsInTrace` to record
fall-throughs, even if the function is not covered by BAT.
Fix that by using the non-BAT, CFG-based `getFallthroughsInTrace` if the
function is not in BAT.
Test Plan: updated bolt-address-translation-yaml.test
BOLT used to mark multi-entry functions non-simple in non-relocation
mode with the reasoning that we can't move them due to potentially
undetected references. However, this reasoning doesn't apply in
aggregation mode, as BOLT doesn't perform optimizations there.
Relax this constraint in case of an aggregation job.
Test Plan: added entry-point-fallthru.s
Traces are triplets of branch source, target, and fall-through end (next
branch).
Traces simplify the differentiation of fall-throughs into local- and
external-origin, which improves performance over a profile with
undifferentiated fall-throughs by eliminating profile discontinuity in
call-to-continuation fall-throughs. This makes it possible to avoid
converting return profile into call-to-continuation profile, which may
introduce statistical biases.
The existing format makes provisions for local- (F) and external- (f)
origin fall-throughs, but the profile producer needs to know function
boundaries. BOLT has that information readily available, so providing
the origin branch of a fall-through is a functional replacement of the
fall-through kind (f or F). This also has the effect of combining
branches and fall-throughs into a single record.
As traces subsume other pre-aggregated profile kinds, BOLT may drop
support for them soon. Users of pre-aggregated profile format are
advised to migrate to the trace format.
Test Plan: Updated callcont-fallthru.s
A few tests generate a statically-linked position-independent executable
with `-nostdlib -Wl,--unresolved-symbols=ignore-all -pie` (`%clang`) and
test PLT handling. (--unresolved-symbols=ignore-all suppresses undefined
symbol errors and serves as a convenience hack.)
This relies on an unguaranteed linker behavior: a statically-linked PIE
does not necessarily generate PLT entries.
While current lld generates a PLT entry, it will change to suppress the
PLT entry to simplify internal handling and improve consistency.
(The behavior is not consistent across GNU ld ports: some generate a
.dynsym entry while others don't, and while most seem to generate a PLT
entry, some ports use a weird `R_*_NONE` relocation.)
Identical Code Folding (ICF) folds identical functions into one
function, and updates symbol addresses to the new address. This reduces
the size of a binary, but can lead to problems, for example when
function pointers are compared. Such comparisons can appear either
explicitly in the code or in IR generated by optimization passes like
Indirect Call Promotion (ICP). After ICF, what used to be two different
addresses become the same address, which can lead to a different code
path being taken.
This is where safe ICF comes in. The linker (LLD) implements it using
the address-significance section generated by Clang: if a symbol is
listed there, or an object doesn't have this section at all, its
symbols are not folded. BOLT does not know which objects lack this
section, so it can't reuse this mechanism.
This implementation scans the code section and conservatively marks
function symbols as unsafe. A symbol is treated as unsafe if it is
used in a non-control-flow instruction. The pass also scans the data
relocation sections and does the same for relocations that reference a
function symbol; this handles the case where a function pointer is
stored in a local or global variable, etc. If a relocation address
points within a vtable, the referenced symbols are skipped.
The key address in the static keys jump table was incorrectly encoded as
an absolute value instead of PC-relative, causing an incorrect
interpretation of the "likely" property of the key.
Added support to BOLT for DW_OP_GNU_push_tls_address, so now a
DW_TAG_variable with this opcode in DW_AT_location will appear in the
debug names acceleration table. Although not in the DWARF 5 spec, it is
similar to DW_OP_form_tls_address. Without this support, llvm-dwarfdump
--verify --debug-names reports errors.
This fix handles a case where a DIE does not have
DW_AT_name/DW_AT_linkage_name, but has a reference to another DIE using
DW_AT_abstract_origin/DW_AT_specification. It also fixes a bug with
cross-CU references for those attributes: previously the code would use
the DWARF unit of the DIE being processed. The
dwarf5-debug-names-cross-cu.s test just happened to work because of how
it was constructed, with the string section shared by both DWARF units.
To resolve DW_AT_name/DW_AT_linkage_name, this patch iterates over
references until it either reaches the final DIE or finds both of those
names.
1. With a Clang that doesn't default to GNU extensions, they need to be enabled explicitly.
2. The X86 directory lit config already sets it; there's no reason for this test to do so by itself.
3. The C frontend executable will fail if there's, for example, a Clang resource file for the C++ mode that sets C++-specific options:
```
+ /home/tambre/dev/llvm/build/bin/clang --target=x86_64-unknown-linux-gnu -fPIE -fuse-ld=lld -Wl,--unresolved-symbols=ignore-all -pie -fPIC -shared /home/tambre/dev/llvm/bolt/test/R_ABS.pic.lld.cpp -o /home/tambre/dev/llvm/build/tools/bolt/test/Output/R_ABS.pic.lld.cpp.tmp.so -Wl,-q -fuse-ld=lld
clang: warning: argument unused during compilation: '-pie' [-Wunused-command-line-argument]
error: invalid argument '-std=c23' not allowed with 'C++'
```
Use ULEB128 format for emitting LSDAs for fixed-address executables,
similar to what we use for PIEs/DSOs. The main difference is that we
don't use landing pad trampolines when landing pads are not contained in
a single fragment. Instead, we fall back to emitting larger
fixed-address LSDAs, which is still better than adding trampoline
instructions.
We used to emit EH trampolines for PIE/DSO whenever a function fragment
contained a landing pad outside of it. However, it is common to have all
landing pads in a cold fragment even when their throwers are in a hot
one.
To reduce the number of trampolines, analyze the landing pads of any
given function fragment, and if they all belong to the same (possibly
different) fragment, designate that fragment as the landing pad fragment
for the "thrower" fragment. Later, emit the landing pad fragment's
symbol as the LPStart of the thrower's LSDA.
Match inline trees first between the profile and the binary: by GUID,
checksum, parent, and inline site for inlined functions. Map profile
probes to binary probes via matched inline tree nodes. Each binary probe
has an associated binary basic block. If all probes from one profile
basic block map to the same binary basic block, it's an exact match;
otherwise the block is determined by majority vote and reported as a
loose match.
Pseudo probe matching happens between exact hash matching and call/loose
matching.
Introduce ProbeMatchSpec - a mechanism to match probes belonging to
another binary function. For example, given functions foo and bar:
```
void foo() {
  bar();
}
```
Profiled binary: bar is not inlined => top-level function bar exists.
New binary the profile is applied to: bar is inlined into foo.
Currently, BOLT does 1:1 matching between profile functions and binary
functions based on the name. #100446 will extend this to N:M where
multiple profiles can be matched to one binary function (as in the
example above where binary function foo would use profiles for foo and
bar), and one profile can be matched to multiple binary functions (e.g.
if bar was inlined into multiple functions).
In this diff, ProbeMatchSpecs would only have one BinaryFunctionProfile
(existing name-based matching).
Test Plan: Added match-blocks-with-pseudo-probes.test
Performance test:
- Setup:
- Baseline no-BOLT: Clang with pseudo probes, ThinLTO + CSSPGO
(#79942)
- BOLT fresh: BOLTed Clang using fresh profile,
- BOLT stale (hash): BOLTed Clang using stale profile (collected on
Clang 10K commits back), `-infer-stale-profile` (hash+call block
matching)
- BOLT stale (+probe): BOLTed Clang using stale profile,
`-infer-stale-profile` with `-stale-matching-with-pseudo-probes`
(hash+call+pseudo probe block matching)
- 2S Intel SKX Xeon 6138 with 40C/80T and 256GB RAM, using 20C/40T for
build,
- BOLT profiles are collected on Clang compiling large preprocessed
C++ file.
- Benchmark: building Clang (average of 5 runs), see driver in
aaupov/llvm-devmtg-2022
- Results, wall time, lower is better:
- Baseline no-BOLT: 429.52 +- 2.61s,
- BOLT stale (hash): 413.21 +- 2.19s,
- BOLT stale (+probe): 409.69 +- 1.41s,
- BOLT fresh: 384.50 +- 1.80s.
---------
Co-authored-by: Amir Ayupov <aaupov@fb.com>
#109683 identified an issue with the pre-aggregated profile where the
call-to-continuation fall-through edge count is missing (profile
discontinuity). This issue only affects the pre-aggregated profile, but
not perf data, since the LBR stack has the necessary information to
determine whether a trace (fall-through) starts at a call continuation,
whereas a pre-aggregated fall-through lacks this information.
The solution is to look at branch records in pre-aggregated profiles
that correspond to returns and assign their counts to the
call-to-continuation fall-through when:
- BranchFrom is in another function or DSO, and
- BranchTo may be a call continuation site, i.e. not an entry point or a
  landing pad.
Note that we can't directly check whether BranchFrom corresponds to a
return instruction if it's in an external DSO.
Keep call continuation handling for perf data (`getFallthroughsInTrace`)
[1] as-is, due to marginally better performance. The difference is that
a return-converted call-to-continuation fall-through is slightly more
frequent than other fall-throughs, since the former only requires one
LBR address belonging to the profiled binary while the latter needs two.
Hence return-converted fall-throughs have a larger "weight", which
affects code layout.
[1] `DataAggregator::getFallthroughsInTrace`
fea18afeed/bolt/lib/Profile/DataAggregator.cpp (L906-L915)
Test Plan: added callcont-fallthru.s
Reviewers: maksfb, ayermolo, ShatianWang, dcci
Reviewed By: maksfb, ShatianWang
Pull Request: https://github.com/llvm/llvm-project/pull/109486
NFC checks have been failing starting with
https://lab.llvm.org/buildbot/#/builders/92/builds/8567.
NFC testing wrapper (llvm-bolt-wrapper) replaces the call of `perf2bolt`
with `llvm-bolt --aggregate-only --ignore-build-id`.
`show-density` is automatically enabled for perf2bolt but not for
`llvm-bolt --aggregate-only`. Add the flag to the test to work around
the issue.
Test Plan:
```
cd build
../llvm-project/bolt/utils/nfc-check-setup.py --switch-back --verbose
bin/llvm-lit -a tools/bolt/test/X86/pre-aggregated-perf.test
```
Reuse the definition of profile density from llvm-profgen (#92144):
- the density is computed in perf2bolt using raw samples (perf.data or
pre-aggregated data),
- function density is the ratio of dynamically executed function bytes
to the static function size in bytes,
- profile density:
- functions are sorted by density in decreasing order, accumulating
their respective sample counts,
- profile density is the smallest density covering 99% of total sample
count.
In other words, BOLT binary profile density is the minimum amount of
profile information per function (excluding functions in the tail 1% of
sample count) that is sufficient to optimize the binary well.
The density threshold of 60 was determined through experiments with
large binaries by reducing the sample count and checking resulting
profile density and performance. The threshold is conservative.
perf2bolt prints a warning if the density is below the threshold and
suggests increasing the sampling duration and/or frequency to reach the
given density, e.g.:
```
BOLT-WARNING: BOLT is estimated to optimize better with 2.8x more samples.
```
Test Plan: updated pre-aggregated-perf.test
Reviewers: maksfb, wlei-llvm, rafaelauler, ayermolo, dcci, WenleiHe
Reviewed By: WenleiHe, wlei-llvm
Pull Request: https://github.com/llvm/llvm-project/pull/101094
In a perfect profile, each positive-execution-count block in the
function's CFG should be reachable from a positive-execution-count
function entry block through a positive-execution-count path. This new
pass checks how well the BOLT input profile satisfies this "CFG
continuity" property.
More specifically, for each of the hottest 1000 functions, the pass
calculates the function's fraction of basic block execution counts that
is "unreachable". It then reports the 95th percentile of the
distribution of the 1000 unreachable fractions in a single BOLT-INFO
line. The smaller the reported value is, the better the BOLT profile
satisfies the CFG continuity property.
The default value of 1000 above can be changed via the hidden BOLT
option `-num-functions-for-continuity-check=[N]`. If more detailed stats
are needed, `-v=1` can be added to the BOLT invocation: the hottest N
functions will be grouped into 5 equally-sized buckets, from the hottest
to the coldest; for each bucket, various summary statistics of the
distribution of the fractions and the raw unreachable execution counts
will be reported.
While printing functions, expand the --print-only flag to accept section
names. E.g., "--print-only=\.init" will only print functions from the
".init" section.
Add probe inline tree information to YAML profile, at function level:
- function GUID,
- checksum,
- parent node id,
- call site in the parent.
This information is used for pseudo probe block matching (#99891).
The encoding adds/changes probe information in multiple levels of
YAML profile:
- BinaryProfile: add pseudo_probe_desc with GUIDs and Hashes, which
permits deduplication of data:
- many GUIDs are duplicate as the same callee is commonly inlined
into multiple callers,
- hashes are also very repetitive, especially for functions with
low block counts.
- FunctionProfile: add inline tree (see above). The top-level function
is included as the root of the function inline tree, which makes the
guid and pseudo_probe_desc_hash fields redundant.
- BlockProfile: densely-encoded block probe information:
- probes reference their containing inline tree node,
- separate lists for block, call, indirect call probes,
- block probe encoding is specialized: ids are encoded as a bitset
in a uint64_t. If only the block probe with id=1 is present, it's
encoded as an implicit entry (id=0, omitted); see the sketch after
this list.
- inline tree nodes with identical probes share probe description
where node indices are combined into a list.
On top of #107970, a profile with the new probe encoding has the
following characteristics (profile for a large binary):
- Profile without probe information: 33MB, 3.8MB compressed (baseline).
- Profile with inline tree information: 92MB, 14MB compressed.
Profile processing time (YAML parsing, inference, attaching steps):
- profile without pseudo probes: 5s,
- profile with pseudo probes, without pseudo probe matching: 11s,
- with pseudo probe matching: 12.5s.
Test Plan: updated pseudoprobe-decoding-inline.test
Reviewers: wlei-llvm, ayermolo, rafaelauler, dcci, maksfb
Reviewed By: wlei-llvm, rafaelauler
Pull Request: https://github.com/llvm/llvm-project/pull/107137
The flag currently controls writing of probe information in YAML
profile. #99891 adds a separate flag to use probe information for stale
profile matching. Thus `profile-use-pseudo-probes` becomes a misnomer
and `profile-write-pseudo-probes` better captures the intent.
Reviewers: maksfb, WenleiHe, ayermolo, rafaelauler, dcci
Reviewed By: rafaelauler
Pull Request: https://github.com/llvm/llvm-project/pull/106364
This ensures forward compatibility, where old BOLT versions can consume
the profile created by newer versions with extra keys.
Test Plan: added yaml-unknown-keys.test
This patch addresses compatibility issues with the lit internal shell by
removing the use of subshell execution (parentheses and subshell syntax)
in the `BOLT` tests. The lit internal shell does not support
parentheses, so the tests have been refactored to use separate command
invocations, with outputs redirected to temporary files where necessary.
This change is relevant for enabling the lit internal shell by default,
as outlined in [[RFC] Enabling the Lit Internal Shell by
Default](https://discourse.llvm.org/t/rfc-enabling-the-lit-internal-shell-by-default/80179)
fixes: #102401
This PR improves how basic block execution counts are updated when using
the BOLT option `-infer-fall-throughs`. Previously, if a 0-count
fall-through edge was assigned a positive inferred count N, the
successor block's execution count would be incremented by N. Since the
successor's execution count is calculated using information besides the
inflow sum (such as the outflow sum), it is likely already correct, and
incrementing it by an additional N would be wrong. This PR updates
the successor's execution count using the maximum of its current count
and N.
This patch aborts BOLT execution if it finds an out-of-section (section
end) symbol in the GOT table. In order to handle such situations
properly in the future, we would need an arch-dependent way to analyze
relocations or their sequences, e.g., for ARM it would probably be
ADRP + LDR analysis to get the GOT entry address. Currently, this is
also challenging because GOT-related relocation symbols are replaced
with __BOLT_got_zero. Anyway, it seems to be quite a rare case, which
appears to be related only to static binaries. For the most part, it
should be handled at the link stage, since a static binary should not
have a GOT table at all. The LLD linker with relaxations enabled
replaces GOT loads with direct references to the target symbols, which
eliminates the problem.
Anyway, in order to detect such cases, this patch fixes a few things in
BOLT:
1. For section end symbols, we now use the section provided by the ELF
binary. Previously the symbol would be tied to the wrong section found
by symbol address.
2. The end symbols get limited registration: we only add them to the
name->data GlobalSymbols map, since using the address->data
BinaryDataMap would likely be impossible due to the address duality of
such symbols.
3. The outdated BD->getSection check (it currently returns a reference,
not a pointer) in postProcessSymbolTable is replaced by a getSize check
in order to allow zero-sized top-level symbols if they are located in
zero-sized sections. For the most part, such things could only be found
in tests, but I don't see a reason not to handle them.
4. Updated the section-end-sym test and removed the x86_64 requirement
since there is no reason for it (tested on AArch64 Linux).
The test was provided by peterwaller-arm (thank you) in #100096 and
slightly modified by me.
After porting BOLT to RISC-V, some of the relocations were broken on
both AArch64 and X86.
On AArch64, an example of broken relocations is GOT: while handling
them, we should replace the symbol with __BOLT_got_zero in order to
address the GOT entry, not the symbol that the entry refers to. This is
done further in the code, so it is too early to add the relocation here.
On X86, it is a mistake to add relocations without an addend. This is
the exact problem raised in #97937. Due to different code generation, I
had to use a GCC-generated YAML test, since I wasn't able to reproduce
the problem with Clang.
Added tests for both architectures and made the problematic condition
RISC-V-specific.
Multi-way splitting can cause multiple fragments to access the same jump
table. Relax the assumption that a jump table can only have up to two
parents.
Test Plan: added bolt/test/X86/three-way-split-jt.s
Reviewers: ayermolo, dcci, rafaelauler, maksfb
Reviewed By: rafaelauler, dcci
Pull Request: https://github.com/llvm/llvm-project/pull/99988
Implemented call graph function matching. First, two call graphs are
constructed, one for profiled and one for binary functions. Then
functions are hashed based on the names of their callee/caller
functions. Finally, functions are matched based on these neighbor
hashes and the longest common prefix of their names. The
`match-with-call-graph` flag turns this matching on.
Test Plan: Added match-with-call-graph.test. Matched 164 functions
in a large binary with 10171 profiled functions.