llvm-project

mirror of https://github.com/llvm/llvm-project.git synced 2025-04-30 06:16:07 +00:00

Author	SHA1	Message	Date
Maksim Panchenko	d7d564b2fc	[BOLT] Add BinaryFunction::registerBranch(). NFC (#83337 ) Add an external interface to register a branch in a function that is in disassembled state. Allows to make custom modifications to the disassembler. E.g., a pre-CFG pass can add an instruction and register a branch that will later be used during the CFG construction.	2024-02-28 20:04:28 -08:00
Maksim Panchenko	3f2a9e5910	[BOLT] Sort TakenBranches immediately before use. NFCI (#83333 ) Move code that sorts TakenBranches right before the branches are used. We can populate TakenBranches in pre-CFG post-processing and hence have to postpone the sorting to a later point in the processing pipeline. Will add such a pass later. For now it's NFC.	2024-02-28 19:51:44 -08:00
Maksim Panchenko	7c206c7812	[BOLT] Refactor interface for instruction labels. NFCI (#83209 ) To avoid accidentally setting the label twice for the same instruction, which can lead to a "lost" label, introduce getOrSetInstLabel() function. Rename existing functions to getInstLabel()/setInstLabel() to make it explicit that they operate on instruction labels. Add an assertion in setInstLabel() that the instruction did not have a prior label set.	2024-02-27 18:44:28 -08:00
Amir Ayupov	52cf07116b	[BOLT][NFC] Log through JournalingStreams (#81524 ) Make core BOLT functionality more friendly to being used as a library instead of in our standalone driver llvm-bolt. To accomplish this, we augment BinaryContext with journaling streams that are to be used by most BOLT code whenever something needs to be logged to the screen. Users of the library can decide if logs should be printed to a file, no file or to the screen, as before. To illustrate this, this patch adds a new option `--log-file` that allows the user to redirect BOLT logging to a file on disk or completely hide it by using `--log-file=/dev/null`. Future BOLT code should now use `BinaryContext::outs()` for printing important messages instead of `llvm::outs()`. A new test log.test enforces this by verifying that no strings are print to screen once the `--log-file` option is used. In previous patches we also added a new BOLTError class to report common and fatal errors, so code shouldn't call exit(1) now. To easily handle problems as before (by quitting with exit(1)), callers can now use `BinaryContext::logBOLTErrorsAndQuitOnFatal(Error)` whenever code needs to deal with BOLT errors. To test this, we have fatal.s that checks we are correctly quitting and printing a fatal error to the screen. Because this is a significant change by itself, not all code was yet ported. Code from Profiler libs (DataAggregator and friends) still print errors directly to screen. Co-authored-by: Rafael Auler <rafaelauler@fb.com> Test Plan: NFC	2024-02-12 14:53:53 -08:00
Amir Ayupov	13d60ce2f2	[BOLT][NFC] Propagate BOLTErrors from Core, RewriteInstance, and passes (2/2) (#81523 ) As part of the effort to refactor old error handling code that would directly call exit(1), in this patch continue the migration on libCore, libRewrite and libPasses to use the new BOLTError class whenever a failure occurs. Test Plan: NFC Co-authored-by: Rafael Auler <rafaelauler@fb.com>	2024-02-12 14:51:15 -08:00
Amir Ayupov	b039ccc684	[BOLT] Provide backwards compatibility for YAML profile with std::hash (#74253 ) Provide backwards compatibility for YAML profile that uses `std::hash`: xxh3 hash is the default for newly produced profile (sets `std-hash: false`), whereas the profile that doesn't specify `std-hash` will be treated as `std-hash: true`, preserving old behavior.	2023-12-11 12:27:32 -08:00
Maksim Panchenko	4f3081296f	[BOLT][NFC] Fix comment (#73983 ) Fix off-by-one error in comment.	2023-11-30 14:31:38 -08:00
Maksim Panchenko	4bcbbe1f70	[BOLT] Refactor fixBranches() (#73752 ) Simplify code in fixBranches(). Mostly NFC, accept the x86-specific check for code fragments now takes into account presence of more than two fragments. Should only matter when we split code into multiple fragments and can run fixBranches() more than once. Also, don't replace a branch target with the same one, as such operation may allocate memory for extra MCSymbolRefExpr.	2023-11-29 16:24:16 -08:00
spupyrev	e7dd596c68	[BOLT] Use deterministic xxh3 for computing BF/BB hashes (#72542 ) std::hash and ADT/Hashing::hash_value are non-deterministic functions whose results might vary across implementation/process/execution. Using xxh3 instead for computing hashes of BinaryFunctions and BinaryBasicBlock for stale profile matching. (A possible alternative is to use ADT/StableHashing.h based on FNV hashing but xxh3 seems to be more popular in LLVM) This is to address https://github.com/llvm/llvm-project/issues/65241.	2023-11-27 14:45:46 -08:00
Maksim Panchenko	f4834255d3	[BOLT] Reset output addresses for deleted blocks (#73429 ) This is a follow-up to #73076. We need to reset output addresses for deleted blocks, otherwise the address translation may mistakenly attribute input address of a deleted block to a non-zero address. While working on a test case, I've discovered that DWARF output ranges were already broken for deleted basic blocks: #73428. I will provide a test case for this PR with a DWARF address range fix PR.	2023-11-25 23:23:47 -08:00
Maksim Panchenko	365114292a	[BOLT][NFC] Refactor function state check (#73420 ) Remove redundant check in updateOutputValues().	2023-11-25 21:09:54 -08:00
ShatianWang	d333c0e062	[BOLT] Extend calculateEmittedSize() for block size calculation (#73076 ) This commit modifies BinaryContext::calculateEmittedSize() to update the BinaryBasicBlock::OutputAddressRange of each basic block in the function in place. BinaryBasicBlock::getOutputSize() now gives the emitted size of the basic block.	2023-11-23 15:28:31 -05:00
Maksim Panchenko	f653f6d57a	[BOLT][NFC] Delete unused declarations (#72596 )	2023-11-16 23:36:19 -08:00
Vladislav Khmelevsky	5b59540661	[BOLT] Enhance fixed indirect branch handling (#71324 ) Previously HasFixedIndirectBranch was set in BF to set isSimple to false later because of unreachable bb ellimination pass which might remove the BB with it's symbols accessed by other instructions than calls. It seems to be that better solution would be to add extra entry point on target offset instead of marking BF as non-simple.	2023-11-16 09:30:55 +04:00
Maksim Panchenko	e823136d43	[BOLT] Refactor --keep-nops option. NFC. (#72228 ) Run RemoveNops pass only if --keep-nops is set to false (default).	2023-11-14 11:28:13 -08:00
Maksim Panchenko	f633f325a1	[BOLT] Fix NOP instruction emission on x86 (#72186 ) Use MCAsmBackend::writeNopData() interface to emit NOP instructions on x86. There are multiple forms of NOP instruction on x86 with different sizes. Currently, LLVM's assembly/disassembly does not support all forms correctly which can lead to a breakage of input code semantics, e.g. if the program relies on NOP instructions for reserving a patch space. Add "--keep-nops" option to preserve NOP instructions.	2023-11-13 18:12:39 -08:00
Maksim Panchenko	2db9b6a93f	[BOLT] Make instruction size a first-class annotation (#72167 ) When NOP instructions are used to reserve space in the code, e.g. for patching, it becomes critical to preserve their original size while emitting the code. On x86, we rely on "Size" annotation for NOP instructions size, as the original instruction size is lost in the disassembly/assembly process. This change makes instruction size a first-class annotation and is affectively NFCI. A follow-up diff will use the annotation for code emission.	2023-11-13 14:33:39 -08:00
Vladislav Khmelevsky	c6c04a83a7	[BOLT] Run EliminateUnreachableBlocks in parallel (#71299 ) The wall time for this pass decreased on my laptop from ~80 sec to 5 sec processing the clang.	2023-11-10 00:46:04 +04:00
Maksim Panchenko	11f52f783a	[BOLT][DWARF] Fix invalid address ranges (#71474 ) When NOP instructions are removed by BOLT and a DWARF address range falls past the removed instructions, it may lead to invalid DWARF ranges in the output binary. E.g. the range may fall outside of the basic block boundaries. This fix makes sure the modified range fits within the containing basic block. A proper fix requires tracking instructions within the block and will come in a different PR.	2023-11-09 09:55:49 -08:00
Maksim Panchenko	254ccb95e8	[BOLT] Follow-up to "Fix incorrect basic block output addresses" (#71630 ) In 8244ff6739a09cb75e6e7fd1c24b85e2b1397266, I've introduced an assertion that incorrectly used BasicBlock::empty(). Some basic blocks may contain only pseudo instructions and thus BB->empty() will evaluate to false, while the actual code size will be zero.	2023-11-08 10:53:36 -08:00
maksfb	74e0a26fd1	[BOLT] Modify MCPlus annotation internals. NFCI. (#70412 ) When annotating MCInst instructions, attach extra annotation operands directly to the annotated instruction, instead of attaching them to an instruction pointed to by a special kInst operand. With this change, it's no longer necessary to allocate MCInst and most of the first-class annotations come with free memory as currently MCInst is declared with: SmallVector<MCOperand, 10> Operands; i.e. more operands than are normally being used. We still create a kInst operand with a nullptr instruction value to designate the beginning of annotation operands. However, this special operand might not be needed if we can rely on MCInstrDesc::NumOperands.	2023-11-06 12:14:22 -08:00
maksfb	7f031d1c7c	[BOLT] Fix address mapping for ICP code (#70136 ) When we create new code for indirect code promotion optimization, we should mark it as originating from the indirect jump instruction for BOLT address translation (BAT) to map it to the original instruction.	2023-11-06 11:25:49 -08:00
maksfb	6e26246c22	[BOLT][DWARF] Refactor address ranges processing (#71225 ) Create BinaryFunction::translateInputToOutputRange() and use it for updating DWARF debug ranges and location lists while de-duplicating the existing code. Additionally, move DWARF-specific code out of BinaryFunction and add print functions to facilitate debugging. Note that this change is deliberately kept "bug-level" compatible with the existing solution to keep it NFCI and make it easier to track any possible regressions in the future updates to the ranges-handling code.	2023-11-06 11:10:20 -08:00
Vladislav Khmelevsky	838331a081	[BOLT] Set NOOP size only on X86 (NFC) (#71307 ) Small fix, we have problems with noop size only on x86, no reason to do it on other platforms.	2023-11-06 15:21:50 +04:00
maksfb	8244ff6739	[BOLT] Fix incorrect basic block output addresses (#70000 ) Some optimization passes may duplicate basic blocks and assign the same input offset to a number of different blocks in a function. This is done e.g. to correctly map debugging ranges for duplicated code. However, duplicate input offsets present a problem when we use AddressMap to generate new addresses for basic blocks. The output address is calculated based on the input offset and will be the same for blocks with identical offsets. The result is potentially incorrect debug info and BAT records. To address the issue, we have to eliminate the dependency on input offsets while generating output addresses for a basic block. Each block has a unique label, hence we extend AddressMap to include address lookup based on MCSymbol and use the new functionality to update block addresses.	2023-10-24 12:22:43 -07:00
Job Noorman	6795bfce4d	[BOLT][RISCV] Handle CIE's produced by GNU as (#69578 ) On RISC-V, GNU as produces the following initial instruction in CIE's: ``` DW_CFA_def_cfa_register: r2 ``` While I believe it is technically illegal to use this instruction without first using a `DW_CFA_def_cfa` (since the offset is undefined), both `readelf` and `llvm-dwarfdump` accept this and implicitly set the offset to 0. In BOLT, however, this triggers an assert (in `CFISnapshot::advanceTo`) as it (correctly) believes the offset is not set. This patch fixes this by setting the offset to 0 whenever executing `DW_CFA_def_cfa_register` while the offset is undefined. Note that this is probably the simplest workaround but it has a downside: while emitting CFI start, we check if the initial instructions are contained within `MCAsmInfo::getInitialFrameState` and omit them if they are. This will not be true for GNU CIE's (since they differ from LLVM's) which causes an unnecessary `DW_CFA_def_cfa_register` to be emitted. While technically correct, it would probably be better to replace the GNU CIE with the one used by LLVM to avoid this situation. This would solve the problem this patch solves while also preventing unnecessary CFI instructions. However, this is a bit trickier to implement correctly so I propose to keep this for a later time. Note on testing: the test creates a simple function with three basic blocks and forces the CFI state of the last one to be different from the others using an arbitrary CFI instruction. Then, `--reorder-blocks=reverse` is used to force `CFISnapshot::advanceTo` to be called. This causes an assert on the current main branch.	2023-10-20 15:49:17 +00:00
Vladislav Khmelevsky	b7944f7c04	[BOLT] Return proper minimal alignment from BF (#67707 ) Currently minimal alignment of function is hardcoded to 2 bytes. Add 2 more cases: 1. In case BF is data in code return the alignment of CI as minimal alignment 2. For aarch64 and riscv platforms return the minimal value of 4 (added test for aarch64) Otherwise fallback to returning the 2 as it previously was.	2023-10-12 09:33:08 +04:00
Job Noorman	ff5e2babcb	[BOLT] Improve handling of relocations targeting specific instructions (#66395 ) On RISC-V, there are certain relocations that target a specific instruction instead of a more abstract location like a function or basic block. Take the following example that loads a value from symbol `foo`: ``` nop 1: auipc t0, %pcrel_hi(foo) ld t0, %pcrel_lo(1b)(t0) ``` This results in two relocation: - auipc: `R_RISCV_PCREL_HI20` referencing `foo`; - ld: `R_RISCV_PCREL_LO12_I` referencing to local label `1` which points to the auipc instruction. It is of utmost importance that the `R_RISCV_PCREL_LO12_I` keeps referring to the auipc instruction; if not, the program will fail to assemble. However, BOLT currently does not guarantee this. BOLT currently assumes that all local symbols are jump targets and always starts a new basic block at symbol locations. The example above results in a CFG the looks like this: ``` .BB0: nop .BB1: auipc t0, %pcrel_hi(foo) ld t0, %pcrel_lo(.BB1)(t0) ``` While this currently works (i.e., the `R_RISCV_PCREL_LO12_I` relocation points to the correct instruction), it has two downsides: - Too many basic blocks are created (the example above is logically only one yet two are created); - If instructions are inserted in `.BB1` (e.g., by instrumentation), things will break since the label will not point to the auipc anymore. This patch proposes to fix this issue by teaching BOLT to track labels that should always point to a specific instruction. This is implemented as follows: - Add a new annotation type (`kLabel`) that allows us to annotate instructions with an `MCSymbol *`; - Whenever we encounter a relocation type that is used to refer to a specific instruction (`Relocation::isInstructionReference`), we register it without a symbol; - During disassembly, whenever we encounter an instruction with such a relocation, create a symbol for its target and store it in an offset to symbol map (to ensure multiple relocations referencing the same instruction use the same label); - After disassembly, iterate this map to attach labels to instructions via the new annotation type; - During emission, emit these labels right before the instruction. I believe the use of annotations works quite well for this use case as it allows us to reliably track instruction labels. If we were to store them as offsets in basic blocks, it would be error prone to keep them updated whenever instructions are inserted or removed. I have chosen to add labels as first-class annotations (as opposed to a generic one) because the documentation of `MCAnnotation` suggests that generic annotations are to be used for optional metadata that can be discarded without affecting correctness. As this is not the case for labels, a first-class annotation seemed more appropriate.	2023-10-06 06:46:16 +00:00
Job Noorman	7fa33773e3	[BOLT][RISCV] Handle long tail calls (#67098 ) Long tail calls use the following instruction sequence on RISC-V: ``` 1: auipc xi, %pcrel_hi(sym) jalr zero, %pcrel_lo(1b)(xi) ``` Since the second instruction in isolation looks like an indirect branch, this confused BOLT and most functions containing a long tail call got marked with "unknown control flow" and didn't get optimized as a consequence. This patch fixes this by detecting long tail call sequence in `analyzeIndirectBranch`. `FixRISCVCallsPass` also had to be updated to expand long tail calls to `PseudoTAIL` instead of `PseudoCALL`. Besides this, this patch also fixes a minor issue with compressed tail calls (`c.jr`) not being detected. Note that I had to change `BinaryFunction::postProcessIndirectBranches` slightly: the documentation of `MCPlusBuilder::analyzeIndirectBranch` mentions that the [`Begin`, `End`) range contains the instructions immediately preceding `Instruction`. However, in `postProcessIndirectBranches`, all the instructions in the BB where passed in the range. This made it difficult to find the preceding instruction so I made sure only the preceding instructions are passed.	2023-10-05 08:55:30 +00:00
zhoujiapeng	16fd879980	[BOLT] Skip the validation of CFG after it is finalized When current state is `CFG_Finalized`, function `validateCFG()` should return true directly. Reviewed By: maksfb, yota9, Kepontry Differential Revision: https://reviews.llvm.org/D159410	2023-09-17 00:07:59 +08:00
Job Noorman	475a93a07a	[BOLT] Calculate output values using BOLTLinker BOLT uses `MCAsmLayout` to calculate the output values of functions and basic blocks. This means output values are calculated based on a pre-linking state and any changes to symbol values during linking will cause incorrect values to be used. This issue can be triggered by enabling linker relaxation on RISC-V. Since linker relaxation can remove instructions, symbol values may change. This causes, among other things, the symbol table created by BOLT in the output executable to be incorrect. This patch solves this issue by using `BOLTLinker` to get symbol values instead of `MCAsmLayout`. This way, output values are calculated based on a post-linking state. To make sure the linker can update all necessary symbols, this patch also makes sure all these symbols are not marked as temporary so that they end-up in the object file's symbol table. Note that this patch only deals with symbols of binary functions (`BinaryFunction::updateOutputValues`). The technique described above turned out to be too expensive for basic block symbols so those are handled differently in D155604. Reviewed By: maksfb Differential Revision: https://reviews.llvm.org/D154604	2023-08-28 10:13:07 +02:00
Rafael Auler	b59cf211a0	[BOLT] Don't choke on injected functions' IO map AddressMap would fail lookup for injected functions and crash BOLT. Fix that. Reviewed By: #bolt, maksfb, jobnoorman Differential Revision: https://reviews.llvm.org/D158685	2023-08-24 12:02:55 -07:00
Kazu Hirata	ff22d125a7	[BOLT] Fix an unused variable warning This patch fixes: bolt/lib/Core/BinaryFunction.cpp:4117:20: error: unused variable 'FragmentBaseAddress' [-Werror,-Wunused-variable]	2023-08-21 07:57:18 -07:00
Job Noorman	23c8d38258	[BOLT] Calculate input to output address map using BOLTLinker BOLT uses MCAsmLayout to calculate the output values of basic blocks. This means output values are calculated based on a pre-linking state and any changes to symbol values during linking will cause incorrect values to be used. This issue was first addressed in D154604 by adding all basic block symbols to the symbol table for the linker to resolve them. However, the runtime overhead of handling this huge symbol table turned out to be prohibitively large. This patch solves the issue in a different way. First, a temporary section containing [input address, output symbol] pairs is emitted to the intermediary object file. The linker will resolve all these references so we end up with a section of [input address, output address] pairs. This section is then parsed and used to: - Replace BinaryBasicBlock::OffsetTranslationTable - Replace BinaryFunction::InputOffsetToAddressMap - Update BinaryBasicBlock::OutputAddressRange Note that the reason this is more performant than the previous attempt is that these symbol references do not cause entries to be added to the symbol table. Instead, section-relative references are used for the relocations. Reviewed By: maksfb Differential Revision: https://reviews.llvm.org/D155604	2023-08-21 10:36:20 +02:00
Kazu Hirata	363be89c7d	[BOLT] Use static_assert (NFC)	2023-08-10 18:44:17 -07:00
Amir Ayupov	70e76e0982	[BOLT] Fix instrumenting conditional tail calls We identify instructions to be instrumented based on Offset annotation. BOLT "expands" conditional tail calls into a conditional jump to a basic block with unconditional tail call. Move Offset annotation from former CTC to the tail call. For expanded CTC we keep Offset attached to the original instruction which is converted into a regular conditional jump, while leaving the newly created tail call without an Offset annotation. This leads to attempting the instrumentation of the conditional jump which points to the basic block with an inherited input offset thus creating an invalid edge description. At the same time, the newly created tail call is skipped entirely which means we're not creating a call description for it. If we instead reassign Offset annotation from the conditional jump to the tail call we fix both issues. The conditional jump will be skipped not creating an invalid edge description, while tail call will be handled properly (unformly with regular calls). Reviewed By: #bolt, maksfb Differential Revision: https://reviews.llvm.org/D156389	2023-07-31 13:52:50 -07:00
spupyrev	31e8a9f4d9	[BOLT] Add stale-related logging Adding some logs related to stale profile matching. The new data can be helpful to understand how "stale" the input profile is and how well the inference is able to utilize the stale data. Example of outputs on clang-10 built with LTO (profile collected on a year-old release): ``` BOLT-INFO: inferred profile for 2101 (18.52% of profiled, 100.00% of stale) functions responsible for 30.95% samples (14754697 out of 47670654) BOLT-INFO: stale inference matched 89.42% of basic blocks (79052 out of 88402 stale) responsible for 76.99% samples (645737 out of 838719 stale) ``` LTO+AutoFDO: ``` BOLT-INFO: inferred profile for 6146 (57.57% of profiled, 100.00% of stale) functions responsible for 90.34% samples (50891403 out of 56330313) BOLT-INFO: stale inference matched 74.55% of basic blocks (191295 out of 256589 stale) responsible for 57.30% samples (1288632 out of 2248799 stale) ``` Reviewed By: Amir, maksfb Differential Revision: https://reviews.llvm.org/D154737	2023-07-27 08:56:57 -07:00
Maksim Panchenko	b6fbb64d7c	[BOLT] Fix jump table issue for split functions A jump table in a split function may contain an entry matching a start address of another fragment of the function. While converting addresses to labels, we used to ignore such entries resulting in underpopulated jump table. Change that, so we always create one label per address. Reviewed By: Amir Differential Revision: https://reviews.llvm.org/D156013	2023-07-23 08:39:48 -07:00
Maksim Panchenko	38639a8159	[BOLT][NFCI] Migrate Linux Kernel handling code to MetadataRewriter Create LinuxKernelRewriter and move kernel-specific code to this class. Depends on D154023 Reviewed By: Amir Differential Revision: https://reviews.llvm.org/D154024	2023-07-06 11:25:50 -07:00
Maksim Panchenko	98e2d63027	[BOLT][NFCI] Use MetadataRewriter interface to update SDT markers Migrate SDT markers processing to the new MetadataRewriter interface. Depends on D154020 Reviewed By: Amir Differential Revision: https://reviews.llvm.org/D154021	2023-07-06 11:17:17 -07:00
Amir Ayupov	59a27170c9	[BOLT][NFC] Simplify postProcessJumpTables Reviewed By: #bolt, rafauler Differential Revision: https://reviews.llvm.org/D154115	2023-06-29 22:47:16 -07:00
Job Noorman	f873029386	[BOLT] Add minimal RISC-V 64-bit support Just enough features are implemented to process a simple "hello world" executable and produce something that still runs (including libc calls). This was mainly a matter of implementing support for various relocations. Currently, the following are handled: - R_RISCV_JAL - R_RISCV_CALL - R_RISCV_CALL_PLT - R_RISCV_BRANCH - R_RISCV_RVC_BRANCH - R_RISCV_RVC_JUMP - R_RISCV_GOT_HI20 - R_RISCV_PCREL_HI20 - R_RISCV_PCREL_LO12_I - R_RISCV_RELAX - R_RISCV_NONE Executables linked with linker relaxation will probably fail to be processed. BOLT relocates .text to a high address while leaving .plt at its original (low) address. This causes PC-relative PLT calls that were relaxed to a JAL to not fit their offset in an I-immediate anymore. This is something that will be addressed in a later patch. Changes to the BOLT core are relatively minor. Two things were tricky to implement and needed slightly larger changes. I'll explain those below. The R_RISCV_CALL(_PLT) relocation is put on the first instruction of a AUIPC/JALR pair, the second does not get any relocation (unlike other PCREL pairs). This causes issues with the combinations of the way BOLT processes binaries and the RISC-V MC-layer handles relocations: - BOLT reassembles instructions one by one and since the JALR doesn't have a relocation, it simply gets copied without modification; - Even though the MC-layer handles R_RISCV_CALL properly (adjusts both the AUIPC and the JALR), it assumes the immediates of both instructions are 0 (to be able to or-in a new value). This will most likely not be the case for the JALR that got copied over. To handle this difficulty without resorting to RISC-V-specific hacks in the BOLT core, a new binary pass was added that searches for AUIPC/JALR pairs and zeroes-out the immediate of the JALR. A second difficulty was supporting ABS symbols. As far as I can tell, ABS symbols were not handled at all, causing __global_pointer$ to break. RewriteInstance::analyzeRelocation was updated to handle these generically. Tests are provided for all supported relocations. Note that in order to test the correct handling of PLT entries, an ELF file produced by GCC had to be used. While I tried to strip the YAML representation, it's still quite large. Any suggestions on how to improve this would be appreciated. Reviewed By: rafauler Differential Revision: https://reviews.llvm.org/D145687	2023-06-16 12:19:36 +02:00
Maksim Panchenko	5c4d306a10	[BOLT][NFC] Change signature of MCPlusBuilder::isUnsupportedBranch() Make MCPlusBuilder::isUnsupportedBranch() take MCInst, not opcode. Reviewed By: Amir Differential Revision: https://reviews.llvm.org/D152765	2023-06-13 12:20:36 -07:00
Maksim Panchenko	43f56a2f27	[BOLT] Fix handling of code references from unmodified code In lite mode (default for X86), BOLT optimizes and relocates functions with profile. The rest of the code is preserved, but if it references relocated code such references have to be updated. The update is handled by scanExternalRefs() function. Note that we cannot solely rely on relocations written by the linker, as not all code references are exposed to the linker. Additionally, the linker can modify certain instructions and relocations will no longer match the code. With this change, start using symbolic disassembler for scanning code for references in scanExternalRefs(). Unlike the previous approach, the symbolizer properly detects and creates references for instructions with multiple/ambiguous symbolic operands and handles cases where a relocation doesn't match any operand. See test cases for examples. Reviewed By: Amir Differential Revision: https://reviews.llvm.org/D152631	2023-06-12 10:46:51 -07:00
spupyrev	2316a10fe5	[BOLT] stale profile matching [part 2 out of 2] This is a first "serious" version of stale profile matching in BOLT. This diff extends the hash computation for basic blocks so that we can apply a fuzzy hash-based matching. The idea is to compute several "versions" of a hash value for a basic block. A loose version of a hash (computed by ignoring instruction operands) allows to match blocks in functions whose content has been changed, while stricter hash values (considering instruction opcodes with operands and even based on hashes of block's successors/predecessors) allow to resolve collisions. In order to save space and build time, individual hash components are blended into a single uint64_t. There are likely numerous ways of improving hash computation but already this simple variant provides significant perf benefits. Perf testing on the clang binary: collecting data on clang-10 and using it to optimize clang-11 (with ~1 year of commits in between). Next, we compare - //stale_clang// (clang-11 optimized with profile collected on clang-10 with infer-stale-profile=0) - //opt_clang// (clang-11 optimized with profile collected on clang-11) - //infer_clang// (clang-11 optimized with profile collected on clang-10 with infer-stale-profile=1) `LTO-only` mode: //stale_clang// vs //opt_clang//: task-clock [delta(%): 9.4252 ± 1.6582, p-value: 0.000002] (That is, there is a ~9.5% perf regression) //infer_clang// vs //opt_clang//: task-clock [delta(%): 2.1834 ± 1.8158, p-value: 0.040702] (That is, the regression is reduced to ~2%) Related BOLT logs: ``` BOLT-INFO: identified 2114 (18.61%) stale functions responsible for 30.96% samples BOLT-INFO: inferred profile for 2101 (18.52% of all profiled) functions responsible for 30.95% samples ``` `LTO+AutoFDO` mode: //stale_clang// vs //opt_clang//: task-clock [delta(%): 19.1293 ± 1.4131, p-value: 0.000002] //infer_clang// vs //opt_clang//: task-clock [delta(%): 7.4364 ± 1.3343, p-value: 0.000002] Related BOLT logs: ``` BOLT-INFO: identified 5452 (50.27%) stale functions responsible for 85.34% samples BOLT-INFO: inferred profile for 5442 (50.23% of all profiled) functions responsible for 85.33% samples ``` Reviewed By: Amir Differential Revision: https://reviews.llvm.org/D146661	2023-06-08 14:42:41 -07:00
Amir Ayupov	713b28532e	[BOLT][NFC] Fix debug messages Fix debug printing, making it easier to compare two debug logs side by side: - `BinaryFunction::addRelocation`: print function name instead of `this` ptr, - `DataAggregator::doTrace`: remove duplicated function name. Reviewed By: #bolt, maksfb Differential Revision: https://reviews.llvm.org/D152314	2023-06-06 15:50:58 -07:00
spupyrev	3e3a926be8	[BOLT][NFC] Add hash computation for basic blocks Extending yaml profile format with block hashes, which are used for stale profile matching. To avoid duplication of the code, created a new class with a collection of utilities for computing hashes. Reviewed By: Amir Differential Revision: https://reviews.llvm.org/D144306	2023-05-02 14:03:47 -07:00
Rafael Auler	b87bf74428	[BOLT] Fix creation of invalid CFG in presence of dead code When there is a direct jump right after an indirect one, in the absence of code jumpting to this direct jump, this is obviously dead code. However, BOLT was failing to recognize that by mistakenly placing both jmp instructions in the same basic block, and creating wrong successor edges. Fix that, so we can safely run UCE on that. This bug also causes validateCFG to fail and BOLT to crash if it is running ICP on that function. Reviewed By: #bolt, Amir Differential Revision: https://reviews.llvm.org/D148055	2023-04-11 17:19:39 -07:00
Alexis Engelke	0c049ea60a	[MC] Always encode instruction into SmallVector All users of MCCodeEmitter::encodeInstruction use a raw_svector_ostream to encode the instruction into a SmallVector. The raw_ostream however incurs some overhead for the actual encoding. This change allows an MCCodeEmitter to directly emit an instruction into a SmallVector without using a raw_ostream and therefore allow for performance improvments in encoding. A default path that uses existing raw_ostream implementations is provided. Reviewed By: MaskRay, Amir Differential Revision: https://reviews.llvm.org/D145791	2023-04-06 16:21:49 +02:00
spupyrev	92758a99c3	[BOLT] computing raw branch count for yaml profiles `Function.RawBranchCount` is initialized for fdata profile but not for yaml one. The diff adds the computation of the field for yaml profiles Reviewed By: Amir Differential Revision: https://reviews.llvm.org/D144211	2023-03-28 11:09:21 -07:00

1 2 3

132 Commits