957 Commits

Author SHA1 Message Date
Vladislav Khmelevsky
bb46c72121 release/19.x: [BOLT] Fix relocations handling
Backport 097ddd3565
2024-08-20 09:17:10 +02:00
sinan
2e0782c4db [BOLT] Skip PLT search for zero-value weak reference symbols (#69136)
Take a common weak reference pattern, for example:
```
    __attribute__((weak)) void undef_weak_fun();

      if (&undef_weak_fun)
        undef_weak_fun();
```

In this case, an undefined weak symbol `undef_weak_fun` has an address
of zero, and BOLT incorrectly changes the relocation for the
corresponding symbol to symbol@PLT, leading to incorrect runtime
behavior.

(cherry picked from commit 6c8933e1a095028d648a5a26aecee0f569304dd0)
2024-08-10 12:06:43 +02:00
sinan
a0f4170ab8 [BOLT] Support map other function entry address (#101466)
Allow BOLT to map the old address to a new binary address if the old
address is the entry of the function.

(cherry picked from commit 734c0488b6e69300adaf568f880f40b113ae02ca)
2024-08-10 12:05:50 +02:00
Jordan Brantner
d251a328b8
[BOLT] Fix typo from alterantive to alternative (#99704)
Fix typo from `alterantive` -> `alternative`

Signed-off-by: Jordan Brantner <brantnej@oregonstate.edu>
2024-07-22 18:35:20 -07:00
Sayhaan Siddiqui
bdee9b05de
Revert "[BOLT][DWARF][NFC] Split processUnitDIE into two lambdas" (#99904)
Reverts llvm/llvm-project#99225
2024-07-22 12:31:51 -07:00
Fangrui Song
867faeec05 [MC] Migrate to createAsmStreamer without unused bool parameters
In bolt/lib/Passes/AsmDump.cpp, the MCInstPrinter is created with false
AsmVerbose. The AsmVerbose argument to createAsmStreamer is unused.

Deprecate the legacy Target::createAsmStreamer overload, which might be
used by downstream.
2024-07-21 09:44:16 -07:00
Fangrui Song
86e21e1af2 [BOLT] Remove unused bool arguments from createMCObjectStreamer callers
2024-07-20 21:30:49 -07:00
Sayhaan Siddiqui
6747f12931
[BOLT][DWARF][NFC] Split processUnitDIE into two lambdas (#99225)
Split processUnitDIE into two lambdas to separate the processing of DWO
CUs and CUs in the main binary.
2024-07-19 17:52:49 -07:00
Daniel Hill
b686600a57
[BOLT] Skip instruction shortening (#93032)
Add the ability to disable the instruction shortening pass through
--shorten-instructions=false
2024-07-19 16:52:01 -07:00
Sayhaan Siddiqui
d54ec64f67
[BOLT][DWARF] Remove deprecated opt (#99575)
Remove deprecated DeterministicDebugInfo option and its uses.
2024-07-19 14:03:50 -07:00
Shaw Young
296a956369
[BOLT] Match functions with call graph (#98125)
Implemented call graph function matching. First, two call graphs are
constructed for both profiled and binary functions. Then functions are
hashed based on the names of their callee/caller functions. Finally,
functions are matched based on these neighbor hashes and the 
longest common prefix of their names. The `match-with-call-graph` 
flag turns this matching on.
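
The neighbor-hash idea above can be sketched roughly as follows. This is a minimal illustration, not BOLT's implementation: `neighborHash`, `commonPrefixLength`, and the `<`/`>` edge markers are hypothetical, and `std::hash` stands in for whatever hash BOLT actually uses.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Hypothetical helper: hash a function by the names of its call-graph
// neighbors (callers and callees).
uint64_t neighborHash(std::vector<std::string> Callers,
                      std::vector<std::string> Callees) {
  // Sort so the hash does not depend on the order edges were discovered in.
  std::sort(Callers.begin(), Callers.end());
  std::sort(Callees.begin(), Callees.end());
  std::string Key;
  for (const std::string &Name : Callers)
    Key += "<" + Name; // '<' marks an incoming edge (assumed encoding)
  for (const std::string &Name : Callees)
    Key += ">" + Name; // '>' marks an outgoing edge (assumed encoding)
  return std::hash<std::string>{}(Key);
}

// Length of the longest common prefix of two names, usable as a tie-breaker
// when several candidates share the same neighbor hash.
size_t commonPrefixLength(const std::string &A, const std::string &B) {
  size_t N = 0;
  while (N < A.size() && N < B.size() && A[N] == B[N])
    ++N;
  return N;
}

// Usage: a renamed function with unchanged neighbors hashes identically.
void demoNeighborHash() {
  std::vector<std::string> CallersA = {"main", "init"};
  std::vector<std::string> CallersB = {"init", "main"};
  std::vector<std::string> Callees = {"malloc"};
  assert(neighborHash(CallersA, Callees) == neighborHash(CallersB, Callees));
  assert(commonPrefixLength("foo_v1", "foo_v2") == 5);
}
```

Sorting the neighbor names first makes the hash independent of edge order, which is what lets a profiled function and a binary function with the same neighbors land in the same bucket.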

Test Plan: Added match-with-call-graph.test. Matched 164 functions 
in a large binary with 10171 profiled functions.
2024-07-19 14:00:28 -07:00
Amir Ayupov
79a0b66593 [BOLT] Add MC dependency for Profile
2024-07-18 21:47:58 -07:00
Amir Ayupov
c905db67a0
[BOLT] Attach pseudo probes to blocks in YAML profile
Read pseudo probes in regular and BAT YAML profile generation, and
attach them to YAML profile basic blocks. This exposes GUID, probe id,
and probe type in profile for future use in stale profile matching.

Test Plan: updated pseudoprobe-decoding-inline.test

Reviewers: dcci, rafaelauler, ayermolo, maksfb

Reviewed By: rafaelauler

Pull Request: https://github.com/llvm/llvm-project/pull/99554
2024-07-18 21:01:40 -07:00
Amir Ayupov
9b007a199d
[BOLT] Expose pseudo probe function checksum and GUID (#99389)
Add a BinaryFunction field for pseudo probe function GUID.
Populate it during pseudo probe section parsing, and emit it in YAML
profile (both regular and BAT), along with function checksum.

To be used for stale function matching.

Test Plan: update pseudoprobe-decoding-inline.test
2024-07-18 20:58:16 -07:00
Amir Ayupov
3023b15fb1 [BOLT] Support POSSIBLE_PIC_FIXED_BRANCH
Detect and support fixed PIC indirect jumps of the following form:
```
movslq  En(%rip), %r1
leaq  PIC_JUMP_TABLE(%rip), %r2
addq  %r2, %r1
jmpq  *%r1
```

with a PIC_JUMP_TABLE that looks like the following:

```
  JT:  ----------
   E1:| L1 - JT  |
      |----------|
   E2:| L2 - JT  |
      |----------|
      |          |
         ......
   En:| Ln - JT  |
       ----------
```

Compilers can produce such code; see
https://github.com/llvm/llvm-project/issues/91648.
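
As a rough illustration of the arithmetic in the sequence above (not BOLT code), the branch target is the table address plus the sign-extended 32-bit entry. `resolveFixedPICBranch` is a hypothetical name, and the addresses in the usage function are made up.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative only: compute the target of a fixed PIC indirect jump.
// EntryValue is the 32-bit value stored at En, i.e. (Ln - JT).
uint64_t resolveFixedPICBranch(uint64_t JTAddress, int32_t EntryValue) {
  // movslq sign-extends the 32-bit entry to 64 bits ...
  int64_t Offset = static_cast<int64_t>(EntryValue);
  // ... and addq adds the table address produced by leaq (wrapping add).
  return JTAddress + static_cast<uint64_t>(Offset);
}

// Usage with made-up addresses: targets may lie after or before the table.
void demoResolveFixedPICBranch() {
  assert(resolveFixedPICBranch(0x1000, 0x40) == 0x1040);
  assert(resolveFixedPICBranch(0x1000, -0x20) == 0xFE0);
}
```

Because the entry index is fixed rather than computed at run time, the jump has a single statically known target.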

Test Plan: updated jump-table-fixed-ref-pic.test

Reviewers: maksfb, ayermolo, dcci, rafaelauler

Reviewed By: rafaelauler

Pull Request: https://github.com/llvm/llvm-project/pull/91667
2024-07-18 20:57:05 -07:00
Sayhaan Siddiqui
c0c157a518
[BOLT][DWARF][NFC] Remove DWO ranges base (#99284)
Removes getters and setters for DWO ranges base due to it not being
used.
2024-07-18 09:24:46 -07:00
Pavel Labath
09cbb45edd
[BOLT][DWARF][NFC] A better DIEBuilder for the llvm API change in #98905 (#99324)
The caller (cloneAttribute) already switches on the reference type. By
aligning the cases with the retrieval functions, we can avoid branching
twice.
2024-07-18 09:46:29 +02:00
Vladislav Khmelevsky
51122fb446
[BOLT][NFC] Fix build (#99361)
On clang 14 the build is failing with:
reference to local binding 'ParentName' declared in enclosing function
'llvm::bolt::RewriteInstance::registerFragments'
2024-07-17 23:17:12 +04:00
Amir Ayupov
3fe50b6dde
[BOLT] Store FileSymRefs in a multimap
With aggressive ICF, different local symbols (under different FILE
symbols) can be mapped to the same address.

FileSymRefs only keeps a single SymbolRef per address, which prevents
fragment matching from finding the correct symbol to perform parent
function lookup.

Work around this issue by switching FileSymRefs to a multimap. In
future, uses of FileSymRefs can be replaced with SortedSymbols which
keeps essentially the same information.
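
The difference a multimap makes can be sketched as below. This is a simplified stand-in, not the actual BOLT declaration: `FileSymRefsTy` here maps addresses to plain strings instead of `SymbolRef` objects, and `symbolsAt` and the symbol names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Simplified stand-in for FileSymRefs: address -> symbol name.
using FileSymRefsTy = std::multimap<uint64_t, std::string>;

// Return every symbol recorded at Address, not just one of them.
std::vector<std::string> symbolsAt(const FileSymRefsTy &Refs,
                                   uint64_t Address) {
  std::vector<std::string> Names;
  auto Range = Refs.equal_range(Address);
  for (auto It = Range.first; It != Range.second; ++It)
    Names.push_back(It->second);
  return Names;
}

// Usage: two local symbols folded to the same address by ICF both survive.
void demoFileSymRefs() {
  FileSymRefsTy Refs;
  Refs.emplace(0x1000, "foo.cold/1");
  Refs.emplace(0x1000, "bar.cold/1");
  Refs.emplace(0x2000, "baz");
  assert(symbolsAt(Refs, 0x1000).size() == 2);
  assert(symbolsAt(Refs, 0x2000).size() == 1);
}
```

With a plain `std::map`, the second insertion at `0x1000` would be dropped, which is the single-symbol-per-address limitation described above.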

Test Plan: added ambiguous_fragment.test

Reviewers: dcci, ayermolo, maksfb, rafaelauler

Reviewed By: rafaelauler

Pull Request: https://github.com/llvm/llvm-project/pull/98992
2024-07-16 22:14:43 -07:00
Pavel Labath
9dab91247d Fix bolt for #98905
2024-07-16 13:29:00 +02:00
Sayhaan Siddiqui
e140a8a3c8
[BOLT][DWARF][NFC] Refactor address writers (#98094)
Refactors address writers to create an instance for each CU and its DWO
CU.
2024-07-15 23:03:43 -07:00
Paschalis Mpeis
deff3afd35
[NFC][BOLT] Rename createDummyReturnFunction to createReturnInstructi.. (#98448)
`createDummyReturnFunction` is not creating a function but instead only
a function body that is simply a return statement.
This patch renames it to: `createReturnInstructionList`
2024-07-15 16:30:40 +01:00
Paschalis Mpeis
34433fdceb
[BOLT] Add -print-mappings option to heatmaps (#97567)
Emit a mapping in the legend between the characters/buckets and the text
sections, using:

```sh
llvm-heatmap-bolt -print-mappings ..
```

Example:
```
Legend:
..
Sections:
  a/A : .init      0x00000100-0x00000200
  b/B : .plt       0x00000200-0x00000500
  c/C : .text      0x00010000-0x000a0000
  d/D : .fini      0x000a0000-0x000f0000
..
```
2024-07-15 08:23:06 +01:00
Paschalis Mpeis
587308c343
[BOLT][AArch64] Provide createDummyReturnFunction (#96626)
AArch64 needs this function when instrumenting statically-linked binaries.

Sample commands:
```bash
clang -Wl,-q test.c -static -o out
llvm-bolt -instrument -instrumentation-sleep-time=5 out -o out.instr
```
2024-07-15 07:20:47 +01:00
Shaw Young
131eb30584
[BOLT] Match blocks with calls as anchors (#96596)
Added another hash level – call hash – following opcode hash matching
for stale block matching. Call hash strings are the concatenation of the
lexicographically ordered names of each block's called functions. This
change bolsters block matching in cases where some instructions have
been removed or added but calls remain constant.
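
A minimal sketch of how such a call hash string could be built, assuming the names are concatenated without a separator (the description above does not say whether one is used). `callHashString` is a hypothetical name, not a BOLT function.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Build the string that would be hashed for one basic block: the names of
// the functions it calls, sorted lexicographically and concatenated.
std::string callHashString(std::vector<std::string> CalledNames) {
  std::sort(CalledNames.begin(), CalledNames.end());
  std::string Result;
  for (const std::string &Name : CalledNames)
    Result += Name; // no separator: an assumption of this sketch
  return Result;
}

// Usage: call order inside the block does not change the string.
void demoCallHashString() {
  assert(callHashString({"memcpy", "abort"}) == "abortmemcpy");
  assert(callHashString({"abort", "memcpy"}) == "abortmemcpy");
  assert(callHashString({}).empty());
}
```

Since only call targets feed the string, adding or removing non-call instructions in a block leaves it unchanged, which is why calls work as anchors.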

Test Plan: added match-functions-with-calls-as-anchors.test.
2024-07-10 15:46:47 -07:00
Sayhaan Siddiqui
7e10ad99ad
[BOLT][DWARF] Cleanup buffer initialization for DWO range writer (#97843)
Cleanup buffer initialization for DWO range writer instances to remove
empty buffer at the beginning.
2024-07-10 11:35:40 -07:00
Sayhaan Siddiqui
a972b2e9a4
[BOLT][DWARF][NFC] Cleanup RangesBase check (#97840)
Moves check for RangesBase under check for UnitDie. This makes the flow
clearer because we add RangesBase when it is a UnitDie.
2024-07-10 10:53:08 -07:00
Sayhaan Siddiqui
d283627c4a
[BOLT][DWARF][NFC] Update Die to not use std::optional (#97844)
Updates initialization to remove unnecessary use of std::optional.
2024-07-09 16:37:09 -07:00
Sayhaan Siddiqui
f137be30a4
[BOLT][DWARF][NFC] Remove unnecessary SectionOffset (#97841)
Removes unnecessary SectionOffset variable from DebugData.
2024-07-09 16:36:49 -07:00
Sayhaan Siddiqui
a40daa34ef
[BOLT][DWARF][NFC] Cleanup version check (#97839)
Cleans up version check to remove redundant else branch.
2024-07-09 16:36:26 -07:00
Fangrui Song
2718654c54
[MC] Support .cfi_label
GNU assembler 2.26 introduced the .cfi_label directive. It does not
expand to any CFI instructions, but defines a label in
.eh_frame/.debug_frame, which can be used by runtime patching code to
locate the FDE. .cfi_label is not allowed for CIE's initial
instructions, and can therefore be used to force the next instruction to
be placed in a FDE instead of a CIE.

In glibc since 2018, sysdeps/riscv/start.S utilizes .cfi_label to force
DW_CFA_undefined to be placed in a FDE. arc/csky/loongarch ports have
copied this use.
```
.cfi_startproc
// DW_CFA_undefined is allowed for CIE's initial instructions.
// Without .cfi_label, gas would place DW_CFA_undefined in a CIE.
.cfi_label .Ldummy
.cfi_undefined ra
.cfi_endproc
```

No CFI instruction is associated with .cfi_label, so the `case
MCCFIInstruction::OpLabel:` code in BOLT is unreachable and exists only to
make -Wswitch happy.

Close #97222

Pull Request: https://github.com/llvm/llvm-project/pull/97922
2024-07-07 12:41:13 -07:00
Amir Ayupov
dc1da93958
[BOLT][BAT] Add support for three-way split functions (#93760)
In three-way split functions, if only the .warm fragment is present, BAT
incorrectly overwrites the map for the .warm fragment with the empty .cold
fragment.

Test Plan: updated register-fragments-bolt-symbols.s
2024-07-05 15:18:49 -07:00
Shaw Young
37bee25497
[BOLT][NFC] Refactor function matching (#97502)
Moved function matching techniques into separate helper functions for
ease of understanding and to make space for additional function 
matching techniques to be added (e.g. call graph function matching).
2024-07-05 14:44:15 -07:00
Ádám Kallai
e2cee2c1e6
[BOLT][AArch64] Fixes assertion errors occurred when perf2bolt was executed (#83394)
BOLT only checks for the most common indirect branch pattern during
branch analysis.
Extended the logic with two other indirect patterns which slightly
differ from the expected one.
Those patterns may be hit when statically linking libc (pattern 2
requires 'lld' linker).

As a workaround, mark them as UNKNOWN branches for now.

Fixes: #83114
2024-07-05 16:24:22 +04:00
Alexander Yermolovich
361350fc89
[BOLT][DWARF] Deduplicate Foreign TU list (#97629)
There could be multiple TUs with the same hash in various DWO files. In
bigger binaries this could be in the thousands. Although they could be
structurally different and we need to output Entries for all of them,
for the purposes of figuring out a TU hash we only need one entry in
Foreign TU list.
2024-07-04 07:20:06 -07:00
Fangrui Song
4c79fac140
[BOLT] Remove workaround for flushPendingLabels
The code emits an empty MCDataFragment to ensure that the labels are
attached to `SplitSection`. The workaround, due to the removed
`flushPendingLabels` mechanism (see
75006466296ed4b0f845cbbec4bf77c21de43b40), is now unneeded.

Pull Request: https://github.com/llvm/llvm-project/pull/97632
2024-07-03 16:40:49 -07:00
Sayhaan Siddiqui
5828b04b03
[BOLT][DWARF] Refactor legacy ranges writers (#96006)
Refactors legacy ranges writers to create a writer for each instance of
a DWO file.

We now write out everything into .debug_ranges after all the DWO
files are processed. This also changes the order in which ranges are
written out: before, we wrote them out while in the main CU processing
loop; now we iterate through the CU buckets created by partitionCUs
after the main processing loop.
2024-07-03 14:50:40 -07:00
shawbyoung
fd524d4df7 [BOLT] Add Demangle to Profile link components
Added Demangle to Profile link components to fix shared build.
2024-07-03 12:58:55 -07:00
Shaw Young
97dc50882c
[BOLT] Match functions with name similarity (#95884)
A mapping from namespace to associated binary functions is used to
match function profiles to binary functions, using the edit distance
threshold set by the '--name-similarity-function-matching-threshold'
flag. The flag defaults to 0 (exact name matching) because the matching
is expensive, requiring the processing of all BFs.
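
For illustration, an edit-distance threshold check could look like the sketch below. It uses the standard Levenshtein distance; whether BOLT uses exactly this variant is an assumption, and `editDistance` and `namesMatch` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Classic Levenshtein distance using two rolling rows of the DP table.
size_t editDistance(const std::string &A, const std::string &B) {
  std::vector<size_t> Prev(B.size() + 1), Cur(B.size() + 1);
  for (size_t J = 0; J <= B.size(); ++J)
    Prev[J] = J; // turning "" into B[0..J) takes J insertions
  for (size_t I = 1; I <= A.size(); ++I) {
    Cur[0] = I; // turning A[0..I) into "" takes I deletions
    for (size_t J = 1; J <= B.size(); ++J) {
      size_t Subst = Prev[J - 1] + (A[I - 1] != B[J - 1] ? 1 : 0);
      Cur[J] = std::min({Prev[J] + 1, Cur[J - 1] + 1, Subst});
    }
    std::swap(Prev, Cur);
  }
  return Prev[B.size()];
}

// Two names match if their distance is within the threshold; a threshold
// of 0 degenerates to exact name matching.
bool namesMatch(const std::string &ProfileName, const std::string &BinaryName,
                size_t Threshold) {
  return editDistance(ProfileName, BinaryName) <= Threshold;
}

// Usage with made-up function names.
void demoNameSimilarity() {
  assert(editDistance("kitten", "sitting") == 3);
  assert(namesMatch("parseHeaderV1", "parseHeaderV2", 1));
  assert(!namesMatch("parseHeaderV1", "parseHeaderV2", 0));
}
```

Each comparison costs time proportional to the product of the two name lengths, which is consistent with the matching being expensive when run over all binary functions.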

Test Plan: Added name-similarity-function-matching.test. On a binary
with 5M functions, rewrite passes took ~520s without the flag and
~2018s with the flag set to 20.
2024-07-03 11:39:18 -07:00
Fangrui Song
35668e2c9c
Remove llvm/MC/MCAsmLayout.h and the unused parameter in MCAssembler::layout
This restores 63ec52f867ada8d841dd872acf3d0cb62e2a99e8 and
46f7929879a59ec72dc75679b4201e2d314efba9, NFC changes that were
unnecessarily reverted.

This completes the work that merges MCAsmLayout into MCAssembler.

Pull Request: https://github.com/llvm/llvm-project/pull/97449
2024-07-02 16:56:35 -07:00
Amir Ayupov
344228ebf4 [BOLT] Drop macro-fusion alignment (#97358)
9d0754ada5dbbc0c009bcc2f7824488419cc5530 dropped MC support required for
optimal macro-fusion alignment in BOLT. Remove the support in BOLT as
performance measurements with large binaries didn't show a significant
improvement.

Test Plan:
macro-fusion alignment was never upstreamed, so no upstream tests are
affected.
2024-07-02 09:20:41 -07:00
Davide Italiano
ac0b48a0db Revert "MCAssembler::layout: remove the unused MCAsmLayout parameter"
This reverts commit 63ec52f867ada8d841dd872acf3d0cb62e2a99e8.
2024-07-02 08:54:05 -07:00
Fangrui Song
63ec52f867 MCAssembler::layout: remove the unused MCAsmLayout parameter
Almost complete the MCAsmLayout removal work started by 67957a45ee1ec42ae1671cdbfa0d73127346cc95.
2024-07-01 18:17:05 -07:00
Fangrui Song
e3e0df391c [BOLT] Replace the MCAsmLayout parameter with MCAssembler
Continue the MCAsmLayout removal work started by 67957a45ee1ec42ae1671cdbfa0d73127346cc95.
2024-07-01 18:02:34 -07:00
Fangrui Song
dbf12b2f77 [MC] Remove MCAsmLayout::{getSymbolOffset,getBaseSymbol}
The MCAsmLayout::* forwarders added by
67957a45ee1ec42ae1671cdbfa0d73127346cc95 have all been removed.
2024-07-01 11:51:26 -07:00
Shaw Young
49fdbbcfed
[BOLT] Match functions with exact hash (#96572)
Added flag '--match-profile-with-function-hash' to match functions 
based on exact hash. After identical and LTO name matching, more 
functions can be recovered for inference with exact hash, in the case
of function renaming with no functional changes. Collisions are 
possible in the unlikely case where multiple functions share the same
exact hash. The flag is off by default as it requires the processing of
all binary functions and is therefore expensive.
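
A rough sketch of exact-hash matching, under the assumption that each function already has a precomputed 64-bit hash. `HashIndex`, `indexByHash`, and `matchByHash` are hypothetical names; on a collision this sketch simply keeps the first function seen, which may differ from what BOLT does.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

using HashIndex = std::unordered_map<uint64_t, std::string>;

// Index binary functions (name, hash) by hash. Building this index requires
// visiting every binary function.
HashIndex indexByHash(
    const std::vector<std::pair<std::string, uint64_t>> &BinaryFunctions) {
  HashIndex Index;
  for (const auto &Entry : BinaryFunctions)
    Index.emplace(Entry.second, Entry.first); // first function wins on collision
  return Index;
}

// Look up the binary function whose hash equals the profiled function's
// hash; returns nullptr when nothing matches.
const std::string *matchByHash(const HashIndex &Index, uint64_t ProfileHash) {
  auto It = Index.find(ProfileHash);
  return It == Index.end() ? nullptr : &It->second;
}

// Usage: a renamed function with an unchanged body is still recovered.
void demoMatchByHash() {
  HashIndex Index = indexByHash({{"new_name", 0xABCD}, {"other", 0x1234}});
  const std::string *Match = matchByHash(Index, 0xABCD);
  assert(Match != nullptr && *Match == "new_name");
  assert(matchByHash(Index, 0xFFFF) == nullptr);
}
```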

Test Plan: added hashing-based-function-matching.test.
2024-06-29 21:19:00 -07:00
Nathan Sidwell
6c5b62b846
[BOLT][NFC] Separate isReversibleBranch's 2 semantics (#95572)
`isUnsupportedBranch` was renamed (and inverted) to `isReversibleBranch`, as that was how it was being used. But one use in `BinaryFunction::disassemble` was using the original meaning to detect unsupported branches, and the `isUnsupportedBranch` had 2 separate semantic checks.

Move the unsupported branch check from `isReversibleBranch` to a new entry point: `isUnsupportedInstruction`. Call that from `BinaryFunction::disassemble`.

Move the dynamic branch check from X86's isReversibleBranch to the base class, as it is not an architecture-specific check.

Remove unnecessary `isReversibleBranch` calls from Instrumentation and X86 MCPlusBuilder.
2024-06-28 07:45:37 -04:00
Maksim Panchenko
d16b21b17d
[BOLT][Linux] Support ORC for alternative instructions (#96709)
Alternative instruction sequences in the Linux kernel can modify the
stack and thus they need their own ORC unwind entries. Since there's
only one ORC table, it has to be "shared" among multiple instruction
sequences. The kernel achieves this by putting a restriction on
instruction boundaries. If ORC state changes at a given IP, only one of
the alternative sequences can have an instruction starting/ending at
this IP. Then, developers can insert NOPs to guarantee the above
requirement is met.

The most common use of ORC with alternatives is "pushf; pop %rax"
sequence used for paravirtualization. Note that newer kernel versions
no longer use .parainstructions; instead, they utilize alternatives for
the same purpose.

Until we implement better support for alternatives, we can safely
skip ORC entries associated with them.

Fixes #87052.
2024-06-27 19:26:11 -07:00
Maksim Panchenko
ca06b61084
[BOLT] Omit CFI state while printing functions without CFI (#96723)
If a function has no CFI program attached to it, do not print redundant
empty CFI state for every basic block.
2024-06-27 17:26:58 -07:00
shaw young
2430a354bf
[BOLT][NFC] Move CallGraph from Passes to Core (#96922)
Moved CallGraph and BinaryFunctionCallGraph from Passes to
Core for future use in stale matching.
2024-06-27 16:34:47 -07:00