Multi-way splitting can cause multiple fragments to access the same jump
table. Relax the assumption that a jump table can only have up to two
parents.
Test Plan: added bolt/test/X86/three-way-split-jt.s
Reviewers: ayermolo, dcci, rafaelauler, maksfb
Reviewed By: rafaelauler, dcci
Pull Request: https://github.com/llvm/llvm-project/pull/99988
Three-way splitting can create references between split fragments (warm
to cold or vice versa) that are not handled by
`isChildOf/isParentOf/isChildOrParentOf`. Generalize fragment
relationships to allow checking if two functions belong to one group,
potentially in presence of ICF which can join multiple groups.
Test Plan: NFC for existing tests
Reviewers: maksfb, ayermolo, rafaelauler, dcci
Reviewed By: rafaelauler
Pull Request: https://github.com/llvm/llvm-project/pull/99979
Followup to the splitting of processUnitDIE, moves code that accesses
common resource to be outside of the function that will be parallelized.
Followup to #99957
In bolt/lib/Passes/AsmDump.cpp, the MCInstPrinter is created with false
AsmVerbose. The AsmVerbose argument to createAsmStreamer is unused.
Deprecate the legacy Target::createAsmStreamer overload, which might be
used by downstream.
from PEP8
(https://peps.python.org/pep-0008/#programming-recommendations):
> Comparisons to singletons like None should always be done with is or
is not, never the equality operators.
Co-authored-by: Eisuke Kawashima <e-kwsm@users.noreply.github.com>
Implemented call graph function matching. First, two call graphs are
constructed for both profiled and binary functions. Then functions are
hashed based on the names of their callee/caller functions. Finally,
functions are matched based on these neighbor hashes and the
longest common prefix of their names. The `match-with-call-graph`
flag turns this matching on.
Test Plan: Added match-with-call-graph.test. Matched 164 functions
in a large binary with 10171 profiled functions.
Read pseudo probes in regular and BAT YAML profile generation, and
attach them to YAML profile basic blocks. This exposes GUID, probe id,
and probe type in profile for future use in stale profile matching.
Test Plan: updated pseudoprobe-decoding-inline.test
Reviewers: dcci, rafaelauler, ayermolo, maksfb
Reviewed By: rafaelauler
Pull Request: https://github.com/llvm/llvm-project/pull/99554
Add a BinaryFunction field for pseudo probe function GUID.
Populate it during pseudo probe section parsing, and emit it in YAML
profile (both regular and BAT), along with function checksum.
To be used for stale function matching.
Test Plan: update pseudoprobe-decoding-inline.test
Detect and support fixed PIC indirect jumps of the following form:
```
movslq En(%rip), %r1
leaq PIC_JUMP_TABLE(%rip), %r2
addq %r2, %r1
jmpq *%r1
```
with PIC_JUMP_TABLE that looks like following:
```
JT: ----------
E1:| L1 - JT |
|----------|
E2:| L2 - JT |
|----------|
| |
......
En:| Ln - JT |
----------
```
The code could be produced by compilers, see
https://github.com/llvm/llvm-project/issues/91648.
Test Plan: updated jump-table-fixed-ref-pic.test
Reviewers: maksfb, ayermolo, dcci, rafaelauler
Reviewed By: rafaelauler
Pull Request: https://github.com/llvm/llvm-project/pull/91667
On clang 14 the build is failing with:
reference to local binding 'ParentName' declared in enclosing function
'llvm::bolt::RewriteInstance::registerFragments'
With aggressive ICF, it's possible to have different local symbols
(under different FILE symbols) to be mapped to the same address.
FileSymRefs only keeps a single SymbolRef per address, which prevents
fragment matching from finding the correct symbol to perform parent
function lookup.
Work around this issue by switching FileSymRefs to a multimap. In
future, uses of FileSymRefs can be replaced with SortedSymbols which
keeps essentially the same information.
Test Plan: added ambiguous_fragment.test
Reviewers: dcci, ayermolo, maksfb, rafaelauler
Reviewed By: rafaelauler
Pull Request: https://github.com/llvm/llvm-project/pull/98992
`createDummyReturnFunction` is not creating a function but instead only
a function body that is simply a return statement.
This patch renames it to: `createReturnInstructionList`
AArch64 needs this function when instrumenting statically-linked binaries.
Sample commands:
```bash
clang -Wl,-q test.c -static -o out
llvm-bolt -instrument -instrumentation-sleep-time=5 out -o out.instr
```
Added another hash level – call hash – following opcode hash matching
for stale block matching. Call hash strings are the concatenation of the
lexicographically ordered names of each blocks’ called functions. This
change bolsters block matching in cases where some instructions have
been removed or added but calls remain constant.
Test Plan: added match-functions-with-calls-as-anchors.test.
GNU assembler 2.26 introduced the .cfi_label directive. It does not
expand to any CFI instructions, but defines a label in
.eh_frame/.debug_frame, which can be used by runtime patching code to
locate the FDE. .cfi_label is not allowed for CIE's initial
instructions, and can therefore be used to force the next instruction to
be placed in a FDE instead of a CIE.
In glibc since 2018, sysdeps/riscv/start.S utilizes .cfi_label to force
DW_CFA_undefined to be placed in a FDE. arc/csky/loongarch ports have
copied this use.
```
.cfi_startproc
// DW_CFA_undefined is allowed for CIE's initial instructions.
// Without .cfi_label, gas would place DW_CFA_undefined in a CIE.
.cfi_label .Ldummy
.cfi_undefined ra
.cfi_endproc
```
No CFI instruction is associated with .cfi_label, so the `case
MCCFIInstruction::OpLabel:` code in BOLT is unreachable and onlt to make
-Wswitch happy.
Close#97222
Pull Request: https://github.com/llvm/llvm-project/pull/97922
In three-way split functions, if only .warm fragment is present, BAT
incorrectly overwrites the map for .warm fragment by empty .cold
fragment.
Test Plan: updated register-fragments-bolt-symbols.s
Moved function matching techniques into separate helper functions for
ease of understanding and to make space for additional function
matching techniques to be added (e.g. call graph function matching).
BOLT only checks for the most common indirect branch pattern during the
branch analyzation.
Extended the logic with two other indirect patterns which slightly
differ from the expected one.
Those patterns may be hit when statically linking libc (pattern 2
requires 'lld' linker).
As a workaround mark them as UNKNOWN branch for now.
Fixes: #83114
There could be multiple TUs with the same hash in various DWO files. In
bigger binaries this could be in the thousands. Although they could be
structurally different and we need to output Entries for all of them,
for the purposes of figuring out a TU hash we only need one entry in
Foreign TU list.
The code emits an empty MCDataFragment to ensure that the labels are
attached to `SplitSection`. The workaround, due to the removed
`flushPendingLabels` mechanism (see
75006466296ed4b0f845cbbec4bf77c21de43b40), is now unneeded.
Pull Request: https://github.com/llvm/llvm-project/pull/97632
Refactors legacy ranges writers to create a writer for each instance of
a DWO file.
We now write out everything into .debug_ranges after the all the DWO
files are processed. This also changes the order that ranges is written
out in, as before we wrote out while in the main CU processing loop and
we now iterate through the CU buckets created by partitionCUs, after the
main processing loop.
A mapping - from namespace to associated binary functions - is used to
match function profiles to binary based on the
'--name-similarity-function-matching-threshold' flag set edit distance
threshold. The flag is set to 0 (exact name matching) by default as it is
expensive, requiring the processing of all BFs.
Test Plan: Added name-similarity-function-matching.test. On a binary
with 5M functions, rewrite passes took ~520s without the flag and
~2018s with the flag set to 20.
This restores 63ec52f867ada8d841dd872acf3d0cb62e2a99e8 and
46f7929879a59ec72dc75679b4201e2d314efba9, NFC changes that were
unnecessarily reverted.
This completes the work that merges MCAsmLayout into MCAssembler.
Pull Request: https://github.com/llvm/llvm-project/pull/97449
9d0754ada5dbbc0c009bcc2f7824488419cc5530 dropped MC support required for
optimal macro-fusion alignment in BOLT. Remove the support in BOLT as
performance measurements with large binaries didn't show a significant
improvement.
Test Plan:
macro-fusion alignment was never upstreamed, so no upstream tests are
affected.