Identical Code Folding (ICF) folds functions that are identical into one
function, and updates symbol addresses to the new address. This reduces
the size of a binary, but can lead to problems. For example when
function pointers are compared. This can be done either explicitly in
the code or generated IR by optimization passes like Indirect Call
Promotion (ICP). After ICF what used to be two different addresses
become the same address. This can lead to a different code path being
taken.
This is where safe ICF comes in. Linker (LLD) does it using address
significant section generated by clang. If symbol is in it, or an object
doesn't have this section symbols are not folded.
BOLT does not have the information regarding which objects do not have
this section, so can't re-use this mechanism.
This implementation scans code section and conservatively marks
functions symbols as unsafe. It treats symbols as unsafe if they are
used in non-control flow instruction. It also scans through the data
relocation sections and does the same for relocations that reference a
function symbol. The latter handles the case when function pointer is
stored in a local or global variable, etc. If a relocation address
points within a vtable these symbols are skipped.
from PEP8
(https://peps.python.org/pep-0008/#programming-recommendations):
> Comparisons to singletons like None should always be done with is or
is not, never the equality operators.
Co-authored-by: Eisuke Kawashima <e-kwsm@users.noreply.github.com>
Implemented call graph function matching. First, two call graphs are
constructed for both profiled and binary functions. Then functions are
hashed based on the names of their callee/caller functions. Finally,
functions are matched based on these neighbor hashes and the
longest common prefix of their names. The `match-with-call-graph`
flag turns this matching on.
Test Plan: Added match-with-call-graph.test. Matched 164 functions
in a large binary with 10171 profiled functions.
A mapping - from namespace to associated binary functions - is used to
match function profiles to binary based on the
'--name-similarity-function-matching-threshold' flag set edit distance
threshold. The flag is set to 0 (exact name matching) by default as it is
expensive, requiring the processing of all BFs.
Test Plan: Added name-similarity-function-matching.test. On a binary
with 5M functions, rewrite passes took ~520s without the flag and
~2018s with the flag set to 20.
Added flag '--match-profile-with-function-hash' to match functions
based on exact hash. After identical and LTO name matching, more
functions can be recovered for inference with exact hash, in the case
of function renaming with no functional changes. Collisions are
possible in the unlikely case where multiple functions share the same
exact hash. The flag is off by default as it requires the processing of
all binary functions and subsequently is expensive.
Test Plan: added hashing-based-function-matching.test.
Summary: Functions with high discrepancy
(measured by matched function blocks)
can be ignored with an added command line
argument for better performance.
Test Plan: Added
stale-matching-min-matched-block.test
---------
Co-authored-by: Amir Ayupov <aaupov@fb.com>
Update the folder titles for targets in the monorepository that have not
seen taken care of for some time. These are the folders that targets are
organized in Visual Studio and XCode (`set_property(TARGET <target>
PROPERTY FOLDER "<title>")`) when using the respective CMake's IDE
generator.
* Ensure that every target is in a folder
* Use a folder hierarchy with each LLVM subproject as a top-level folder
* Use consistent folder names between subprojects
* When using target-creating functions from AddLLVM.cmake, automatically
deduce the folder. This reduces the number of
`set_property`/`set_target_property`, but are still necessary when
`add_custom_target`, `add_executable`, `add_library`, etc. are used. A
LLVM_SUBPROJECT_TITLE definition is used for that in each subproject's
root CMakeLists.txt.
Provide secondary entry points for `EntryDiscriminator` call info field
in YAML profile.
Increases BAT section size to:
- large binary: 39655300 bytes (1.03x the original),
- medium binary: 3834328 bytes (0.65x),
- small binary: 924 bytes (0.64x).
Depends on: https://github.com/llvm/llvm-project/pull/76911
Test Plan:
- Updated bolt-address-translation{,-yaml}.test
- Added openssl test: https://github.com/rafaelauler/bolt-tests/pull/30
Reviewers: dcci, rafaelauler, maksfb, ayermolo
Reviewed By: rafaelauler
Pull Request: https://github.com/llvm/llvm-project/pull/86218
YAML profile reader checks the number of basic blocks in regular,
no-stale-matching mode. Add it to BAT.
This increases the size of BAT section to:
- large binary: 39583080 bytes (1.02x of the original),
- medium binary: 3816492 bytes (0.64x),
- small binary: 920 bytes (0.64x, no change due to alignment).
Test Plan: Updated bolt-address-translation-yaml.test
Reviewers: rafaelauler, ayermolo, maksfb, dcci
Reviewed By: rafaelauler
Pull Request: https://github.com/llvm/llvm-project/pull/86045
Add input basic block index to BAT metadata. This addresses the case
where some basic blocks are eliminated, and output index is not equal
to the input block index. These indices are used in non-stale-matching
mode.
Increases BAT section size to:
- large binary: 39521512 bytes (1.02x original),
- medium binary: 3799988 bytes (0.64x),
- small binary: 920 bytes (0.64x).
Test Plan:
Updated bolt-address-translation{,-yaml}.test
Pull Request: https://github.com/llvm/llvm-project/pull/86044
Relax assumptions that YAML output is not supported in BAT mode.
Set up basic infrastructure for emitting YAML for functions not covered
by BAT, such as from `.bolt.org.text` section (code identical to input binary
sans external refs), or non-rewritten functions in non-relocation mode (where
the function stays in the same section but BAT mapping is not emitted).
This diff only produces YAML profile for non-BAT functions (skipped,
non-simple). YAML profile for BAT functions is added in follow-up diffs:
- https://github.com/llvm/llvm-project/pull/76911 emits YAML profile with
internal control flow information only (branch profile),
- https://github.com/llvm/llvm-project/pull/76896 adds cross-function profile
(calls profile).
Test Plan: Added bolt/test/X86/bolt-address-translation-yaml.test
Reviewers: ayermolo, dcci, maksfb, rafaelauler
Reviewed By: rafaelauler
Pull Request: https://github.com/llvm/llvm-project/pull/76910
Encode BRANCHENTRY bits as bitmask for deduplicated entries.
Reduces BAT section size:
- large binary: to 11834216 bytes (0.31x original),
- medium binary: to 1565584 bytes (0.26x original),
- small binary: to 336 bytes (0.23x original).
Test Plan: Updated bolt/test/X86/bolt-address-translation.test
Make output function addresses be delta-encoded wrt last offset in the
previous function. This reduces the deltas in function start addresses.
Test Plan:
Reduces BAT section size to:
- large binary: 12218860 bytes (0.32x original),
- medium binary: 1606580 bytes (0.27x original),
- small binary: 404 bytes (0.28x original),
Reviewers: rafaelauler
Reviewed By: rafaelauler
Pull Request: https://github.com/llvm/llvm-project/pull/76904
Reduces BAT section size:
- large binary: to 12283500 bytes (0.32x original size),
- medium binary: to 1616020 bytes (0.27x original size),
- small binary: to 404 bytes (0.28x original size).
Test Plan: Updated bolt/test/X86/bolt-address-translation.test
Further reduce the size of BAT section:
- large binary: to 12716312 bytes (0.33x original),
- medium binary: to 1649472 bytes (0.28x original),
- small binary: to 428 bytes (0.30x original).
Test Plan: Updated bolt/test/X86/bolt-address-translation.test
This change further reduces the size of BAT:
- large binary: to 13073904 bytes (0.34x original),
- medium binary: to 1703116 bytes (0.29x original),
- small binary: to 436 bytes (0.30x original).
Test Plan: Updated bolt/test/X86/bolt-address-translation.test
Closes https://github.com/llvm/llvm-project/issues/63097
Before merging please make sure the change to
bolt/include/bolt/Passes/StokeInfo.h is correct.
bolt/include/bolt/Passes/StokeInfo.h
```diff
// This Pass solves the two major problems to use the Stoke program without
- // proting its code:
+ // probing its code:
```
I'm still not happy about the awkward wording in this comment.
bolt/include/bolt/Passes/FixRelaxationPass.h
```
$ ed -s bolt/include/bolt/Passes/FixRelaxationPass.h <<<'9,12p'
// This file declares the FixRelaxations class, which locates instructions with
// wrong targets and fixes them. Such problems usually occures when linker
// relaxes (changes) instructions, but doesn't fix relocations types properly
// for them.
$
```
bolt/docs/doxygen.cfg.in
bolt/include/bolt/Core/BinaryContext.h
bolt/include/bolt/Core/BinaryFunction.h
bolt/include/bolt/Core/BinarySection.h
bolt/include/bolt/Core/DebugData.h
bolt/include/bolt/Core/DynoStats.h
bolt/include/bolt/Core/Exceptions.h
bolt/include/bolt/Core/MCPlusBuilder.h
bolt/include/bolt/Core/Relocation.h
bolt/include/bolt/Passes/FixRelaxationPass.h
bolt/include/bolt/Passes/InstrumentationSummary.h
bolt/include/bolt/Passes/ReorderAlgorithm.h
bolt/include/bolt/Passes/StackReachingUses.h
bolt/include/bolt/Passes/StokeInfo.h
bolt/include/bolt/Passes/TailDuplication.h
bolt/include/bolt/Profile/DataAggregator.h
bolt/include/bolt/Profile/DataReader.h
bolt/lib/Core/BinaryContext.cpp
bolt/lib/Core/BinarySection.cpp
bolt/lib/Core/DebugData.cpp
bolt/lib/Core/DynoStats.cpp
bolt/lib/Core/Relocation.cpp
bolt/lib/Passes/Instrumentation.cpp
bolt/lib/Passes/JTFootprintReduction.cpp
bolt/lib/Passes/ReorderData.cpp
bolt/lib/Passes/RetpolineInsertion.cpp
bolt/lib/Passes/ShrinkWrapping.cpp
bolt/lib/Passes/TailDuplication.cpp
bolt/lib/Rewrite/BoltDiff.cpp
bolt/lib/Rewrite/DWARFRewriter.cpp
bolt/lib/Rewrite/RewriteInstance.cpp
bolt/lib/Utils/CommandLineOpts.cpp
bolt/runtime/instr.cpp
bolt/test/AArch64/got-ld64-relaxation.test
bolt/test/AArch64/unmarked-data.test
bolt/test/X86/Inputs/dwarf5-cu-no-debug-addr-helper.s
bolt/test/X86/Inputs/linenumber.cpp
bolt/test/X86/double-jump.test
bolt/test/X86/dwarf5-call-pc-function-null-check.test
bolt/test/X86/dwarf5-split-dwarf4-monolithic.test
bolt/test/X86/dynrelocs.s
bolt/test/X86/fallthrough-to-noop.test
bolt/test/X86/tail-duplication-cache.s
bolt/test/runtime/X86/instrumentation-ind-calls.s
This is an ongoing series of commits that are reformatting our
Python code. This catches the last of the python files to
reformat. Since they where so few I bunched them together.
Reformatting is done with `black`.
If you end up having problems merging this commit because you
have made changes to a python file, the best way to handle that
is to run git checkout --ours <yourfile> and then reformat it
with black.
If you run into any problems, post to discourse about it and
we will try to help.
RFC Thread below:
https://discourse.llvm.org/t/rfc-document-and-standardize-python-code-style
Reviewed By: jhenderson, #libc, Mordante, sivachandra
Differential Revision: https://reviews.llvm.org/D150784
They don't convey any useful information and make the documentation
unnecessarily hard to read.
Differential Revision: https://reviews.llvm.org/D149641
Add stub Sphinx documentation, with configuration copy-pasted from lld and
index page converted from bolt/README.md.
Reviewed By: #bolt, rafauler
Differential Revision: https://reviews.llvm.org/D140156
The SplitFunctions pass does not distinguish between various splitting
modes anymore. This change updates the command line interface to
reflect this behavior by deprecating values passed to the
--split-function option.
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D128558
Clang's doxygen documentation specifies LLVM revision. Do the same for BOLT.
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D126912
Replace "cache+" with "ext-tsp" in all BOLT tests
Test Plan:
```
ninja check-bolt
grep -rnw . -e "cache+"
```
no more tests containing "cache+"
"cache+" and "ext-tsp" are aliases
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D126714