Sometimes we need to know the size of a symbol besides its address, so
maybe we can start using the existing `BOLTLinker::lookupSymbolInfo()`
(that returns symbol address and size) and remove
`BOLTLinker::lookupSymbol()` (that only returns symbol address). And for
both we need to check return value as it is wrapped in `std::optional<>`,
which makes the difference even smaller.
Bolt currently links and initializes all LLVM targets. This
substantially increases the binary size, and link time if LTO is used.
Instead, only link the targets specified by BOLT_TARGETS_TO_BUILD. We
also have to only initialize those targets, so generate a
TargetConfig.def file with the necessary information. The way the
initialization is done mirrors what llvm-exegesis does.
This reduces llvm-bolt size from 137MB to 78MB for me.
Traces are triplets of branch source, target, and fall-through end (next
branch).
Traces simplify differentiation of fall-throughs into local- and
external-origin, which improves performance over profile with
undifferentiated fall-throughs by eliminating profile discontinuity in
call to continuation fall-throughs. This makes it possible to avoid
converting return profile into call to continuation profile which may
introduce statistical biases.
The existing format makes provisions for local- (F) and external- (f)
origin fall-throughs, but the profile producer needs to know function
boundaries. BOLT has that information readily available, so providing
the origin branch of a fall-through is a functional replacement of the
fall-through kind (f or F). This also has an effect of combining
branches and fall-throughs into a single record.
As traces subsume other pre-aggregated profile kinds, BOLT may drop
support for them soon. Users of pre-aggregated profile format are
advised to migrate to the trace format.
Test Plan: Updated callcont-fallthru.s
`addPendingRelocation` is the only way to add a pending
relocation. Can no longer use `addRelocation` for this.
Update the only user (`BinaryContextTester`).
Use LLVM's getMainExecutable() helper instead of rolling our own. This
will result in standard behavior across platforms, such as making sure
that symlinks are always resolved.
When printing disassembly of a function with constant islands, include
the island info in the dump.
At the moment, only print islands in pre-CFG state. Include islands that
are interleaved with instructions.
A few tests generate a statically-linked position-independent executable
with `-nostdlib -Wl,--unresolved-symbols=ignore-all -pie` (`%clang`) and
test PLT handling. (--unresolved-symbols=ignore-all suppresses undefined
symbol errors and serves as a convenience hack.)
This relies on an unguaranteed linker behavior: a statically-linked PIE
does not necessarily generate PLT entries.
While current lld generates a PLT entry, it will change to suppress the
PLT entry to simplify internal handling and improve consistency.
(The behavior has no consistency in GNU ld, some ports generated a
.dynsym entry while some don't. While most seem to generate a PLT entry
but some ports use a weird `R_*_NONE` relocation.)
When a function has an indirect branch with unknown control flow, we
preserve nops in order to keep all instruction offsets (from the start
of the function) the same in case the indirect branch is used by a
PC-relative jump table. However, when we know the control flow of the
function, we should be able to safely remove nops.
The code for jump table detection on AArch64 asserts liberally whenever
the input instruction sequence does not match the expected pattern. As a
result, BOLT fails to process binaries with such sequences instead of
ignoring functions with unknown control flow.
Remove asserts in analyzeIndirectBranchFragment() and mark indirect
jumps as instructions with unknown control flow instead.
Remove options to generate autofdo data (unused) and `use-event-pc`
(not beneficial).
Cuts down perf2bolt time for 11GB perf.data by 40s (11:10->10:30).
This fixes the following tests:
BOLT :: AArch64/check-init-not-moved.s
BOLT :: X86/dwarf5-dwarf4-types-backward-forward-cross-reference.test
BOLT :: X86/dwarf5-locexpr-referrence.test
When clang is compiled with `-DENABLE_LINKER_BUILD_ID=ON`.
This fixes a number of issues introduced in #97130 when
LLVM_LIBDIR_SUFFIX is a non-empty string. Make sure that the libdir is
always referenced as `lib${LLVM_LIBDIR_SUFFIX}`, not as just `lib` or
`${CMAKE_INSTALL_LIBDIR}${LLVM_LIBDIR_SUFFIX}`.
This is the standard libdir convention for all LLVM subprojects. Using
`${CMAKE_INSTALL_LIBDIR}${LLVM_LIBDIR_SUFFIX}` would result in a
duplicate suffix.
Bolt makes use of add_llvm_library and as such ends up exporting its
libraries from LLVMExports.cmake, which is not correct.
Bolt doesn't have its own exports file, and I assume that there is no
desire to have one either -- Bolt libraries are not intended to be
consumed as a cmake module, right?
As such, this PR adds a NO_EXPORT option to simplify exclude these
libraries from the exports file.
- **Reapply "[BOLT] Add --pad-funcs-before=func:n (#117924)"**
- **[BOLT] Fix --pad-funcs{,-before} state misinteraction**
When --pad-funcs-before was introduced, it introduced a bug whereby the
first one to get parsed could influence the other.
Ensure that each has its own state and test that they don't interact in
this manner by testing how the `_subsequent` symbol moves when both
arguments are supplied with different padding values.
Fixed by having a function (and static state) for each of before/after.
14dcf8214f9c66172d17c1cfaec6aec0030748e0 introduced a subtle bug with
the static `FunctionPadding` map.
If either `opts::FunctionPadSpec` or `opts::FunctionPadBeforeSpec` are set,
the map is going to be populated with the respective spec in the first
invocation of `BinaryEmitter::emitFunction`. The subsequent invocations
will pick up the padding from the map irrespective of whether
`opts::FunctionPadSpec` or `opts::FunctionPadBeforeSpec` is passed as a
parameter.
This breaks an internal test, hence reverting the patch.
This patch makes sure that `BinaryContext::printInstruction` prints the
preferred disassembly. Preferred disassembly only gets printed when
there are no annotations on the MCInst. Therefore, this patch
temporarily removes the annotations before printing it.
A few examples of before and after on AArch64 instructions are as
follows:
```
BEFORE AFTER
(preferred disassembly)
ret x30 ret
orr x30, xzr, x0 mov x30, x0
hint #29 autiasp
hint #12 autia1716
```
Clearly, the preferred disassembly is easier for developers to read, and
is the disassembly that tools should be printing.
This patch is motivated as part of future work on the
llvm-bolt-binary-analysis tool, making sure that the reports it prints
do use preferred disassembly.
This patch was cherry-picked from
https://github.com/kbeyls/llvm-project/tree/bolt-gadget-scanner-prototype.
In this current patch, this only affects existing RISCV test cases.
This patch also does improve test cases in future patches that will
introduce a binary analysis for llvm-bolt-binary-analysis that checks
for correct application of pac-ret (pointer authentication on return
addresses).
Identical Code Folding (ICF) folds functions that are identical into one
function, and updates symbol addresses to the new address. This reduces
the size of a binary, but can lead to problems. For example when
function pointers are compared. This can be done either explicitly in
the code or generated IR by optimization passes like Indirect Call
Promotion (ICP). After ICF what used to be two different addresses
become the same address. This can lead to a different code path being
taken.
This is where safe ICF comes in. Linker (LLD) does it using address
significant section generated by clang. If symbol is in it, or an object
doesn't have this section symbols are not folded.
BOLT does not have the information regarding which objects do not have
this section, so can't re-use this mechanism.
This implementation scans code section and conservatively marks
functions symbols as unsafe. It treats symbols as unsafe if they are
used in non-control flow instruction. It also scans through the data
relocation sections and does the same for relocations that reference a
function symbol. The latter handles the case when function pointer is
stored in a local or global variable, etc. If a relocation address
points within a vtable these symbols are skipped.
The current implementation of `lookupStubFromGroup` is incorrect. The
function is intended to find and return the closest stub using
`lower_bound`, which identifies the first element in a sorted range that
is not less than a specified value. However, if such an element is not
found within `Candidates` and the list is not empty, the function
returns `nullptr`. Instead, it should check whether the last element
satisfies the condition.
The key address in the static keys jump table was incorrectly encoded as
an absolute value instead of PC-relative causing incorrect
interpretation of the "likely" property of the key.
merge-fdata used to consider misprediction count as part of "signature",
or the aggregation key. This prevented it from collapsing profile lines
with different misprediction counts, which resulted in duplicate
`(from, to)` pairs with different misprediction and execution counts.
Fix that by splitting out misprediction count and accumulating it
separately.
Test Plan: updated bolt/test/merge-fdata-lbr-mode.test
Eliminate splitting the buffer into lines, and use `std::getline`
directly. Simplify no_lbr and boltedcollection handling as well.
Test Plan: For a large fdata file (200MB), fstream version is ~10%
faster.
Added support to BOLT for DW_OP_GNU_push_tls_address. So now
DW_TAG_variable with this OP in DW_AT_location will appear in debug
names acceleration table. Although not in the DWARF 5 spec it is similar
to DW_OP_form_tls_address. Without this support llvm-dwarfdump --verify
--debug-names will report errors.
This change affects non-relocation mode only. Prior to having
CheckLargeFunctions pass, we could have emitted code for functions that
was discarded at the end due to size limitations. Since we didn't know
at the time of emission if the code would be discarded or not, we had to
emit jump tables in separate sections and handle them separately.
However, now we always run CheckLargeFunctions and make sure all emitted
code is used. Thus, we can get rid of the special jump table handling.
When a basic sample profile is gathered without LBR info, the generated
profile contains a "no-lbr" tag in the first line of the fdata file.
This PR fixes merge-fdata to recognize and save this tag to the output
file.
This patch adds a requirement for a non root user in
unreadable-profile.test. This test fails if run as a root user (like in
a container without explicitly changing the user), which can lead to
some CI test failures.
This fix handles a case where a DIE that does not have
DW_AT_name/DW_AT_linkage_name, but has a reference to another DIE using
DW_AT_abstract_origin/DW_AT_specification. It also fixes a bug where
there are cross CU references for those attributes. Previously it would
use a DWARF Unit of a DIE which was being processed The
warf5-debug-names-cross-cu.s test just happened to work because how it
was constructed where string section was shared by both DWARF Units.
To resolve DW_AT_name/DW_AT_linkage_name this patch iterates over
references until it either reaches the final DIE or finds both of those
names.
Use SymbolStringPtr for Symbol names in LinkGraph. This reduces string interning
on the boundary between JITLink and ORC, and allows pointer comparisons (rather
than string comparisons) between Symbol names. This should improve the
performance and readability of code that bridges between JITLink and ORC (e.g.
ObjectLinkingLayer and ObjectLinkingLayer::Plugins).
To enable use of SymbolStringPtr a std::shared_ptr<SymbolStringPool> is added to
LinkGraph and threaded through to its construction sites in LLVM and Bolt. All
LinkGraphs that are to have symbol names compared by pointer equality must point
to the same SymbolStringPool instance, which in ORC sessions should be the pool
attached to the ExecutionSession.
---------
Co-authored-by: Lang Hames <lhames@gmail.com>