2396 Commits

Author SHA1 Message Date
Nikita Popov
c99d6118fe [BOLT] Use getMainExecutable() (#126698)
Use LLVM's getMainExecutable() helper instead of rolling our own. This
will result in standard behavior across platforms, such as making sure
that symlinks are always resolved.

(cherry picked from commit 0abe058d7f99c9c7bbaf4ee98308c5e78d229897)
2025-02-12 19:40:50 -08:00
Fangrui Song
553185bd71 [BOLT,test] Link against a shared object to test PLT (#125625)
A few tests generate a statically-linked position-independent executable
with `-nostdlib -Wl,--unresolved-symbols=ignore-all -pie` (`%clang`) and
test PLT handling. (--unresolved-symbols=ignore-all suppresses undefined
symbol errors and serves as a convenience hack.)

This relies on an unguaranteed linker behavior: a statically-linked PIE
does not necessarily generate PLT entries.
While current lld generates a PLT entry, it will change to suppress the
PLT entry to simplify internal handling and improve consistency.

(GNU ld is inconsistent here: some ports generate a .dynsym entry while
others don't, and while most ports seem to generate a PLT entry, some
use a weird `R_*_NONE` relocation.)

(cherry picked from commit a907008bcb8dcc093f8aa5c0450d92cd63473b81)
2025-02-11 12:28:07 -08:00
Maksim Panchenko
ef232a7e34
[BOLT][AArch64] Remove nops in functions with defined control flow (#124705)
When a function has an indirect branch with unknown control flow, we
preserve nops in order to keep all instruction offsets (from the start
of the function) the same in case the indirect branch is used by a
PC-relative jump table. However, when we know the control flow of the
function, we should be able to safely remove nops.
2025-01-28 11:03:49 -08:00
Maksim Panchenko
1b4bd4e1a5
[BOLT][AArch64] Remove assertions from jump table heuristic (#124372)
The code for jump table detection on AArch64 asserts liberally whenever
the input instruction sequence does not match the expected pattern. As a
result, BOLT fails to process binaries with such sequences instead of
ignoring functions with unknown control flow.

Remove asserts in analyzeIndirectBranchFragment() and mark indirect
jumps as instructions with unknown control flow instead.
2025-01-24 16:43:02 -08:00
Maksim Panchenko
34c6c5e72f
[BOLT][AArch64] Fix PLT optimization (#124192)
Preserve C++ exception metadata while running PLT optimization on
AArch64.
2025-01-24 14:20:24 -08:00
Amir Ayupov
e6c9cd9c06
[BOLT] Drop parsing sample PC when processing LBR perf data (#123420)
Remove options to generate autofdo data (unused) and `use-event-pc`
(not beneficial).

Cuts down perf2bolt time for 11GB perf.data by 40s (11:10->10:30).
2025-01-21 09:04:49 -08:00
Alexey Moksyakov
ad599c25d9
[BOLT][AArch64] Add isPush & isPop (#120713)
This functionality is needed for the inliner pass and also for correct
dyno stats.

Needed for [PR](https://github.com/llvm/llvm-project/pull/120187)
2025-01-20 10:42:48 +08:00
Nicholas
ee4282259d
[BOLT][AArch64]support inline-small-functions for AArch64 (#120187)
Add some functions in `AArch64MCPlusBuilder.cpp` to support inlining on
AArch64.
2025-01-17 17:55:55 +08:00
Nikita Popov
320c2ee6c2
[BOLT] Pass -Wl,--build-id=none to linker in tests (#122886)
This fixes the following tests:

    BOLT :: AArch64/check-init-not-moved.s
    BOLT :: X86/dwarf5-dwarf4-types-backward-forward-cross-reference.test
    BOLT :: X86/dwarf5-locexpr-referrence.test

When clang is compiled with `-DENABLE_LINKER_BUILD_ID=ON`.
2025-01-17 10:09:26 +01:00
Nikita Popov
3c42a77456
[BOLT] Fix handling of LLVM_LIBDIR_SUFFIX (#122874)
This fixes a number of issues introduced in #97130 when
LLVM_LIBDIR_SUFFIX is a non-empty string. Make sure that the libdir is
always referenced as `lib${LLVM_LIBDIR_SUFFIX}`, not as just `lib` or
`${CMAKE_INSTALL_LIBDIR}${LLVM_LIBDIR_SUFFIX}`.

This is the standard libdir convention for all LLVM subprojects. Using
`${CMAKE_INSTALL_LIBDIR}${LLVM_LIBDIR_SUFFIX}` would result in a
duplicate suffix.
2025-01-17 09:38:00 +01:00
Nicholas
1fa02b9684
[BOLT][AArch64] Speedup computeInstructionSize (#121106)
AArch64 instructions have a fixed size of 4 bytes, so there is no need
to compute it.
2025-01-17 09:48:17 +08:00
Nikita Popov
e7244d8659
[BOLT][CMake] Don't export bolt libraries in LLVMExports.cmake (#121936)
Bolt makes use of add_llvm_library and as such ends up exporting its
libraries from LLVMExports.cmake, which is not correct.

Bolt doesn't have its own exports file, and I assume that there is no
desire to have one either -- Bolt libraries are not intended to be
consumed as a cmake module, right?

As such, this PR adds a NO_EXPORT option to simply exclude these
libraries from the exports file.
2025-01-08 09:41:09 +01:00
Peter Waller
aa9cc721e5
Reapply "[BOLT] Add --pad-funcs-before=func:n (#117924)" (#121918)
- **Reapply "[BOLT] Add --pad-funcs-before=func:n (#117924)"**
- **[BOLT] Fix --pad-funcs{,-before} state misinteraction**

When --pad-funcs-before was introduced, it introduced a bug whereby the
first one to get parsed could influence the other.

Ensure that each has its own state and test that they don't interact in
this manner by testing how the `_subsequent` symbol moves when both
arguments are supplied with different padding values.

Fixed by having a function (and static state) for each of before/after.
2025-01-07 17:25:04 +00:00
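The state interaction fixed by the reapplied --pad-funcs-before patch can be illustrated with a minimal sketch. The names below (`PadSpec`, `paddingShared`, `paddingAfter`, `paddingBefore`) are hypothetical and are not BOLT's actual functions; the sketch only shows why a single static map lets the first spec parsed leak into the other, and how one static per option avoids that.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical parsed form of a "func:n" spec: function name -> padding bytes.
using PadSpec = std::map<std::string, std::size_t>;

static std::size_t findPadding(const PadSpec &Map, const std::string &Func) {
  auto It = Map.find(Func);
  return It == Map.end() ? 0 : It->second;
}

// Problematic shape: a single static map, populated by whichever spec is
// passed on the first call and then reused for every later call.
std::size_t paddingShared(const PadSpec &Spec, const std::string &Func) {
  static PadSpec Cache;
  if (Cache.empty())
    Cache = Spec;
  return findPadding(Cache, Func);
}

// Fixed shape: one function, and therefore one static map, per option.
std::size_t paddingAfter(const PadSpec &Spec, const std::string &Func) {
  static PadSpec Cache = Spec;
  return findPadding(Cache, Func);
}

std::size_t paddingBefore(const PadSpec &Spec, const std::string &Func) {
  static PadSpec Cache = Spec;
  return findPadding(Cache, Func);
}
```

With an "after" spec of `foo:8` and a "before" spec of `foo:16`, `paddingShared` returns 8 for both once the "after" spec has been seen first, while the two separate functions return 8 and 16 respectively.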
Amir Ayupov
be21bd9bbf Revert "[BOLT] Add --pad-funcs-before=func:n (#117924)"
14dcf8214f9c66172d17c1cfaec6aec0030748e0 introduced a subtle bug with
the static `FunctionPadding` map.

If either `opts::FunctionPadSpec` or `opts::FunctionPadBeforeSpec` is set,
the map is going to be populated with the respective spec in the first
invocation of `BinaryEmitter::emitFunction`. The subsequent invocations
will pick up the padding from the map irrespective of whether
`opts::FunctionPadSpec` or `opts::FunctionPadBeforeSpec` is passed as a
parameter.

This breaks an internal test, hence reverting the patch.
2025-01-06 12:57:43 -08:00
Davide Italiano
62c39d7734
[BOLT/docs] The support for macro-op fusion was removed. (#121158)
Update the documentation accordingly.
2024-12-26 11:18:12 -08:00
Franklin
6e8a1a45a7
[BOLT] Detect Linux kernel version if the binary is a Linux kernel (#119088)
This makes it easier to handle differences (e.g. of exception table
entry size) between versions of the Linux kernel.
2024-12-26 09:54:23 -08:00
Alexey Moksyakov
e11d49cbf5
[BOLT][AArch64] Adds tls relocations support (#117465)
Co-authored-by: yavtuk <yavtuk@ya.ru>
2024-12-20 15:54:36 +03:00
Kristof Beyls
4111841f88
[BOLT] Correctly print preferred disassembly for annotated instructions (#120564)
This patch makes sure that `BinaryContext::printInstruction` prints the
preferred disassembly. Preferred disassembly only gets printed when
there are no annotations on the MCInst. Therefore, this patch
temporarily removes the annotations before printing it.

A few examples of before and after on AArch64 instructions are as
follows:

```
  BEFORE                     AFTER
                             (preferred disassembly)

  ret   x30                  ret
  orr   x30, xzr, x0         mov   x30, x0
  hint  #29                  autiasp
  hint  #12                  autia1716
```

Clearly, the preferred disassembly is easier for developers to read, and
is the disassembly that tools should be printing.

This patch is motivated as part of future work on the
llvm-bolt-binary-analysis tool, making sure that the reports it prints
do use preferred disassembly.

This patch was cherry-picked from
https://github.com/kbeyls/llvm-project/tree/bolt-gadget-scanner-prototype.

In this current patch, this only affects existing RISCV test cases.

This patch also does improve test cases in future patches that will
introduce a binary analysis for llvm-bolt-binary-analysis that checks
for correct application of pac-ret (pointer authentication on return
addresses).
2024-12-20 08:54:07 +00:00
Maksim Panchenko
21684e38ee
[BOLT][Linux] Refactor reading of PC-relative addresses. NFCI (#120491)
Fix evaluation order problem identified in
https://github.com/llvm/llvm-project/pull/119088.
2024-12-19 10:40:25 -08:00
Alexander Yermolovich
3c357a49d6
[BOLT] Add support for safe-icf (#116275)
Identical Code Folding (ICF) folds functions that are identical into one
function, and updates symbol addresses to the new address. This reduces
the size of a binary, but can lead to problems, for example when
function pointers are compared. Such comparisons can be written
explicitly in the code or generated in IR by optimization passes like
Indirect Call Promotion (ICP). After ICF, what used to be two different
addresses become the same address, which can lead to a different code
path being taken.

This is where safe ICF comes in. The linker (LLD) does it using the
address-significance section generated by clang. If a symbol is in that
section, or an object doesn't have the section, the symbols are not
folded.

BOLT does not have the information regarding which objects do not have
this section, so can't re-use this mechanism.

This implementation scans the code section and conservatively marks
function symbols as unsafe. It treats a symbol as unsafe if it is used
in a non-control-flow instruction. It also scans the data relocation
sections and does the same for relocations that reference a function
symbol. The latter handles the case where a function pointer is stored
in a local or global variable, etc. If a relocation address points
within a vtable, these symbols are skipped.
2024-12-16 21:49:53 -08:00
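The conservative marking described in the safe-icf commit can be sketched roughly as follows. The `CodeRef` and `DataReloc` records and the `collectUnsafeToFold` function are hypothetical simplifications, not BOLT types; the real pass works on disassembled instructions and relocation sections, which are assumed here to have been reduced to these flat records.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical: a reference to a function symbol from an instruction.
struct CodeRef {
  std::string Function;
  bool IsControlFlow; // true for calls/branches, false for e.g. address loads
};

// Hypothetical: a data relocation that targets a function symbol.
struct DataReloc {
  std::string Function;
  bool InVTable; // true if the relocated address lies within a vtable
};

// Returns the set of functions that must not be folded.
std::set<std::string>
collectUnsafeToFold(const std::vector<CodeRef> &CodeRefs,
                    const std::vector<DataReloc> &DataRelocs) {
  std::set<std::string> Unsafe;
  // A function used by a non-control-flow instruction may have its
  // address observed, so it is treated as unsafe.
  for (const CodeRef &R : CodeRefs)
    if (!R.IsControlFlow)
      Unsafe.insert(R.Function);
  // Function pointers stored in data are unsafe too, except vtable slots.
  for (const DataReloc &R : DataRelocs)
    if (!R.InVTable)
      Unsafe.insert(R.Function);
  return Unsafe;
}
```

Under these assumptions, a function that is only ever called or only referenced from a vtable stays foldable, while one whose address is loaded into a register or stored in a variable does not.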
Alexander Yermolovich
0a7e048667
[BOLT][DWARF][NFC] Minimize dwarf5-debug-names-gnu-push-tls-address.s (#120103)
Removed unnecessary parts from the .text section.
2024-12-16 09:54:00 -08:00
Nicholas
671095b452
[BOLT][AArch64] Check Last Element Instead of Returning nullptr in lookupStubFromGroup (#114015)
The current implementation of `lookupStubFromGroup` is incorrect. The
function is intended to find and return the closest stub using
`lower_bound`, which identifies the first element in a sorted range that
is not less than a specified value. However, if such an element is not
found within `Candidates` and the list is not empty, the function
returns `nullptr`. Instead, it should check whether the last element
satisfies the condition.
2024-12-16 12:14:11 +00:00
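The `lower_bound` pitfall described in the `lookupStubFromGroup` commit can be shown with a small sketch. The `Stub` struct and `findClosestStub` are hypothetical stand-ins; the actual function also checks that the stub is within branch range, which is omitted here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <vector>

// Hypothetical stand-in for a stub record: only the address matters here.
struct Stub {
  uint64_t Address;
};

// Returns the stub closest to Target, or nullptr if Candidates is empty.
// Candidates must be sorted by Address.
const Stub *findClosestStub(const std::vector<Stub> &Candidates,
                            uint64_t Target) {
  if (Candidates.empty())
    return nullptr;

  // First element whose address is not less than Target.
  auto It = std::lower_bound(
      Candidates.begin(), Candidates.end(), Target,
      [](const Stub &S, uint64_t Addr) { return S.Address < Addr; });

  // No element at or above Target: the last element is the nearest one,
  // so return it rather than giving up with nullptr.
  if (It == Candidates.end())
    return &Candidates.back();

  // Otherwise the predecessor, if any, may be closer than *It.
  if (It != Candidates.begin()) {
    auto Prev = std::prev(It);
    if (Target - Prev->Address < It->Address - Target)
      return &*Prev;
  }
  return &*It;
}
```

The key case is a target above every candidate: `lower_bound` returns the end iterator even though the list is non-empty, and the last element is still a valid answer.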
Paschalis Mpeis
2df48fa78b
[BOLT][AArch64] Enable function print after ADRRelaxation (#119869)
Introduce `--print-adr-relaxation` to print after ADR Relaxation pass.
2024-12-16 12:06:56 +00:00
Maksim Panchenko
f86f4574bb
[BOLT][Linux] Fix static keys test case (#119771)
The key address in the static keys jump table was incorrectly encoded as
an absolute value instead of PC-relative causing incorrect
interpretation of the "likely" property of the key.
2024-12-15 17:13:04 -08:00
Amir Ayupov
8652608404
[BOLT] Fix counts aggregation in merge-fdata (#119652)
merge-fdata used to consider misprediction count as part of "signature",
or the aggregation key. This prevented it from collapsing profile lines
with different misprediction counts, which resulted in duplicate
`(from, to)` pairs with different misprediction and execution counts.

Fix that by splitting out misprediction count and accumulating it
separately.

Test Plan: updated bolt/test/merge-fdata-lbr-mode.test
2024-12-14 22:38:24 -08:00
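The merge-fdata aggregation fix can be sketched as below. This is not the tool's code: it assumes each profile line ends with two whitespace-separated integers, `<mispreds> <execs>`, and treats everything before them as the aggregation key. `Counts` and `mergeProfileLines` are hypothetical names, and malformed lines are simply skipped.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <map>
#include <sstream>
#include <string>

struct Counts {
  uint64_t Mispreds = 0;
  uint64_t Execs = 0;
};

// Merges lines that share the same (from, to) key, accumulating the
// misprediction and execution counts separately.
std::map<std::string, Counts> mergeProfileLines(std::istream &In) {
  std::map<std::string, Counts> Merged;
  std::string Line;
  while (std::getline(In, Line)) {
    // Locate the last two space-separated fields.
    std::size_t ExecPos = Line.rfind(' ');
    if (ExecPos == std::string::npos || ExecPos == 0)
      continue;
    std::size_t MispredPos = Line.rfind(' ', ExecPos - 1);
    if (MispredPos == std::string::npos)
      continue;

    // The key excludes both counts, so lines that differ only in
    // misprediction count collapse into one entry.
    Counts &C = Merged[Line.substr(0, MispredPos)];
    C.Mispreds +=
        std::stoull(Line.substr(MispredPos + 1, ExecPos - MispredPos - 1));
    C.Execs += std::stoull(Line.substr(ExecPos + 1));
  }
  return Merged;
}
```

Had the misprediction count been left inside the key, the first two lines of an input such as `... 3 100` and `... 5 50` for the same branch would remain separate entries instead of summing to 8 mispredictions and 150 executions.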
Amir Ayupov
97f43364cc
[BOLT][NFC] Speedup merge-fdata (#119942)
Eliminate splitting the buffer into lines, and use `std::getline`
directly. Simplify no_lbr and boltedcollection handling as well.

Test Plan: For a large fdata file (200MB), fstream version is ~10%
faster.
2024-12-14 22:26:20 -08:00
Alexander Yermolovich
331c2dd8b4
[BOLT][DWARF] Add support for DW_OP_GNU_push_tls_address to .debug_names (#119939)
Added support to BOLT for DW_OP_GNU_push_tls_address, so a
DW_TAG_variable with this op in DW_AT_location will now appear in the
debug names acceleration table. Although it is not in the DWARF 5 spec,
it is similar to DW_OP_form_tls_address. Without this support,
llvm-dwarfdump --verify --debug-names will report errors.
2024-12-14 09:30:25 -08:00
Maksim Panchenko
b560b87ba1
[BOLT] Clean up jump table handling in non-reloc mode. NFCI (#119614)
This change affects non-relocation mode only. Prior to having
CheckLargeFunctions pass, we could have emitted code for functions that
was discarded at the end due to size limitations. Since we didn't know
at the time of emission if the code would be discarded or not, we had to
emit jump tables in separate sections and handle them separately.
However, now we always run CheckLargeFunctions and make sure all emitted
code is used. Thus, we can get rid of the special jump table handling.
2024-12-13 13:14:02 -08:00
Tibor Dusnoki
5225f1b435
[BOLT][merge-fdata] Fix basic sample profile aggregation without LBR info (#118481)
When a basic sample profile is gathered without LBR info, the generated
profile contains a "no-lbr" tag in the first line of the fdata file.
This PR fixes merge-fdata to recognize and save this tag to the output
file.
2024-12-13 16:28:37 +00:00
Aiden Grossman
2bf3ef1847
[BOLT] Require non root user for unreadable-profile.test (#119816)
This patch adds a requirement for a non root user in
unreadable-profile.test. This test fails if run as a root user (like in
a container without explicitly changing the user), which can lead to
some CI test failures.
2024-12-12 22:14:41 -08:00
Kristof Beyls
ceb7214be0
[BOLT] Introduce binary analysis tool based on BOLT (#115330)
This initial commit does not add any specific binary analyses yet, it
merely contains the boilerplate to introduce a new BOLT-based tool.

This basically combines the first 4 patches from the prototype pac-ret
and stack-clash binary analyzer discussed in RFC
https://discourse.llvm.org/t/rfc-bolt-based-binary-analysis-tool-to-verify-correctness-of-security-hardening/78148
and published at
https://github.com/llvm/llvm-project/compare/main...kbeyls:llvm-project:bolt-gadget-scanner-prototype

The introduction of such a BOLT-based binary analysis tool was proposed
and discussed in at least the following places:
- The RFC pointed to above
- EuroLLVM 2024 round table
https://discourse.llvm.org/t/summary-of-bolt-as-a-binary-analysis-tool-round-table-at-eurollvm/78441
The round table showed quite a few people interested in being able to
build a custom binary analysis quickly with a tool like this.
- Also at the US LLVM dev meeting a few weeks ago, I heard interest from
a few people, asking when the tool would be available upstream.
- The presentation "Adding Pointer Authentication ABI support for your
ELF platform"
(https://llvm.swoogo.com/2024devmtg/session/2512720/adding-pointer-authentication-abi-support-for-your-elf-platform)
explicitly mentioned interest to extend the prototype tool to verify
correct implementation of pauthabi.
2024-12-12 10:06:27 +00:00
Alexander Yermolovich
4b825c7417
[BOLT][DWARF] Add support for transitive DW_AT_name/DW_AT_linkage_name resolution for DW_AT_name/DW_AT_linkage_name. (#119493)
This fix handles the case where a DIE does not have
DW_AT_name/DW_AT_linkage_name, but has a reference to another DIE using
DW_AT_abstract_origin/DW_AT_specification. It also fixes a bug where
there are cross-CU references for those attributes. Previously it would
use the DWARF Unit of the DIE being processed. The
dwarf5-debug-names-cross-cu.s test just happened to work because of how
it was constructed: the string section was shared by both DWARF Units.

To resolve DW_AT_name/DW_AT_linkage_name this patch iterates over
references until it either reaches the final DIE or finds both of those
names.
2024-12-11 14:27:56 -08:00
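The transitive name resolution in the DW_AT_name/DW_AT_linkage_name commit amounts to walking a chain of references. A minimal sketch follows, with a hypothetical `Die` struct in which an empty string means "attribute absent" and `Ref` stands for the DW_AT_abstract_origin/DW_AT_specification target; it assumes the chain is acyclic and ignores the cross-CU unit lookup that the real patch also handles.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified DIE: empty strings mean the attribute is absent.
struct Die {
  std::string Name;        // DW_AT_name
  std::string LinkageName; // DW_AT_linkage_name
  const Die *Ref;          // referenced DIE, or nullptr at the end of the chain
};

struct ResolvedNames {
  std::string Name;
  std::string LinkageName;
};

// Follows references until the final DIE is reached or both names are found.
// The first value found for each attribute wins.
ResolvedNames resolveNames(const Die *D) {
  ResolvedNames Result;
  for (; D != nullptr; D = D->Ref) {
    if (Result.Name.empty())
      Result.Name = D->Name;
    if (Result.LinkageName.empty())
      Result.LinkageName = D->LinkageName;
    if (!Result.Name.empty() && !Result.LinkageName.empty())
      break;
  }
  return Result;
}
```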
Peter Waller
14dcf8214f
[BOLT] Add --pad-funcs-before=func:n (#117924)
This complements --pad-funcs, and by using both simultaneously, enables
moving a specific function through the address space without modifying
any code other than the targeted function (and references to it) by
doing (before+after=constant).

See also: proposed functionality to enable inserting random padding in
https://discourse.llvm.org/t/rfc-lld-feature-for-controlling-for-code-size-dependent-measurement-bias
and https://github.com/llvm/llvm-project/pull/117653
2024-12-11 09:58:52 +00:00
Alexander Yermolovich
50c0e679b9
[BOLT][DWARF] Add support for DW_TAG_union_type to DebugNames. (#119023)
Adding support for DW_TAG_union_type for DebugNames acceleration tables.
2024-12-06 15:45:52 -08:00
Jared Wyles
2ccf7ed277
[JITLink] Switch to SymbolStringPtr for Symbol names (#115796)
Use SymbolStringPtr for Symbol names in LinkGraph. This reduces string interning
on the boundary between JITLink and ORC, and allows pointer comparisons (rather
than string comparisons) between Symbol names. This should improve the
performance and readability of code that bridges between JITLink and ORC (e.g.
ObjectLinkingLayer and ObjectLinkingLayer::Plugins).

To enable use of SymbolStringPtr a std::shared_ptr<SymbolStringPool> is added to
LinkGraph and threaded through to its construction sites in LLVM and Bolt. All
LinkGraphs that are to have symbol names compared by pointer equality must point
to the same SymbolStringPool instance, which in ORC sessions should be the pool
attached to the ExecutionSession.
---------

Co-authored-by: Lang Hames <lhames@gmail.com>
2024-12-06 10:22:09 +11:00
Maksim Panchenko
d5956fb8f9
[BOLT][AArch64] Add support for short LLD thunks/veneers (#118422)
When a callee function is closer than 256MB from its call site, LLD
linker can strategically create a short thunk for the function with a
single branch instruction (that covers +/-128MB). Detect and convert
such thunks into direct calls in BOLT.
2024-12-03 13:44:51 -08:00
Paschalis Mpeis
51003076eb
Reapply [BOLT] DataAggregator support for binaries with multiple text segments (#118023)
When a binary has multiple text segments, the Size is computed as the
difference of the last address of these segments from the BaseAddress.
The base addresses of all text segments must be the same.

Introduces flag 'perf-script-events' for testing, which allows passing
perf events without BOLT having to parse them by invoking 'perf script'.
The flag is used to pass a mock perf profile that has two memory
mappings for a mock binary that has two text segments. The mapping
size is updated as `parseMMapEvents` now processes all text segments.
2024-12-02 09:20:40 +00:00
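The size computation described in the DataAggregator commit, the last address across all text segments minus the base address, can be sketched as follows. `Segment` and `mappingSize` are hypothetical names, and the sketch assumes the segments have already been checked to share the same base address.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical description of one text segment.
struct Segment {
  uint64_t Address; // start address of the segment
  uint64_t Size;    // size of the segment in bytes
};

// Size of the mapping that covers every text segment, measured from
// BaseAddress to the highest end address among the segments.
uint64_t mappingSize(uint64_t BaseAddress,
                     const std::vector<Segment> &Segments) {
  uint64_t End = BaseAddress;
  for (const Segment &S : Segments) {
    assert(S.Address >= BaseAddress && "segment below the base address");
    End = std::max(End, S.Address + S.Size);
  }
  return End - BaseAddress;
}
```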
David Spickett
085e7d2b22
[bolt] Move CODE_OWNERS.txt to Maintainers.txt (#118082)
To align with: https://llvm.org/docs/DeveloperPolicy.html#maintainers

I have not changed the format of the file, my only goal here is that the
project have a `bolt/Maintainers.*` so it is easy to find.
2024-12-02 09:12:57 +00:00
Peter Waller
b5ed375f9d
[BOLT] Skip _init; avoiding GOT breakage for static binaries (#117751)
_init is used during startup of binaries. Unfortunately, its address
can be shared (at least on AArch64 glibc static binaries) with a data
reference that lives in the GOT. The GOT rewriting is currently unable
to distinguish between data addresses and function addresses. This leads
to the data address being incorrectly rewritten, causing a crash on
startup of the binary:

  Unexpected reloc type in static binary.

To avoid this, don't consider _init for being moved, by skipping it.

~We could add further conditions to narrow the skipped case for known
crashes, but as a straw man I thought it'd be best to keep the condition
as simple as possible and see if there are any objections to this.~
(Edit: this broke the test
bolt/test/runtime/X86/retpoline-synthetic.test, because _init was
skipped from the retpoline pass and it has an indirect call in it, so I
include a check for static binaries now, which avoids the test failure,
but perhaps this could/should be narrowed further?)

For now, skip _init for static binaries on any architecture; we could
add further conditions to narrow the skipped case for known crashes, but
as a straw man I thought it'd be best to keep the condition as simple as
possible and see if there are any objections to this.

Updates #100096.
2024-11-28 14:59:07 +00:00
Sander de Smalen
318c69de52 Reland "[AArch64] Define high bits of FPR and GPR registers (take 2) (#114827)"
The issue with slow compile-time was caused by an assert in
AArch64RegisterInfo.cpp. The assert invokes 'checkAllSuperRegsMarked'
after adding all the reserved registers. This call gets very expensive
after adding the _HI registers due to the way the function searches
in the 'Exception' list, which is expected to be a small list but isn't
(the patch added 190 _HI regs).

It was possible to rewrite the code in such a way that the _HI registers
are marked as reserved after the check. This makes the problem go away
entirely and restores compile-time to what it was before (tested for
`check-runtimes`, which previously showed a ~5x slowdown).

This reverts commits:
  1434d2ab215e3ea9c5f34689d056edd3d4423a78
  2704647fb7986673b89cef1def729e3b022e2607
2024-11-27 13:31:59 +00:00
Enna1
4d2bc0adc6
[BOLT] Extract comparator for sorting functions by index into helper function (#116217)
This change extracts the comparator for sorting functions by index into
a helper function `compareBinaryFunctionByIndex()`.

Not sure why the comparator used in
`BinaryContext::getSortedFunctions()` is not the same as in the other
two places. I think they should use the same comparator, so I also
changed `BinaryContext::getSortedFunctions()` to use
`compareBinaryFunctionByIndex()` for sorting functions.
2024-11-27 09:01:12 +08:00
Raul Tambre
003b48e0cb
[BOLT][test] enable GNU extensions, use C++ compiler, remove unnecessary target (#117043)
1. With a Clang that doesn't default to GNU extensions they need to be enabled explicitly.
2. The X86 directory lit config sets it already, there's no reason for this test to do it by itself.
3. The C frontend executable will fail if there's for example a Clang resource file for the C++ mode that sets C++-specific options:
```
+ /home/tambre/dev/llvm/build/bin/clang --target=x86_64-unknown-linux-gnu -fPIE -fuse-ld=lld -Wl,--unresolved-symbols=ignore-all -pie -fPIC -shared /home/tambre/dev/llvm/bolt/test/R_ABS.pic.lld.cpp -o /home/tambre/dev/llvm/build/tools/bolt/test/Output/R_ABS.pic.lld.cpp.tmp.so -Wl,-q -fuse-ld=lld
clang: warning: argument unused during compilation: '-pie' [-Wunused-command-line-argument]
error: invalid argument '-std=c23' not allowed with 'C++'
```
2024-11-27 00:14:00 +02:00
Hans Wennborg
537343dea4 Revert "[BOLT] DataAggregator support for binaries with multiple text segments (#92815)"
This caused test failures, see comment on the PR:

  Failed Tests (2):
    BOLT-Unit :: Core/./CoreTests/AArch64/MemoryMapsTester/MultipleSegmentsMismatchedBaseAddress/0
    BOLT-Unit :: Core/./CoreTests/X86/MemoryMapsTester/MultipleSegmentsMismatchedBaseAddress/0

> When a binary has multiple text segments, the Size is computed as the
> difference of the last address of these segments from the BaseAddress.
> The base addresses of all text segments must be the same.
>
> Introduces flag 'perf-script-events' for testing. It allows passing perf events
> without BOLT having to parse them using 'perf script'. The flag is used to
> pass a mock perf profile that has two memory mappings for a mock binary
> that has two text segments. The size of the mapping is updated, as with
> this change `parseMMapEvents` processes all text segments.

This reverts commit 4b71b3782d217db0138b701c4514bd2168ca1659.
2024-11-26 14:59:30 +01:00
Paschalis Mpeis
957c2ac4f1
[BOLT] Fix for bughunter.sh in offline mode (#116649)
In offline mode, the script sets the 'PASS' variable but does not use
it. The surrounding code suggests using the 'FAIL' variable instead.
2024-11-25 13:13:10 +00:00
Paschalis Mpeis
4b71b3782d
[BOLT] DataAggregator support for binaries with multiple text segments (#92815)
When a binary has multiple text segments, the Size is computed as the
difference of the last address of these segments from the BaseAddress.
The base addresses of all text segments must be the same.

Introduces flag 'perf-script-events' for testing. It allows passing perf events
without BOLT having to parse them using 'perf script'. The flag is used to
pass a mock perf profile that has two memory mappings for a mock binary
that has two text segments. The size of the mapping is updated, as with
this change `parseMMapEvents` processes all text segments.
2024-11-25 13:12:43 +00:00
Maksim Panchenko
2704647fb7 Revert "Fix up MCPlusBuilder.cpp to account for W0_HI on AArch64"
This reverts commit 576865a50e6ccb74196c9491fa79575d6d7f0b0b.

Depends on #114827 that was reverted.
2024-11-22 13:57:30 -08:00
Maksim Panchenko
92301180f7
[BOLT] Use compact EH format for fixed-address executables (#117274)
Use ULEB128 format for emitting LSDAs for fixed-address executables,
similar to what we use for PIEs/DSOs. The main difference is that we
don't use landing pad trampolines when landing pads are not contained in
a single fragment. Instead, we fall back to emitting larger
fixed-address LSDAs, which is still better than adding trampoline
instructions.
2024-11-22 00:28:55 -08:00
Maksim Panchenko
105ecd8bb2
[BOLT] Avoid EH trampolines for PIEs/DSOs (#117106)
We used to emit EH trampolines for PIE/DSO whenever a function fragment
contained a landing pad outside of it. However, it is common to have all
landing pads in a cold fragment even when their throwers are in a hot
one.

To reduce the number of trampolines, analyze landing pads for any given
function fragment, and if they all belong to the same (possibly
different) fragment, designate that fragment as a landing pad fragment
for the "thrower" fragment. Later, emit landing pad fragment symbol as
an LPStart for the thrower LSDA.
2024-11-21 18:18:30 -08:00
Maksim Panchenko
3282be1f8d
[BOLT] Use ULEB128 encoding for PIE/DSO exception tables (#116911)
Use ULEB128 encoding for call sites in PIE/DSO binaries. The encoding
reduces the size of the tables compared to sdata4 and is the default
format used by Clang.

Note that for fixed-address executables we still use absolute addressing
to cover cases where landing pads can reside in different function
fragments.

For testing, we rely on runtime EH tests.
2024-11-20 12:29:23 -08:00
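The size benefit of ULEB128 over sdata4 mentioned in this commit comes from its variable-length encoding: each byte carries 7 bits of the value, and the high bit marks continuation. The encoder below is a generic sketch of the standard DWARF ULEB128 scheme rather than BOLT's emitter, and `encodeULEB128` here is a standalone illustrative function.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Encodes Value as unsigned LEB128: 7 payload bits per byte, least
// significant group first, with bit 7 set on every byte except the last.
std::vector<uint8_t> encodeULEB128(uint64_t Value) {
  std::vector<uint8_t> Bytes;
  do {
    uint8_t Byte = Value & 0x7f;
    Value >>= 7;
    if (Value != 0)
      Byte |= 0x80; // more bytes follow
    Bytes.push_back(Byte);
  } while (Value != 0);
  return Bytes;
}
```

Any offset below 128 takes a single byte and any offset below 16384 takes two, whereas an sdata4 field always takes four bytes.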
Maksim Panchenko
066dd91ad8
[BOLT] Offset LPStart to avoid unnecessary instructions (#116713)
For C++ exception handling, when we write a call site table, we must
avoid emitting 0-value offsets for landing pads unless the call site has
no landing pad. However, 0 can be a real offset from the start of the
FDE if the FDE corresponds to a function fragment that starts with a
landing pad. In such cases, we used to emit a trap instruction at the
start of the fragment to guarantee non-zero LP offset.

To avoid emitting unnecessary trap instructions, we can instead set
LPStart to an offset from the FDE. If we emit it as [FDEStart - 1], then
all real offsets from LPStart in the FDE become positive (non-zero).
2024-11-19 16:45:03 -08:00
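The arithmetic behind the LPStart change can be made concrete with a short sketch. The helper names are hypothetical, and addresses are modeled as plain integers rather than the symbol expressions an assembler would emit.

```cpp
#include <cassert>
#include <cstdint>

// Value stored in the call site table for a landing pad: its offset from
// LPStart. A stored 0 is reserved to mean "no landing pad".
uint64_t landingPadOffset(uint64_t LPStart, uint64_t LandingPad) {
  assert(LandingPad >= LPStart && "landing pad must not precede LPStart");
  return LandingPad - LPStart;
}

// LPStart choice from the commit message: one byte before the FDE start.
uint64_t shiftedLPStart(uint64_t FDEStart) { return FDEStart - 1; }
```

For a fragment starting at 0x1000 whose first instruction is a landing pad, using the FDE start as LPStart yields offset 0, which is indistinguishable from "no landing pad"; with LPStart at 0xfff the same landing pad gets offset 1, so no trap instruction is needed to push it off zero.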