453 Commits

Author SHA1 Message Date
Amir Ayupov
6735ce9d25 [BOLT] Fix unconditional output of boltedcollection in merge-fdata (#78653)
Fix the bug where merge-fdata unconditionally outputs the boltedcollection
line, regardless of whether the input files have it set.

Test Plan:
Added bolt/test/X86/merge-fdata-nobat-mode.test which fails without this
fix.
2024-01-18 20:00:47 -08:00
Amir Ayupov
9fec33aadc Revert "[BOLT] Fix unconditional output of boltedcollection in merge-fdata (#78653)"
This reverts commit 82bc33ea3f1a539be50ed46919dc53fc6b685da9.

Accidentally pushed unrelated changes.
2024-01-18 19:59:09 -08:00
Amir Ayupov
82bc33ea3f
[BOLT] Fix unconditional output of boltedcollection in merge-fdata (#78653)
Fix the bug where merge-fdata unconditionally outputs the boltedcollection
line, regardless of whether the input files have it set.

Test Plan:
Added bolt/test/X86/merge-fdata-nobat-mode.test which fails without this
fix.
2024-01-18 19:44:16 -08:00
Amir Ayupov
dcba077146
[BOLT] Embed cold mapping info into function entry in BAT (#76903)
Reduces BAT section size:
- large binary: to 12283500 bytes (0.32x original size),
- medium binary: to 1616020 bytes (0.27x original size),
- small binary: to 404 bytes (0.28x original size).

Test Plan: Updated bolt/test/X86/bolt-address-translation.test
2024-01-12 13:02:32 -08:00
Amir Ayupov
8fb8ad66c9
[BOLT] Delta-encode function start addresses in BAT (#76902)
Further reduce the size of BAT section:
- large binary: to 12716312 bytes (0.33x original),
- medium binary: to 1649472 bytes (0.28x original),
- small binary: to 428 bytes (0.30x original).

Test Plan: Updated bolt/test/X86/bolt-address-translation.test
2024-01-11 14:35:37 -08:00
Amir Ayupov
bbe07989d7
[BOLT] Delta-encode offsets in BAT (#76900)
This change further reduces the size of BAT:
- large binary: to 13073904 bytes (0.34x original),
- medium binary: to 1703116 bytes (0.29x original),
- small binary: to 436 bytes (0.30x original).

Test Plan: Updated bolt/test/X86/bolt-address-translation.test
2024-01-11 14:29:46 -08:00
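Delta encoding pairs naturally with a variable-length format such as ULEB128: storing each value as the difference from its predecessor turns a sorted list of large offsets into a list of small numbers. A minimal sketch, assuming the offsets are sorted in ascending order; the helper names are hypothetical, and the actual BAT layout (which fields are delta-encoded against what) is not reproduced here:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Replace each offset with its distance from the previous one.
// Assumes Offsets is sorted ascending, so every delta is non-negative.
std::vector<uint64_t> deltaEncode(const std::vector<uint64_t> &Offsets) {
  std::vector<uint64_t> Deltas;
  Deltas.reserve(Offsets.size());
  uint64_t Prev = 0;
  for (uint64_t Off : Offsets) {
    assert(Off >= Prev && "offsets must be sorted");
    Deltas.push_back(Off - Prev);
    Prev = Off;
  }
  return Deltas;
}

// Inverse: a running (prefix) sum restores the original offsets.
std::vector<uint64_t> deltaDecode(const std::vector<uint64_t> &Deltas) {
  std::vector<uint64_t> Offsets;
  Offsets.reserve(Deltas.size());
  uint64_t Sum = 0;
  for (uint64_t D : Deltas) {
    Sum += D;
    Offsets.push_back(Sum);
  }
  return Offsets;
}
```

The function-start change listed above applies the same idea to the sorted list of function addresses.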
Amir Ayupov
565f40d66b [BOLT] Encode BAT using ULEB128 (#76899)
Reduces BAT section size, bytes:
- large binary: 38676872 -> 23262524 (0.60x),
- medium binary (trunk clang): 5938004 -> 3213504 (0.54x),
- small binary (X86/bolt-address-translation.test): 1436 -> 680 (0.47x).

Test Plan: Updated bolt/test/X86/bolt-address-translation.test
2024-01-11 12:16:30 -08:00
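ULEB128 stores an unsigned integer in 7-bit groups, least-significant group first, setting the high bit of each byte when more bytes follow. The sketch below is illustrative only: `encodeULEB`/`decodeULEB` are hypothetical names (LLVM provides its own helpers in `llvm/Support/LEB128.h`). It shows why small values cost a single byte instead of a fixed-width field:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Encode Value as ULEB128: 7 payload bits per byte, low bits first,
// high bit set on every byte except the last.
std::vector<uint8_t> encodeULEB(uint64_t Value) {
  std::vector<uint8_t> Out;
  do {
    uint8_t Byte = Value & 0x7f;
    Value >>= 7;
    if (Value != 0)
      Byte |= 0x80; // more bytes follow
    Out.push_back(Byte);
  } while (Value != 0);
  return Out;
}

// Decode one ULEB128 value starting at Bytes[Pos]; advances Pos past it.
uint64_t decodeULEB(const std::vector<uint8_t> &Bytes, size_t &Pos) {
  uint64_t Result = 0;
  unsigned Shift = 0;
  uint8_t Byte;
  do {
    Byte = Bytes.at(Pos++);
    Result |= uint64_t(Byte & 0x7f) << Shift;
    Shift += 7;
  } while (Byte & 0x80);
  return Result;
}
```

For example, any value below 128 fits in one byte, while 624485 needs three (`E5 8E 26`).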
Amir Ayupov
2bb511e277
[BOLT][NFC] Print BAT section size (#76897)
Test Plan: Updated bolt/test/X86/bolt-address-translation.test
2024-01-11 11:04:04 -08:00
Min-Yih Hsu
23e03a85dc [BOLT] Update test case after #77253
PR #77253 removed the '@plt' suffix from callee symbols. Update
RISCV/relax.s accordingly.
2024-01-08 11:05:38 -08:00
ShatianWang
1577483413
[BOLT] Don't split likely fallthrough in CDSplit (#76164)
This diff speeds up CDSplit by not considering any hot-warm splitting
point that could break a fall-through branch from a basic block to its
most likely successor.

Co-authored-by: spupyrev <spupyrev@fb.com>
2023-12-21 16:17:10 -05:00
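A sketch of the filtering idea, assuming blocks are numbered in layout order and each block's most likely successor is already known from the profile. The `MostLikelySucc` array and `candidateSplitPoints` function are hypothetical, not BOLT's API. A split index `I` keeps blocks `[0, I)` hot and moves `[I, N)` to the warm fragment, so it is skipped whenever block `I - 1` is expected to fall through to block `I`:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sentinel for blocks without a successor (e.g. ending in a return).
constexpr size_t NoSucc = static_cast<size_t>(-1);

// Return the split indices worth scoring. Index I means blocks [0, I) stay
// hot and blocks [I, N) move to the warm fragment; I == N means "no split".
std::vector<size_t>
candidateSplitPoints(const std::vector<size_t> &MostLikelySucc) {
  const size_t N = MostLikelySucc.size();
  std::vector<size_t> Candidates;
  for (size_t I = 1; I <= N; ++I) {
    // Splitting between I-1 and I would break its likely fall-through.
    if (I < N && MostLikelySucc[I - 1] == I)
      continue;
    Candidates.push_back(I);
  }
  return Candidates;
}
```

Pruning candidates this way shrinks the number of split points whose score must be computed, which is where the speedup comes from.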
Jon Roelofs
d6f772074c
fixup! fixup! [GlobalISel] Always direct-call IFuncs and Aliases (#74902)
Apparently some BOLT bots build with a pre-installed system clang, and others
use the just-built one. These two clangs now behave slightly differently when
it comes to ifunc codegen after https://github.com/llvm/llvm-project/pull/74902

Change the test to accept both patterns.
2023-12-15 12:48:11 -07:00
Jon Roelofs
3017adb37e
fixup! [GlobalISel] Always direct-call IFuncs and Aliases (#74902)
The codegen change broke one of the BOLT tests.
2023-12-15 12:17:07 -07:00
Wang Yaduo
c532ba4edd [RISCV] Support printing immediate of RISCV MCInst in hexadecimal format (#74053)
Enable llvm-objdump to print the immediates of RISC-V instructions
in hexadecimal format with the --print-imm-hex flag.
2023-12-14 22:42:11 -08:00
Vitaly Buka
fc3adf74d3
Revert "[RISCV] Support printing immediate of RISCV MCInst in hexadecimal format" (#75561)
Reverts llvm/llvm-project#74053

Breaks https://lab.llvm.org/buildbot/#/builders/5/builds/39291

Co-authored-by: Wang Yaduo <wangyaduo@linux.alibaba.com>

Issue #75563
2023-12-14 22:05:47 -08:00
Wang Yaduo
3dde0d0256
[RISCV] Support printing immediate of RISCV MCInst in hexadecimal format (#74053)
Enable llvm-objdump to print the immediates of RISC-V instructions
in hexadecimal format with the --print-imm-hex flag.
2023-12-15 10:13:20 +08:00
Alexander Yermolovich
bf2b035e58
[BOLT][DWARF] Fix handling .debug_str_offsets for type units (#75522)
There was an assumption that TUs and CUs share a .debug_str_offsets
contribution. For ThinLTO builds this is not the case. Changed so that we
parse contributions for TUs as well, and did some refactoring so that we
don't re-parse contributions that were not modified.
2023-12-14 17:27:21 -08:00
Rafael Auler
a26aa79a3b
[BOLT] Fix some dwarf tests affected by 75095 (#75327)
PR 75095 introduced some changes to lld that broke some dwarf tests that
were being incorrectly linked as a PIE. Add flags to disable any PIC/PIE
compilation, so the linker can succeed and the tests can run as
intended.
2023-12-13 06:11:15 -08:00
Alexander Yermolovich
fb9a851224
[BOLT][DWARF] Fix handling of debug_str_offsets (#75100)
We were not setting the size field of .debug_str_offsets correctly. Fixed
it, and added a test.
2023-12-11 15:56:32 -08:00
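For context, a DWARF 5 `.debug_str_offsets` contribution in the 32-bit DWARF format starts with a `unit_length` that counts the bytes after the length field itself: the 2-byte version, 2 bytes of padding, and the 4-byte offsets. A minimal sketch of computing and writing that header; the writer function is hypothetical, and little-endian output is an assumption:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Append V to Out as little-endian bytes of the given width.
static void writeLE(std::vector<uint8_t> &Out, uint64_t V, unsigned Bytes) {
  for (unsigned I = 0; I < Bytes; ++I)
    Out.push_back(static_cast<uint8_t>(V >> (8 * I)));
}

// Emit one 32-bit DWARF 5 .debug_str_offsets contribution.
std::vector<uint8_t> emitStrOffsets(const std::vector<uint32_t> &Offsets) {
  std::vector<uint8_t> Out;
  // unit_length excludes itself: version (2) + padding (2) + 4 per offset.
  const uint32_t UnitLength =
      2 + 2 + 4 * static_cast<uint32_t>(Offsets.size());
  writeLE(Out, UnitLength, 4);
  writeLE(Out, 5, 2); // version
  writeLE(Out, 0, 2); // padding
  for (uint32_t Off : Offsets)
    writeLE(Out, Off, 4);
  return Out;
}
```

With three offsets the contribution is 20 bytes in total, but `unit_length` is 16.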
Amir Ayupov
b039ccc684
[BOLT] Provide backwards compatibility for YAML profile with std::hash (#74253)
Provide backwards compatibility for YAML profiles that use `std::hash`:
xxh3 is the default hash for newly produced profiles (which set
`std-hash: false`), whereas a profile that doesn't specify `std-hash` is
treated as `std-hash: true`, preserving the old behavior.
2023-12-11 12:27:32 -08:00
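The defaulting rule reads naturally as an optional field: an absent `std-hash` key means the profile predates the change. A sketch with a hypothetical header struct and enum, not BOLT's YAML mapping code:

```cpp
#include <cassert>
#include <optional>

// Hypothetical subset of a YAML profile header.
struct ProfileHeader {
  std::optional<bool> StdHash; // empty when the key is missing from the file
};

enum class HashFunction { StdHash, XXH3 };

// Missing key => old profile => std::hash; otherwise honor the stored flag.
HashFunction selectHashFunction(const ProfileHeader &H) {
  return H.StdHash.value_or(true) ? HashFunction::StdHash
                                  : HashFunction::XXH3;
}
```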
sinan
b304873134
[BOLT] Fix a wrong compiler option in test (#74420)
-nopie is an OpenBSD-specific option, and Linux distributions might
report an `unsupported option '-nopie' for target` error.
2023-12-06 17:16:48 +08:00
eleviant
f20af7372f
[bolt] Support arm64 FP register spills (#73021)
Currently llvm-bolt fails when analyzing jump tables on AArch64 if
FP register spills/reloads are used.
2023-12-05 20:32:58 +01:00
ShatianWang
4483cf2d8b
[BOLT] CDSplit main logic part 2/2 (#74032)
This diff implements the main splitting logic of CDSplit. CDSplit
processes functions in a binary in parallel. For each function BF, it
assumes that all other functions are hot-cold split. For each possible
hot-warm split point of BF, it computes its corresponding SplitScore,
and chooses the split point with the best SplitScore. The SplitScore of
each split point is computed in the following way: each call edge or
jump edge has an edge score that is proportional to its execution count,
and inversely proportional to its distance. The SplitScore of a split
point is a sum of edge scores over a fixed set of edges whose distance
can change due to hot-warm splitting BF. This set contains all cover
calls in the form of X->Y or Y->X given function order [... X ... BF ...
Y ...]; we refer to the sum of edge scores over the set of cover calls
as CoverCallScore. This set also contains all jump edges (branches)
within BF as well as all call edges originated from BF; we refer to the
sum of edge scores over this set of edges as LocalScore. CDSplit finds
the split index maximizing CoverCallScore + LocalScore.
2023-11-30 23:17:11 -05:00
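A heavily simplified sketch of the scoring loop described above, under assumptions that are not taken from BOLT: the edge score is `Count / (Distance + 1)`, the warm fragment sits a fixed `WarmGap` bytes after the hot one, only jump edges (not calls originating from BF) feed LocalScore, and each cover call's distance is modeled as a fixed base plus BF's hot size. All names are hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct JumpEdge {
  size_t Src, Dst; // block indices, in layout order
  uint64_t Count;  // profiled execution count
};

// A call X->Y or Y->X between other functions whose path crosses BF.
struct CoverCall {
  uint64_t Count;
  uint64_t DistanceWithoutBF; // distance if BF's hot fragment were empty
};

// Assumed edge score: grows with count, shrinks with distance
// (+1 keeps fall-throughs, distance 0, finite).
double edgeScore(uint64_t Count, uint64_t Distance) {
  return static_cast<double>(Count) / static_cast<double>(Distance + 1);
}

// LocalScore + CoverCallScore for splitting at Split: blocks [0, Split)
// stay hot, blocks [Split, N) are placed WarmGap bytes after them.
double splitScore(const std::vector<uint64_t> &Sizes,
                  const std::vector<JumpEdge> &Edges,
                  const std::vector<CoverCall> &Covers, size_t Split,
                  uint64_t WarmGap) {
  std::vector<uint64_t> Addr(Sizes.size());
  uint64_t Cur = 0, HotSize = 0;
  for (size_t I = 0; I < Sizes.size(); ++I) {
    if (I == Split) {
      HotSize = Cur;
      Cur += WarmGap;
    }
    Addr[I] = Cur;
    Cur += Sizes[I];
  }
  if (Split >= Sizes.size())
    HotSize = Cur; // no split: the whole function is hot
  double LocalScore = 0;
  for (const JumpEdge &E : Edges) {
    uint64_t From = Addr[E.Src] + Sizes[E.Src]; // branch at end of source
    uint64_t To = Addr[E.Dst];
    LocalScore += edgeScore(E.Count, From > To ? From - To : To - From);
  }
  double CoverCallScore = 0;
  for (const CoverCall &C : Covers)
    CoverCallScore += edgeScore(C.Count, C.DistanceWithoutBF + HotSize);
  return LocalScore + CoverCallScore;
}

// Try every split index; N (== Sizes.size()) means "do not split".
size_t bestSplit(const std::vector<uint64_t> &Sizes,
                 const std::vector<JumpEdge> &Edges,
                 const std::vector<CoverCall> &Covers, uint64_t WarmGap) {
  size_t Best = Sizes.size();
  double BestScore = splitScore(Sizes, Edges, Covers, Best, WarmGap);
  for (size_t S = 1; S < Sizes.size(); ++S) {
    double Score = splitScore(Sizes, Edges, Covers, S, WarmGap);
    if (Score > BestScore) {
      BestScore = Score;
      Best = S;
    }
  }
  return Best;
}
```

With blocks of 16, 16 and 64 bytes, a hot 0->1 edge, a rarely taken 1->2 edge and one frequent cover call, `bestSplit` picks index 2: moving the cold 64-byte block out shortens the cover call more than it lengthens the rare jump.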
Alexander Yermolovich
52be47b890
[BOLT][DWARF] Add support to create path (#73884)
When option --dwarf-output-path is specified, if the path does not exist
BOLT will now create it. This is what also happens when
--plugin-opt=dwo_dir=<value> is specified to LLD.
2023-11-30 09:41:01 -08:00
Maksim Panchenko
0acfe8483a
[BOLT][DWARF] Fix output ranges for deleted code (#73464)
Set range low_pc to 0 for DIEs that correspond to deleted code.

Fixes #73428
2023-11-28 22:40:53 -08:00
Alexander Yermolovich
b47b3bee7b
[BOLT][DWARF] Fix handling of DWARF5 DWP (#72729)
Fixed handling of DWP as input. Previously BOLT crashed. Now it writes out
the correct CU and all the TUs. A potential future improvement is to scan
all the TUs used in this CU and only include those.
2023-11-28 15:54:14 -08:00
Amir Ayupov
af4d8d5af6
[BOLT][test] Update perf2bolt/perf_test.test (#73482)
2023-11-28 07:00:07 -08:00
spupyrev
e7dd596c68
[BOLT] Use deterministic xxh3 for computing BF/BB hashes (#72542)
std::hash and ADT/Hashing::hash_value are non-deterministic functions whose
results might vary across implementations, processes, and executions. Use
xxh3 instead to compute hashes of BinaryFunction and BinaryBasicBlock for
stale profile matching.
(A possible alternative is to use ADT/StableHashing.h, based on FNV hashing,
but xxh3 seems to be more popular in LLVM.)

This is to address https://github.com/llvm/llvm-project/issues/65241.
2023-11-27 14:45:46 -08:00
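To illustrate what "deterministic" buys here: a hash computed only from the input bytes with fixed constants gives the same value on every run, compiler, and standard library, whereas `std::hash` results are implementation-defined. The sketch below uses 64-bit FNV-1a, the family behind the `ADT/StableHashing.h` alternative mentioned above, rather than xxh3 (which would need an external library). The function name is hypothetical:

```cpp
#include <cassert>
#include <cstdint>
#include <string_view>

// 64-bit FNV-1a: fixed constants, so the result depends only on the bytes.
uint64_t fnv1a64(std::string_view Data) {
  uint64_t Hash = 0xcbf29ce484222325ULL; // FNV offset basis
  for (unsigned char C : Data) {
    Hash ^= C;                // mix in one byte
    Hash *= 0x100000001b3ULL; // FNV prime (wraps modulo 2^64)
  }
  return Hash;
}
```

A profile that stores such hashes for, say, a serialized list of a block's opcodes can be matched later by a different BOLT build, because both compute identical values.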
Amir Ayupov
ab14eb23b6
[BOLT][test] Replace /dev/null with temp file (#73485)
NFC processing time script identifies tests by output filename.
When `/dev/null` is used as output filename, we're unable to tell the
source test, and the reports are unhelpful.
Replace `/dev/null` with `%t.null`, which resolves the issue.
2023-11-27 10:53:18 -08:00
ShatianWang
d333c0e062
[BOLT] Extend calculateEmittedSize() for block size calculation (#73076)
This commit modifies BinaryContext::calculateEmittedSize() to update
the BinaryBasicBlock::OutputAddressRange of each basic block in the
function in place. BinaryBasicBlock::getOutputSize() now gives the
emitted size of the basic block.
2023-11-23 15:28:31 -05:00
Maksim Panchenko
84602066a6
[BOLT] Fix C++ exceptions when LPStart is specified (#72737)
Whenever LPStartEncoding was different from DW_EH_PE_omit, we used to
miscalculate LPStart. As a result, landing pads were assigned wrong
addresses. Fix that.
2023-11-20 20:55:38 -08:00
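Background for this fix, per the GCC LSDA layout: the LSDA header starts with an LPStart encoding byte; if it is `DW_EH_PE_omit` (0xff), LPStart defaults to the function's start address, otherwise an encoded LPStart value follows. Each call-site entry's landing pad is an offset from LPStart, with 0 meaning "no landing pad". A minimal sketch of that arithmetic, with hypothetical function names and the decoding of the encoded value left out:

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

constexpr uint8_t DW_EH_PE_omit = 0xff;

// LPStart is the function start unless the LSDA header encodes it.
uint64_t getLPStart(uint64_t FunctionStart, uint8_t LPStartEncoding,
                    uint64_t DecodedLPStart) {
  return LPStartEncoding == DW_EH_PE_omit ? FunctionStart : DecodedLPStart;
}

// A call-site entry's landing pad is relative to LPStart; 0 means none.
std::optional<uint64_t> getLandingPadAddress(uint64_t LPStart,
                                             uint64_t LandingPadOffset) {
  if (LandingPadOffset == 0)
    return std::nullopt;
  return LPStart + LandingPadOffset;
}
```

The bug class described above arises when code assumes the first branch of `getLPStart` even though the header carries an explicit LPStart.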
Maksim Panchenko
445f6f1373
[BOLT][TEST] Remove LTO flag from a test (#72896)
The LTO flag is not needed for the test to work properly. However, it
may not build on a system where compiler and linker versions don't match
one another. Remove the LTO flag.
2023-11-20 10:24:34 -08:00
JohnLee1243
ae51ec84bb
[Bolt] Solving pie support issue (#65494)
PIE is enabled by default since Clang 14, which causes a parsing error
when using perf2bolt because the base address is not computed correctly.
Fix the method of getting the base address. If SegInfo.Alignment is not
equal to the page size, alignDown(SegInfo.FileOffset, SegInfo.Alignment)
may not equal FileOffset. So SegInfo.FileOffset and FileOffset should both
be aligned by SegInfo.Alignment first and then compared.
The .text segment's offset from the base address in the virtual address
space is aligned by the page size. So MMapAddress's offset from the base
address is alignDown(SegInfo.Address, pagesize) instead of
alignDown(SegInfo.Address, SegInfo.Alignment), and the base address
calculation has to change accordingly.

Co-authored-by: Li Zhuohang <lizhuohang3@huawei.com>
2023-11-16 15:05:06 +08:00
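The two rules above can be sketched as follows. `SegmentInfo` and `getBaseAddress` are hypothetical stand-ins for perf2bolt's internals; `MMapAddress` and `FileOffset` are assumed to come from the perf mmap event for the segment:

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Round V down to a multiple of Align (Align must be non-zero).
uint64_t alignDown(uint64_t V, uint64_t Align) { return V - V % Align; }

// Hypothetical subset of a PT_LOAD segment description.
struct SegmentInfo {
  uint64_t Address;    // p_vaddr
  uint64_t FileOffset; // p_offset
  uint64_t Alignment;  // p_align
};

std::optional<uint64_t> getBaseAddress(const SegmentInfo &Seg,
                                       uint64_t MMapAddress,
                                       uint64_t FileOffset,
                                       uint64_t PageSize) {
  // Compare both file offsets after aligning them by the segment alignment.
  if (alignDown(Seg.FileOffset, Seg.Alignment) !=
      alignDown(FileOffset, Seg.Alignment))
    return std::nullopt;
  // The mapping starts at a page boundary, so subtract the page-aligned
  // (not p_align-aligned) segment address.
  return MMapAddress - alignDown(Seg.Address, PageSize);
}
```

For a segment with `p_vaddr = 0x11540`, `p_offset = 0x1540`, `p_align = 0x10000` mapped at `0x555555565000` (file offset `0x1000`), this yields a base of `0x555555554000`; aligning the address by `p_align` instead would give `0x555555555000`, one page too high.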
Vladislav Khmelevsky
c5a306f07e
[BOLT] Fix LSDA section handling (#71821)
Currently BOLT finds the LSDA section by its name, .gcc_except_table.
But sometimes the name might have a suffix, e.g. .gcc_except_table.main.
Find the LSDA section by its address rather than by its name.
Fixes #71804
2023-11-15 23:21:50 +04:00
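Looking a section up by address (for example, the LSDA pointer taken from a function's FDE) is a containment check over section ranges, which makes any name suffix irrelevant. A minimal sketch with a hypothetical `Section` struct, not BOLT's `BinarySection`:

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

struct Section {
  std::string Name;
  uint64_t Address;
  uint64_t Size;
};

// Return the section whose [Address, Address + Size) contains Addr, or null.
const Section *findSectionForAddress(const std::vector<Section> &Sections,
                                     uint64_t Addr) {
  for (const Section &S : Sections)
    if (Addr >= S.Address && Addr - S.Address < S.Size)
      return &S;
  return nullptr;
}
```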
Maksim Panchenko
f633f325a1
[BOLT] Fix NOP instruction emission on x86 (#72186)
Use MCAsmBackend::writeNopData() interface to emit NOP instructions on
x86. There are multiple forms of NOP instruction on x86 with different
sizes. Currently, LLVM's assembly/disassembly does not support all forms
correctly which can lead to a breakage of input code semantics, e.g. if
the program relies on NOP instructions for reserving a patch space.

Add "--keep-nops" option to preserve NOP instructions.
2023-11-13 18:12:39 -08:00
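For illustration, the sketch below fills a gap with the recommended multi-byte NOP sequences from Intel's documentation of the NOP instruction, choosing the longest one that fits. It is a hypothetical stand-in, not LLVM's `MCAsmBackend::writeNopData()` (whose X86 version also uses longer, prefixed forms), but it shows why NOP form matters: one instruction may cover several bytes, so re-encoding with a different form can change a layout the program relies on:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Recommended x86 multi-byte NOP encodings, indexed by length - 1.
static const std::vector<std::vector<uint8_t>> Nops = {
    {0x90},
    {0x66, 0x90},
    {0x0F, 0x1F, 0x00},
    {0x0F, 0x1F, 0x40, 0x00},
    {0x0F, 0x1F, 0x44, 0x00, 0x00},
    {0x66, 0x0F, 0x1F, 0x44, 0x00, 0x00},
    {0x0F, 0x1F, 0x80, 0x00, 0x00, 0x00, 0x00},
    {0x0F, 0x1F, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00},
    {0x66, 0x0F, 0x1F, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00},
};

// Fill Count bytes greedily with the longest NOP that still fits.
std::vector<uint8_t> makeNopPadding(size_t Count) {
  std::vector<uint8_t> Out;
  while (Count > 0) {
    const size_t Len = std::min(Count, Nops.size());
    const std::vector<uint8_t> &Nop = Nops[Len - 1];
    Out.insert(Out.end(), Nop.begin(), Nop.end());
    Count -= Len;
  }
  return Out;
}
```

Twelve bytes of padding, for instance, become one 9-byte NOP followed by one 3-byte NOP.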
Alexander Yermolovich
ce17c6d3ba
[BOLT][DWARF] Fix --dwarf-output-path (#71886)
Fixed a bug where, when --dwarf-output-path is specified and
DW_AT_dwo_name contains part of a path, the output path would
contain both.
This led to an llvm-bolt crash because the path didn't exist.
Example:
llvm-bolt .... --dwarf-output-path=/some/path/
DW_AT_dwo_name  ("objects/o1/split.dwo")

It would try to write the .dwo file to /some/path/objects/o1/split.dwo.dwo
instead of to
/some/path/split.dwo.dwo
2023-11-10 13:18:57 -08:00
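The fix amounts to keeping only the file-name component of `DW_AT_dwo_name` when an output directory is given. A sketch using `std::filesystem`; the function name is hypothetical, and the appended `.dwo` suffix simply mirrors the file name in the example above:

```cpp
#include <cassert>
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Drop any directories from DW_AT_dwo_name and place the file directly in
// OutputDir, appending ".dwo" as in the example output path.
std::string getDwoOutputPath(const std::string &OutputDir,
                             const std::string &DwoName) {
  const fs::path Name = fs::path(DwoName).filename();
  return (fs::path(OutputDir) / Name).string() + ".dwo";
}
```

`fs::path::operator/` inserts a separator only when needed, so `/some/path/` and `/some/path` produce the same result.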
Vladislav Khmelevsky
cf18f142c0
[BOLT] Read .rela.dyn in static non-pie binary (#71635)
A static non-PIE binary doesn't have a DYNAMIC segment, and BOLT skips
reading the .rela.dyn section because of that. But such binaries might have
this section, for example to store IFUNC relocations that are resolved
by linked-in startup files, so force reading this section for static
executables.
2023-11-10 11:47:12 +04:00
Vladislav Khmelevsky
abec50cb93
[BOLT][AArch64] Fix strict usage during ADR Relax (#71377)
Currently strict mode is used to expand the number of optimized functions,
not to shrink it. Reverse the option's usage in the pass, so passing the
strict option relaxes an ADR instruction even if there are no NOPs around it.
Also add a check for a NOP after the ADR instruction.
2023-11-10 11:46:36 +04:00
spaette
1a2f83366b
[BOLT] Fix typos (#68121)
Closes https://github.com/llvm/llvm-project/issues/63097

Before merging please make sure the change to
bolt/include/bolt/Passes/StokeInfo.h is correct.

bolt/include/bolt/Passes/StokeInfo.h

```diff
  //  This Pass solves the two major problems to use the Stoke program without
- //  proting its code:
+ //  probing its code:
```

I'm still not happy about the awkward wording in this comment.

bolt/include/bolt/Passes/FixRelaxationPass.h

```
$ ed -s bolt/include/bolt/Passes/FixRelaxationPass.h <<<'9,12p'
// This file declares the FixRelaxations class, which locates instructions with
// wrong targets and fixes them. Such problems usually occures when linker
// relaxes (changes) instructions, but doesn't fix relocations types properly
// for them.
$
```


bolt/docs/doxygen.cfg.in
bolt/include/bolt/Core/BinaryContext.h
bolt/include/bolt/Core/BinaryFunction.h
bolt/include/bolt/Core/BinarySection.h
bolt/include/bolt/Core/DebugData.h
bolt/include/bolt/Core/DynoStats.h
bolt/include/bolt/Core/Exceptions.h
bolt/include/bolt/Core/MCPlusBuilder.h
bolt/include/bolt/Core/Relocation.h
bolt/include/bolt/Passes/FixRelaxationPass.h
bolt/include/bolt/Passes/InstrumentationSummary.h
bolt/include/bolt/Passes/ReorderAlgorithm.h
bolt/include/bolt/Passes/StackReachingUses.h
bolt/include/bolt/Passes/StokeInfo.h
bolt/include/bolt/Passes/TailDuplication.h
bolt/include/bolt/Profile/DataAggregator.h
bolt/include/bolt/Profile/DataReader.h
bolt/lib/Core/BinaryContext.cpp
bolt/lib/Core/BinarySection.cpp
bolt/lib/Core/DebugData.cpp
bolt/lib/Core/DynoStats.cpp
bolt/lib/Core/Relocation.cpp
bolt/lib/Passes/Instrumentation.cpp
bolt/lib/Passes/JTFootprintReduction.cpp
bolt/lib/Passes/ReorderData.cpp
bolt/lib/Passes/RetpolineInsertion.cpp
bolt/lib/Passes/ShrinkWrapping.cpp
bolt/lib/Passes/TailDuplication.cpp
bolt/lib/Rewrite/BoltDiff.cpp
bolt/lib/Rewrite/DWARFRewriter.cpp
bolt/lib/Rewrite/RewriteInstance.cpp
bolt/lib/Utils/CommandLineOpts.cpp
bolt/runtime/instr.cpp
bolt/test/AArch64/got-ld64-relaxation.test
bolt/test/AArch64/unmarked-data.test
bolt/test/X86/Inputs/dwarf5-cu-no-debug-addr-helper.s
bolt/test/X86/Inputs/linenumber.cpp
bolt/test/X86/double-jump.test
bolt/test/X86/dwarf5-call-pc-function-null-check.test
bolt/test/X86/dwarf5-split-dwarf4-monolithic.test
bolt/test/X86/dynrelocs.s
bolt/test/X86/fallthrough-to-noop.test
bolt/test/X86/tail-duplication-cache.s
bolt/test/runtime/X86/instrumentation-ind-calls.s
2023-11-09 11:29:46 -08:00
Maksim Panchenko
11f52f783a
[BOLT][DWARF] Fix invalid address ranges (#71474)
When NOP instructions are removed by BOLT and a DWARF address range
falls past the removed instructions, it may lead to invalid DWARF ranges
in the output binary. E.g. the range may fall outside of the basic block
boundaries.

This fix makes sure the modified range fits within the containing basic
block. A proper fix requires tracking instructions within the block and
will come in a different PR.
2023-11-09 09:55:49 -08:00
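Making a range fit within its basic block amounts to intersecting it with the block's bounds. A minimal sketch over hypothetical half-open address ranges, not BOLT's DWARF range types:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

struct AddressRange {
  uint64_t Start, End; // half-open [Start, End), Start <= End
};

// Clip Range to Block; the result is empty (Start == End) if they do not
// overlap. Assumes Block.Start <= Block.End, as std::clamp requires.
AddressRange clampToBlock(AddressRange Range, AddressRange Block) {
  const uint64_t Start = std::clamp(Range.Start, Block.Start, Block.End);
  const uint64_t End = std::clamp(Range.End, Start, Block.End);
  return {Start, End};
}
```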
Job Noorman
c4b096a343
[BOLT] Fix typo in test
2023-11-09 09:14:27 +01:00
Rafael Auler
4c9f6d6f02
[BOLT][AArch64] Fix ifuncs test header inclusion (#71741)
Summary: Do not include stdlib headers as these tests are built with
-nostdlib. Tests outside of runtime folder also run cross-platforms, so
an x86 machine wouldn't have access to the correct headers used in the
aarch64 toolchain, even if it has an aarch64 compiler (clang itself).
2023-11-08 16:42:21 -08:00
Job Noorman
96b5e092dc
[BOLT] Support instrumentation hook via DT_FINI_ARRAY (#67348)
BOLT currently hooks its instrumentation finalization function via
`DT_FINI`. However, this method of calling finalization routines is not
supported anymore on newer ABIs like RISC-V. `DT_FINI_ARRAY` is
preferred there.

This patch adds support for hooking into `DT_FINI_ARRAY` instead if the
binary does not have a `DT_FINI` entry. If it does, `DT_FINI` takes
precedence so this patch should not change how the currently supported
instrumentation targets behave.

`DT_FINI_ARRAY` points to an array in memory of `DT_FINI_ARRAYSZ` bytes.
It consists of pointer-length entries that contain the addresses of
finalization functions. However, the addresses are only filled-in by the
dynamic linker at load time using relative relocations. This makes
hooking via `DT_FINI_ARRAY` a bit more complicated than via `DT_FINI`.

The implementation works as follows:
- While scanning the binary: find the section where `DT_FINI_ARRAY`
points to, read its first dynamic relocation and use its addend to find
the address of the fini function we will use to hook;
- While writing the output file: overwrite the addend of the dynamic
relocation with the address of the runtime library's fini function.

Updating the dynamic relocation required a bit of boilerplate: since
dynamic relocations are stored in a `std::multiset` which doesn't
support getting mutable references to its items, functions were added to
`BinarySection` to take an existing relocation and insert a new one.
2023-11-08 11:01:10 +00:00
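The `std::multiset` constraint mentioned above can be sketched as follows: elements are only reachable through const references, so "updating" an addend means erasing the entry and inserting a modified copy. The `Relocation` struct and `replaceAddend` helper are hypothetical simplifications, not BOLT's `BinarySection` API:

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <set>

// Hypothetical, simplified dynamic relocation, ordered by offset.
struct Relocation {
  uint64_t Offset; // where the dynamic linker writes the value
  uint64_t Type;   // e.g. 8 == R_X86_64_RELATIVE
  uint64_t Addend; // RELATIVE: runtime value is load base + Addend
  bool operator<(const Relocation &RHS) const { return Offset < RHS.Offset; }
};

using RelocationSet = std::multiset<Relocation>;

// Replace the addend of the relocation at Offset (e.g. the first
// DT_FINI_ARRAY slot) and return the old addend, i.e. the original fini
// function. Returns nullopt if no relocation exists at Offset.
std::optional<uint64_t> replaceAddend(RelocationSet &Relocs, uint64_t Offset,
                                      uint64_t NewAddend) {
  auto It = Relocs.find(Relocation{Offset, 0, 0});
  if (It == Relocs.end())
    return std::nullopt;
  Relocation Updated = *It; // copy, since *It is const
  const uint64_t OldAddend = Updated.Addend;
  Updated.Addend = NewAddend;
  Relocs.erase(It);
  Relocs.insert(Updated);
  return OldAddend;
}
```

In the hooking flow described above, the returned old addend is the fini function to record, and `NewAddend` would be the runtime library's fini function.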
Vladislav Khmelevsky
e2f1a95f2a
[BOLT][AArch64] Handle IFUNCS properly (#71104)
Previously we were testing only binaries compiled with -O0, which results
in an indirect call to the IFUNC trampoline, and the trampoline has an
associated IFUNC symbol. Compiling with -O3 results in a direct call to
the IFUNC trampoline with no symbol associated with it, and the IFUNC
symbol address becomes the same as the IFUNC resolver address. Since no
symbol was associated, the BF was not created before PLT analysis, and by
the algorithm we then analyze the target relocation. We expect a JUMP
relocation with an associated symbol, but for an IFUNC the IRELATIVE
relocation is used and no symbol is associated with it; its addend points
to the target symbol, so we need to find the BF using the addend and use
its symbol in this situation. Currently this has only been checked on
AArch64, so I've limited this logic to that platform in the code,
although I wouldn't be surprised if other platforms need to enable it too.
2023-11-08 11:41:43 +04:00
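The fallback described above can be sketched as a two-step lookup: use the relocation's symbol when present, otherwise, for an IRELATIVE relocation, find the function whose start address equals the addend. The types and names are hypothetical simplifications:

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>

// Hypothetical, simplified relocation attached to a PLT/GOT entry.
struct PLTRelocation {
  std::optional<std::string> Symbol; // absent for IRELATIVE
  bool IsIRelative;
  uint64_t Addend; // for IRELATIVE: the target (IFUNC resolver) address
};

// Functions known to the rewriter, keyed by start address.
using FunctionMap = std::map<uint64_t, std::string>;

std::optional<std::string> getPLTTargetName(const PLTRelocation &R,
                                            const FunctionMap &Functions) {
  if (R.Symbol)
    return R.Symbol; // the usual JUMP-relocation case
  if (!R.IsIRelative)
    return std::nullopt;
  auto It = Functions.find(R.Addend); // no symbol: look up by addend
  if (It == Functions.end())
    return std::nullopt;
  return It->second;
}
```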
Vladislav Khmelevsky
485075c095
[BOLT][AArch64] Don't change layout in PatchEntries (#71278)
Due to the LongJmp pass that is executed before PatchEntries, we can't ignore
the function here since it would change the pre-calculated output layout.
The test reloc-26 relied on the wrong behavior and was rewritten as a unittest.
This is also an attempt to fix #70771.
2023-11-08 11:38:46 +04:00
maksfb
7f031d1c7c
[BOLT] Fix address mapping for ICP code (#70136)
When we create new code for indirect code promotion optimization, we
should mark it as originating from the indirect jump instruction for
BOLT address translation (BAT) to map it to the original instruction.
2023-11-06 11:25:49 -08:00
J. Ryan Stinnett
d5e33cc147
[DebugInfo] Use human-friendly printing for DWARF column attributes (#71062)
2023-11-04 17:08:42 +00:00
Vladislav Khmelevsky
888742a121
[BOLT][AArch64] Handle .plt.got section (#71216)
It seems that currently this section is only created by the mold linker
if two conditions are met: 1. the PLT function was called directly; 2. an
indirect access to the PLT function was found (e.g. through an ADRP
relocation). Although mold creates a symbol for every PLT entry, I've
removed them in the YAML file to check that .plt.got was truly disassembled
by BOLT.
2023-11-04 00:47:24 +04:00
maksfb
8244ff6739
[BOLT] Fix incorrect basic block output addresses (#70000)
Some optimization passes may duplicate basic blocks and assign the same
input offset to a number of different blocks in a function. This is done
e.g. to correctly map debugging ranges for duplicated code.

However, duplicate input offsets present a problem when we use
AddressMap to generate new addresses for basic blocks. The output
address is calculated based on the input offset and will be the same for
blocks with identical offsets. The result is potentially incorrect debug
info and BAT records.

To address the issue, we have to eliminate the dependency on input
offsets while generating output addresses for a basic block. Each block
has a unique label, hence we extend AddressMap to include address lookup
based on MCSymbol and use the new functionality to update block
addresses.
2023-10-24 12:22:43 -07:00
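The collision described above is easy to reproduce with a toy model: keying output addresses by input offset loses one of two blocks that share an offset, while keying by a unique label keeps both. The `Block` struct and both builders are hypothetical; the string label stands in for an `MCSymbol`:

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <unordered_map>
#include <vector>

struct Block {
  std::string Label;      // unique per block (stand-in for an MCSymbol)
  uint64_t InputOffset;   // may repeat after block duplication
  uint64_t OutputAddress;
};

// Offset-keyed map: a later duplicate overwrites the earlier entry, so
// both blocks would be reported at the same output address.
std::map<uint64_t, uint64_t> buildOffsetMap(const std::vector<Block> &Blocks) {
  std::map<uint64_t, uint64_t> M;
  for (const Block &B : Blocks)
    M[B.InputOffset] = B.OutputAddress;
  return M;
}

// Label-keyed map: one entry per block, no collisions.
std::unordered_map<std::string, uint64_t>
buildLabelMap(const std::vector<Block> &Blocks) {
  std::unordered_map<std::string, uint64_t> M;
  for (const Block &B : Blocks)
    M.emplace(B.Label, B.OutputAddress);
  return M;
}
```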
Job Noorman
b6b492880f
[BOLT][RISCV] Set minimum function alignment to 2 for RVC (#69837)
In #67707, the minimum function alignment on RISC-V was set to 4. When
RVC (compressed instructions) is enabled, the minimum alignment can be
reduced to 2.

This patch implements this by delegating the choice of minimum alignment
to a new `MCPlusBuilder::getMinFunctionAlignment` function. This way,
the target-dependent code in `BinaryFunction` is minimized.
2023-10-23 08:09:11 +00:00
Job Noorman
86bc486785
[BOLT][RISCV] Use target features from object file (#69836)
We used to hard-code target features for RISC-V. However, most features
(with the exception of relax) are stored in the object file. This patch
extracts those features to ensure BOLT's output doesn't use any features
not present in the input file.
2023-10-23 06:40:25 +00:00