68 Commits

Author SHA1 Message Date
chrisPyr
038fff3f24
[NFC][BOLT] Make file-local cl::opt global variables static (#126472)
#125983
2025-03-05 22:11:05 -08:00
Nikita Popov
e235fcb582
[BOLT] Only link and initialize supported targets (#127509)
Bolt currently links and initializes all LLVM targets. This
substantially increases the binary size, and link time if LTO is used.

Instead, only link the targets specified by BOLT_TARGETS_TO_BUILD. We
also have to only initialize those targets, so generate a
TargetConfig.def file with the necessary information. The way the
initialization is done mirrors what llvm-exegesis does.

This reduces llvm-bolt size from 137MB to 78MB for me.
2025-02-18 09:17:51 +01:00
Nikita Popov
0abe058d7f
[BOLT] Use getMainExecutable() (#126698)
Use LLVM's getMainExecutable() helper instead of rolling our own. This
will result in standard behavior across platforms, such as making sure
that symlinks are always resolved.
2025-02-12 09:44:26 +01:00
Amir Ayupov
8652608404
[BOLT] Fix counts aggregation in merge-fdata (#119652)
merge-fdata used to consider misprediction count as part of "signature",
or the aggregation key. This prevented it from collapsing profile lines
with different misprediction counts, which resulted in duplicate
`(from, to)` pairs with different misprediction and execution counts.

Fix that by splitting out misprediction count and accumulating it
separately.

Test Plan: updated bolt/test/merge-fdata-lbr-mode.test
2024-12-14 22:38:24 -08:00
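The aggregation fix above can be sketched as follows. This is a minimal illustration, not the merge-fdata source: it assumes each profile line ends with two whitespace-separated numeric fields (misprediction count, then execution count) and treats everything before them as the aggregation key. `Counts` and `mergeLines` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Counters accumulated per signature (hypothetical struct, for illustration).
struct Counts {
  std::uint64_t Mispreds = 0;
  std::uint64_t Exec = 0;
};

// Merge profile lines of the assumed form
//   "<signature fields...> <mispreds> <exec>"
// keyed on the signature only, so lines that differ just in their counts
// collapse into one entry. Lines with fewer than three fields are skipped.
std::map<std::string, Counts>
mergeLines(const std::vector<std::string> &Lines) {
  std::map<std::string, Counts> Merged;
  for (const std::string &Line : Lines) {
    const std::size_t ExecPos = Line.rfind(' ');
    if (ExecPos == std::string::npos || ExecPos == 0)
      continue;
    const std::size_t MispredPos = Line.rfind(' ', ExecPos - 1);
    if (MispredPos == std::string::npos)
      continue;
    assert(MispredPos < ExecPos);
    // The misprediction count is NOT part of the key; it is summed separately.
    Counts &C = Merged[Line.substr(0, MispredPos)];
    C.Mispreds +=
        std::stoull(Line.substr(MispredPos + 1, ExecPos - MispredPos - 1));
    C.Exec += std::stoull(Line.substr(ExecPos + 1));
  }
  return Merged;
}
```

With the misprediction count folded into the key instead, the two `main -> foo` lines in a typical input would stay separate, which is the duplication the commit describes. Note that `std::stoull` throws on non-numeric fields; a real tool would report a parse error instead.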
Amir Ayupov
97f43364cc
[BOLT][NFC] Speedup merge-fdata (#119942)
Eliminate splitting the buffer into lines, and use `std::getline`
directly. Simplify no_lbr and boltedcollection handling as well.

Test Plan: For a large fdata file (200MB), fstream version is ~10%
faster.
2024-12-14 22:26:20 -08:00
Tibor Dusnoki
5225f1b435
[BOLT][merge-fdata] Fix basic sample profile aggregation without LBR info (#118481)
When a basic sample profile is gathered without LBR info, the generated
profile contains a "no-lbr" tag in the first line of the fdata file.
This PR fixes merge-fdata to recognize and save this tag to the output
file.
2024-12-13 16:28:37 +00:00
Kristof Beyls
ceb7214be0
[BOLT] Introduce binary analysis tool based on BOLT (#115330)
This initial commit does not add any specific binary analyses yet; it
merely contains the boilerplate to introduce a new BOLT-based tool.

This basically combines the first 4 patches from the prototype pac-ret
and stack-clash binary analyzer discussed in RFC
https://discourse.llvm.org/t/rfc-bolt-based-binary-analysis-tool-to-verify-correctness-of-security-hardening/78148
and published at
https://github.com/llvm/llvm-project/compare/main...kbeyls:llvm-project:bolt-gadget-scanner-prototype

The introduction of such a BOLT-based binary analysis tool was proposed
and discussed in at least the following places:
- The RFC pointed to above
- EuroLLVM 2024 round table
https://discourse.llvm.org/t/summary-of-bolt-as-a-binary-analysis-tool-round-table-at-eurollvm/78441
The round table showed quite a few people interested in being able to
build a custom binary analysis quickly with a tool like this.
- Also at the US LLVM dev meeting a few weeks ago, I heard interest from
a few people, asking when the tool would be available upstream.
- The presentation "Adding Pointer Authentication ABI support for your
ELF platform"
(https://llvm.swoogo.com/2024devmtg/session/2512720/adding-pointer-authentication-abi-support-for-your-elf-platform)
explicitly mentioned interest to extend the prototype tool to verify
correct implementation of pauthabi.
2024-12-12 10:06:27 +00:00
Amir Ayupov
6ee5ff95ab
[BOLT] Add profile density computation
Reuse the definition of profile density from llvm-profgen (#92144):
- the density is computed in perf2bolt using raw samples (perf.data or
  pre-aggregated data),
- function density is the ratio of dynamically executed function bytes
  to the static function size in bytes,
- profile density:
  - functions are sorted by density in decreasing order, accumulating
    their respective sample counts,
  - profile density is the smallest density covering 99% of total sample
    count.

In other words, BOLT binary profile density is the minimum amount of
profile information per function (excluding functions in tail 1% sample
count) which is sufficient to optimize the binary well.

The density threshold of 60 was determined through experiments with
large binaries by reducing the sample count and checking resulting
profile density and performance. The threshold is conservative.

perf2bolt prints a warning if the density is below the threshold
and suggests increasing the sampling duration and/or frequency to reach
the given density, e.g.:
```
BOLT-WARNING: BOLT is estimated to optimize better with 2.8x more samples.
```

Test Plan: updated pre-aggregated-perf.test

Reviewers: maksfb, wlei-llvm, rafaelauler, ayermolo, dcci, WenleiHe

Reviewed By: WenleiHe, wlei-llvm

Pull Request: https://github.com/llvm/llvm-project/pull/101094
2024-10-24 18:30:59 -07:00
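The density definition above can be sketched as follows. This is an illustration of the stated definition only, not the perf2bolt implementation; `FunctionProfile`, `functionDensity`, and `profileDensity` are hypothetical names, and the inputs are assumed to be already aggregated per function.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Per-function inputs (hypothetical struct, for illustration).
struct FunctionProfile {
  double ExecutedBytes;  // dynamically executed bytes attributed to the function
  double SizeBytes;      // static function size in bytes
  std::uint64_t Samples; // sample count attributed to the function
};

// Function density: executed bytes divided by static size.
double functionDensity(const FunctionProfile &F) {
  return F.SizeBytes > 0 ? F.ExecutedBytes / F.SizeBytes : 0.0;
}

// Profile density: sort functions by density in decreasing order, accumulate
// their sample counts, and return the density of the function at which the
// accumulated count first reaches Coverage (99% by default) of the total.
double profileDensity(std::vector<FunctionProfile> Funcs,
                      double Coverage = 0.99) {
  assert(Coverage > 0.0 && Coverage <= 1.0);
  std::uint64_t Total = 0;
  for (const FunctionProfile &F : Funcs)
    Total += F.Samples;
  if (Total == 0)
    return 0.0;
  std::sort(Funcs.begin(), Funcs.end(),
            [](const FunctionProfile &A, const FunctionProfile &B) {
              return functionDensity(A) > functionDensity(B);
            });
  const double Needed = Coverage * static_cast<double>(Total);
  std::uint64_t Covered = 0;
  for (const FunctionProfile &F : Funcs) {
    Covered += F.Samples;
    if (static_cast<double>(Covered) >= Needed)
      return functionDensity(F);
  }
  return functionDensity(Funcs.back());
}
```

For example, with three functions of densities 100, 50, and 1 holding 900, 95, and 5 samples, the first two cover 99.5% of the samples, so the profile density is 50 and the low-density tail function is ignored.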
Amir Ayupov
3c4f00905e
[BOLT] Support perf2bolt-N in the driver
Check invoked tool with `starts_with`.

Addresses the issue where `perf2bolt` invoked using a distro symlink
`perf2bolt-16` fails to run in perf2bolt mode and runs in llvm-bolt mode
instead.

The issue is mentioned in https://vondra.me/posts/playing-with-bolt-and-postgres/

Test Plan:
```
ln -sf perf2bolt perf2bolt-20
perf2bolt-20 clang -p perf.data -o fdata.clang -w yaml.clang
...
PERF2BOLT: wrote 188593 objects and 0 memory objects to fdata.clang
```

Reviewers: ayermolo, rafaelauler, dcci, maksfb

Reviewed By: maksfb

Pull Request: https://github.com/llvm/llvm-project/pull/111072
2024-10-14 10:17:31 -07:00
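The prefix check described above can be sketched with the standard library. This is a stand-in for the idea, not the driver code: the commit uses a `starts_with` check, while this sketch uses a C++17-compatible `substr` comparison, and `ToolMode`, `baseName`, and `detectMode` are hypothetical names.

```cpp
#include <cstddef>
#include <string_view>

enum class ToolMode { LLVMBolt, Perf2Bolt };

// Return the final path component of argv[0] (POSIX-style separators only).
std::string_view baseName(std::string_view Path) {
  const std::size_t Slash = Path.find_last_of('/');
  return Slash == std::string_view::npos ? Path : Path.substr(Slash + 1);
}

// Select the mode from the name the tool was invoked under. Matching on a
// prefix instead of the exact name lets versioned symlinks such as
// "perf2bolt-16" select perf2bolt mode as well.
ToolMode detectMode(std::string_view Argv0) {
  constexpr std::string_view Prefix = "perf2bolt";
  const std::string_view Name = baseName(Argv0);
  // substr clamps to the string length, so short names are handled safely.
  if (Name.substr(0, Prefix.size()) == Prefix)
    return ToolMode::Perf2Bolt;
  return ToolMode::LLVMBolt;
}
```

An exact comparison against `"perf2bolt"` would send `perf2bolt-16` down the llvm-bolt path, which is the failure the commit addresses.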
Maksim Panchenko
6fb39ac77b
[BOLT][merge-fdata] Initialize YAML profile header (#109613)
While merging profiles, some fields in the input header, e.g.
HashFunction, could be uninitialized leading to a UMR. Initialize merged
header with the first input header.

Fixes #109592
2024-09-25 23:18:34 +02:00
Amir Ayupov
15fa3ba547
[BOLT][YAML] Allow unknown keys in the input (#100824)
This ensures forward compatibility, where old BOLT versions can consume
the profile created by newer versions with extra keys.

Test Plan: added yaml-unknown-keys.test
2024-09-03 11:27:57 -07:00
Amir Ayupov
fd38366e45
[BOLT][NFC] Clean includes, add license headers (#87200)
2024-03-31 19:29:45 -07:00
Mehdi Amini
716042a63f
Rename llvm::ThreadPool -> llvm::DefaultThreadPool (NFC) (#83702)
The base class llvm::ThreadPoolInterface will be renamed
llvm::ThreadPool in a subsequent commit.

This is a breaking change: clients who used to create a ThreadPool must
now create a DefaultThreadPool instead.
2024-03-05 18:00:46 -08:00
Mehdi Amini
744616b3ae
Rename ThreadPool::getThreadCount() to getMaxConcurrency() (NFC) (#82296)
This is addressing a long-time TODO to rename this misleading API. The
old one is preserved for now but marked deprecated.
2024-02-19 18:07:12 -08:00
Amir Ayupov
52cf07116b
[BOLT][NFC] Log through JournalingStreams (#81524)
Make core BOLT functionality more friendly to being used as a
library instead of in our standalone driver llvm-bolt. To
accomplish this, we augment BinaryContext with journaling streams
that are to be used by most BOLT code whenever something needs to
be logged to the screen. Users of the library can decide if logs
should be printed to a file, no file or to the screen, as
before. To illustrate this, this patch adds a new option
`--log-file` that allows the user to redirect BOLT logging to a
file on disk or completely hide it by using
`--log-file=/dev/null`. Future BOLT code should now use
`BinaryContext::outs()` for printing important messages instead of
`llvm::outs()`. A new test log.test enforces this by verifying that
no strings are printed to the screen once the `--log-file` option is
used.

In previous patches we also added a new BOLTError class to report
common and fatal errors, so code shouldn't call exit(1) now. To
easily handle problems as before (by quitting with exit(1)),
callers can now use
`BinaryContext::logBOLTErrorsAndQuitOnFatal(Error)` whenever code
needs to deal with BOLT errors. To test this, we have fatal.s
that checks we are correctly quitting and printing a fatal error
to the screen.

Because this is a significant change by itself, not all code has
been ported yet. Code from Profiler libs (DataAggregator and friends)
still prints errors directly to the screen.

Co-authored-by: Rafael Auler <rafaelauler@fb.com>

Test Plan: NFC
2024-02-12 14:53:53 -08:00
Amir Ayupov
6735ce9d25 [BOLT] Fix unconditional output of boltedcollection in merge-fdata (#78653)
Fix the bug where merge-fdata unconditionally outputs boltedcollection
line, regardless of whether input files have it set.

Test Plan:
Added bolt/test/X86/merge-fdata-nobat-mode.test which fails without this
fix.
2024-01-18 20:00:47 -08:00
Amir Ayupov
9fec33aadc Revert "[BOLT] Fix unconditional output of boltedcollection in merge-fdata (#78653)"
This reverts commit 82bc33ea3f1a539be50ed46919dc53fc6b685da9.

Accidentally pushed unrelated changes.
2024-01-18 19:59:09 -08:00
Amir Ayupov
82bc33ea3f
[BOLT] Fix unconditional output of boltedcollection in merge-fdata (#78653)
Fix the bug where merge-fdata unconditionally outputs boltedcollection 
line, regardless of whether input files have it set.

Test Plan:
Added bolt/test/X86/merge-fdata-nobat-mode.test which fails without this
fix.
2024-01-18 19:44:16 -08:00
Kazu Hirata
6da4a7a8e2 [BOLT] Use SmallString::operator std::string (NFC)
2024-01-15 21:59:06 -08:00
Kazu Hirata
ad8fd5b185 [BOLT] Use StringRef::{starts,ends}_with (NFC)
This patch replaces uses of StringRef::{starts,ends}with with
StringRef::{starts,ends}_with for consistency with
std::{string,string_view}::{starts,ends}_with in C++20.

I'm planning to deprecate and eventually remove
StringRef::{starts,ends}with.
2023-12-13 23:34:49 -08:00
Petr Hosek
f3269a94e7 [BOLT][CMake] Redo the build and install targets
The existing BOLT install targets are broken on Windows because they
don't properly handle the output extension. We cannot use the existing
LLVM macros since those make assumptions that don't hold for BOLT. This
change instead implements custom macros following the approach used by
Clang and LLD.

Differential Revision: https://reviews.llvm.org/D151595
2023-06-01 14:48:01 +00:00
Petr Hosek
1d6a2c5357 Revert "[BOLT][CMake] Redo the build and install targets"
This reverts commit f99a7d3e38095cfdaf7e729289a8894dd31c7efa since it
broke the bolt-aarch64-ubuntu-clang-shared bot.
2023-06-01 08:03:50 +00:00
Petr Hosek
f99a7d3e38 [BOLT][CMake] Redo the build and install targets
The existing BOLT install targets are broken on Windows because they
don't properly handle the output extension. We cannot use the existing
LLVM macros since those make assumptions that don't hold for BOLT. This
change instead implements custom macros following the approach used by
Clang and LLD.

Differential Revision: https://reviews.llvm.org/D151595
2023-06-01 06:01:39 +00:00
Petr Hosek
99a1aeefb3 Revert "[BOLT][CMake] Use LLVM macros for install targets"
This reverts commit 627d5e16127bd8034b893e66ab0c86eacf2d939a.
2023-05-30 19:28:14 +00:00
Petr Hosek
627d5e1612 [BOLT][CMake] Use LLVM macros for install targets
The existing BOLT install targets are broken on Windows because they
don't properly handle the output extension. Rather than reimplementing
this logic in BOLT, reuse the existing LLVM macros which already
handle this aspect correctly.

Differential Revision: https://reviews.llvm.org/D151595
2023-05-30 19:23:11 +00:00
Yi Kong
67cf01bd37 Reland^2 "[BOLT] Parallelize legacy profile merging"
Resolved the issue where, when the number of tasks is smaller than the
number of cores, we end up creating as many threads as there are cores,
making the performance worse than the single-threaded version.
2023-05-22 13:37:41 -07:00
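The fix described above amounts to capping the worker count at the number of tasks. A minimal sketch of that clamp, using only the standard library (`chooseThreadCount` is a hypothetical helper, not a function from the patch):

```cpp
#include <algorithm>
#include <cstddef>

// Pick a worker count that never exceeds the number of tasks and never
// exceeds the hardware thread count. A hardware count of 0 is treated as
// "unknown" and falls back to a single thread.
unsigned chooseThreadCount(std::size_t NumTasks, unsigned HardwareThreads) {
  const std::size_t Cores = std::max<std::size_t>(HardwareThreads, 1);
  const std::size_t Useful = std::max<std::size_t>(NumTasks, 1);
  return static_cast<unsigned>(std::min(Cores, Useful));
}
```

The hardware count could come from `std::thread::hardware_concurrency()`, which is allowed to return 0 when the value cannot be determined. With 3 profiles on a 128-thread machine this yields 3 workers instead of 128.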
Yi Kong
65404e51bf Revert "Reland "[BOLT] Parallelize legacy profile merging""
This reverts commit 611fb179b19857ffb87df81c926902fc7e3412ab.

Broken tests
2023-05-18 16:26:43 -07:00
Yi Kong
611fb179b1 Reland "[BOLT] Parallelize legacy profile merging"
This reverts commit 78d8d016490909ac759c6f76c5f8679bc7a58b2e.
2023-05-18 16:06:46 -07:00
Yi Kong
78d8d01649 Revert "[BOLT] Parallelize legacy profile merging"
This reverts commit 35af20d9e036deeed250b73fd3ae86d6455173c5.

The patch caused a test failure.
2023-04-28 21:24:52 +09:00
Yi Kong
35af20d9e0 [BOLT] Parallelize legacy profile merging
Merging profiles is quite expensive, but easily parallelizable.

8359 profiles on n2d-standard-128:
single-thread: 808s
multi-thread: 200s (~75% less time, i.e. ~4x speedup)

Differential Revision: https://reviews.llvm.org/D149014
2023-04-27 15:37:14 +09:00
Yi Kong
d788db3d19 [BOLT][NFC] Simplify code using std::optional
Use std::optional instead of tracking if it is the first profile seen.

Differential Revision: https://reviews.llvm.org/D147308
2023-04-01 13:47:36 +08:00
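The refactoring pattern named above can be illustrated in isolation. This is a generic sketch, not the merge-fdata code: `ProfileHeader` and `mergeHeaders` are hypothetical, and the mismatch check is an assumed example of per-input validation.

```cpp
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

// A simplified profile header (hypothetical fields, for illustration).
struct ProfileHeader {
  std::string FileName;
  bool IsLBR;
};

// Merge headers without a separate "is this the first profile?" flag: the
// optional is empty until the first input arrives, so "first seen" and
// "has a value" are the same piece of state.
std::optional<ProfileHeader>
mergeHeaders(const std::vector<ProfileHeader> &Inputs) {
  std::optional<ProfileHeader> Merged;
  for (const ProfileHeader &H : Inputs) {
    if (!Merged) {
      Merged = H; // first profile initializes the merged header
      continue;
    }
    if (Merged->IsLBR != H.IsLBR)
      throw std::runtime_error("cannot merge LBR and non-LBR profiles");
  }
  return Merged; // std::nullopt when there were no inputs
}
```

Compared with a `bool FirstSeen` plus a default-constructed header, this leaves no window in which the merged header exists but holds uninitialized fields.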
Amir Ayupov
16492a6143 [BOLT][NFC] Rename {MachO,}RewriteInstance::create methods
Follow the code style of fallible constructors in the [LLVM Programmer's Manual](https://llvm.org/docs/ProgrammersManual.html#fallible-constructors)
and rename `RewriteInstance::createRewriteInstance` to `RewriteInstance::create`.

Reviewed By: #bolt, rafauler

Differential Revision: https://reviews.llvm.org/D143119
2023-02-02 12:30:45 -08:00
Amir Ayupov
72e5b14fe7 [BOLT][NFC] Use llvm::make_second_range
Reviewed By: #bolt, rafauler

Differential Revision: https://reviews.llvm.org/D143019
2023-02-02 12:02:31 -08:00
serge-sans-paille
984b800a03
Move from llvm::makeArrayRef to ArrayRef deduction guides - last part
This is a follow-up to https://reviews.llvm.org/D140896, split into
several parts as it touches a lot of files.

Differential Revision: https://reviews.llvm.org/D141298
2023-01-10 11:47:43 +01:00
Amir Ayupov
be08bb7755 [BOLT][CMake] Add merge-fdata to bolt component
Build and install `merge-fdata` tool as part of `bolt` component:
```
$ ninja bolt
# builds llvm-bolt, perf2bolt and merge-fdata

$ cmake --install . --component bolt --prefix $HOME/test-install-bolt
-- Install configuration: "Release"
-- Install configuration: "Release"
-- Installing: /home/aaupov/test-install-bolt/lib/libbolt_rt_instr.a
-- Installing: /home/aaupov/test-install-bolt/lib/libbolt_rt_hugify.a
-- Installing: /home/aaupov/test-install-bolt/lib/libbolt_rt_instr_osx.a
-- Installing: /home/aaupov/test-install-bolt/bin/llvm-bolt
-- Installing: /home/aaupov/test-install-bolt/bin/perf2bolt
-- Installing: /home/aaupov/test-install-bolt/bin/llvm-boltdiff
-- Installing: /home/aaupov/test-install-bolt/bin/merge-fdata
```

Fixes #57249.

Reviewed By: #bolt, rafauler

Differential Revision: https://reviews.llvm.org/D139972
2023-01-03 17:40:36 -08:00
serge-sans-paille
61cff9079c [BOLT] Support building bolt when LLVM_LINK_LLVM_DYLIB is ON
This does *not* link with libLLVM, but with static archives instead. Not
super-great, but at least the build works, which is probably better than
failing.

Related to #57551

Differential Revision: https://reviews.llvm.org/D134434
2022-09-23 07:59:30 +02:00
serge-sans-paille
9029ed2e4b [BOLT] Fix (part of) dylib compatibility
Non-LLVM components should not be listed as part of LLVM_LINK_COMPONENTS.

Differential Revision: https://reviews.llvm.org/D134278
2022-09-22 10:41:40 +02:00
serge-sans-paille
3ca61941c1 Revert "[bolt] Fix (part of) dylib compatibility"
This reverts commit 34ad83d883cc4505412a7c3e1e3da74e5408aa82.
2022-09-22 10:41:21 +02:00
serge-sans-paille
34ad83d883 [bolt] Fix (part of) dylib compatibility
Non-LLVM components should not be listed as part of LLVM_LINK_COMPONENTS.

Differential Revision: https://reviews.llvm.org/D134278
2022-09-22 10:32:40 +02:00
Nicolai Hähnle
f7872cdce1 CommandLine: add and use cl::SubCommand::get{All,TopLevel}
Prefer using these accessors to access the special sub-commands
corresponding to the top-level (no subcommand) and all sub-commands.

This is a preparatory step towards removing the use of ManagedStatic:
with a subsequent change, these global instances will be moved to
be regular function-scope statics.

It is split up to give downstream projects an (albeit short) window in
which they can switch to using the accessors in a forward-compatible
way.

Differential Revision: https://reviews.llvm.org/D129118
2022-08-02 23:49:16 +02:00
Rafael Auler
fc0ced73dc Add BAT testing framework
This patch refactors BAT to be testable as a library, so we
can have open-source tests on it. This further fixes an issue with
basic blocks that lack a valid input offset, making BAT omit those
when writing translation tables.

Test Plan: new testcases added, new testing tool added (llvm-bat-dump)

Differential Revision: https://reviews.llvm.org/D129382
2022-07-29 14:55:04 -07:00
John Ericson
07b749800c [cmake] Don't export LLVM_TOOLS_INSTALL_DIR anymore
First of all, `LLVM_TOOLS_INSTALL_DIR` put there breaks our NixOS
builds, because `LLVM_TOOLS_INSTALL_DIR` defined the same as
`CMAKE_INSTALL_BINDIR` becomes an *absolute* path, and then when
downstream projects try to install there too this breaks because our
builds always install to fresh directories for isolation's sake.

Second of all, note that `LLVM_TOOLS_INSTALL_DIR` stands out against the
other specially crafted `LLVM_CONFIG_*` variables substituted in
`llvm/cmake/modules/LLVMConfig.cmake.in`.

@beanz added it in d0e1c2a550ef348aae036d0fe78cab6f038c420c to fix a
dangling reference in `AddLLVM`, but I am suspicious of how this
variable doesn't follow the pattern.

Those other ones are carefully made to be build-time vs install-time
variables depending on which `LLVMConfig.cmake` is being generated, are
carefully made relative as appropriate, etc. etc. For my NixOS use-case
they are also fine because they are never used as downstream install
variables, only for reading not writing.

To avoid the problems I face, and restore symmetry, I deleted the
exported variable and arranged to have many `${project}_TOOLS_INSTALL_DIR`s.
`AddLLVM` now instead expects each project to define its own, and they
do so based on `CMAKE_INSTALL_BINDIR`. `LLVMConfig` still exports
`LLVM_TOOLS_BINARY_DIR` which is the location for the tools defined in
the usual way, matching the other remaining exported variables.

For the `AddLLVM` changes, I tried to copy the existing pattern of
internal vs non-internal or for LLVM vs for downstream function/macro
names, but it would be good to confirm I did that correctly.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D117977
2022-07-21 19:04:00 +00:00
Amir Ayupov
d2c8769936 [BOLT][NFC] Use range-based STL wrappers
Replace `std::` algorithms taking begin/end iterators with `llvm::` counterparts
accepting ranges.

Reviewed By: rafauler

Differential Revision: https://reviews.llvm.org/D128154
2022-06-23 22:16:27 -07:00
John Ericson
0bb317b7bf Revert "[cmake] Don't export LLVM_TOOLS_INSTALL_DIR anymore"
This reverts commit d5daa5c5b091cafb9b7ffd19b5dfa2daadef3229.
2022-06-10 19:26:12 +00:00
John Ericson
d5daa5c5b0 [cmake] Don't export LLVM_TOOLS_INSTALL_DIR anymore
First of all, `LLVM_TOOLS_INSTALL_DIR` put there breaks our NixOS
builds, because `LLVM_TOOLS_INSTALL_DIR` defined the same as
`CMAKE_INSTALL_BINDIR` becomes an *absolute* path, and then when
downstream projects try to install there too this breaks because our
builds always install to fresh directories for isolation's sake.

Second of all, note that `LLVM_TOOLS_INSTALL_DIR` stands out against the
other specially crafted `LLVM_CONFIG_*` variables substituted in
`llvm/cmake/modules/LLVMConfig.cmake.in`.

@beanz added it in d0e1c2a550ef348aae036d0fe78cab6f038c420c to fix a
dangling reference in `AddLLVM`, but I am suspicious of how this
variable doesn't follow the pattern.

Those other ones are carefully made to be build-time vs install-time
variables depending on which `LLVMConfig.cmake` is being generated, are
carefully made relative as appropriate, etc. etc. For my NixOS use-case
they are also fine because they are never used as downstream install
variables, only for reading not writing.

To avoid the problems I face, and restore symmetry, I deleted the
exported variable and arranged to have many `${project}_TOOLS_INSTALL_DIR`s.
`AddLLVM` now instead expects each project to define its own, and they
do so based on `CMAKE_INSTALL_BINDIR`. `LLVMConfig` still exports
`LLVM_TOOLS_BINARY_DIR` which is the location for the tools defined in
the usual way, matching the other remaining exported variables.

For the `AddLLVM` changes, I tried to copy the existing pattern of
internal vs non-internal or for LLVM vs for downstream function/macro
names, but it would be good to confirm I did that correctly.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D117977
2022-06-10 14:35:18 +00:00
Yi Kong
716d428ab5 [BOLT] Add -o option to merge-fdata
Differential Revision: https://reviews.llvm.org/D126788
2022-06-02 01:29:04 +08:00
Yi Kong
2a42f7f72a [BOLT] Allow merge-fdata to take a directory as input
and recursively merge all files under said directory. This is similar
to `llvm-profdata merge`.

Differential Revision: https://reviews.llvm.org/D126695
2022-06-01 03:01:14 +08:00
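Directory inputs of this kind can be expanded with `std::filesystem`. The sketch below shows the general technique only; it is not taken from the patch, and `expandInputs` is a hypothetical helper.

```cpp
#include <algorithm>
#include <filesystem>
#include <vector>

namespace fs = std::filesystem;

// Expand a mixed list of files and directories into a flat list of files.
// Directories are walked recursively; other paths are passed through
// unchanged so a later open() can report missing files.
std::vector<fs::path> expandInputs(const std::vector<fs::path> &Inputs) {
  std::vector<fs::path> Files;
  for (const fs::path &Input : Inputs) {
    if (fs::is_directory(Input)) {
      for (const fs::directory_entry &Entry :
           fs::recursive_directory_iterator(Input))
        if (Entry.is_regular_file())
          Files.push_back(Entry.path());
    } else {
      Files.push_back(Input);
    }
  }
  // Directory iteration order is unspecified; sort for a stable merge order.
  std::sort(Files.begin(), Files.end());
  return Files;
}
```

Sorting is a design choice of this sketch: it makes the merged output independent of the order in which the filesystem happens to enumerate entries.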
Yi Kong
97715104c5 [BOLT][NFC] Don't over-specify the size of SmallVector
This is the recommended way, should make merging profiles ever so
slightly faster.
2022-05-31 16:16:38 +08:00
Rafael Auler
2fdc5d336e [BOLT] Fix merge-fdata handling of BAT profiles
When a profile is collected in a BOLTed binary, the generated
profile is tagged with a header string "boltedcollection" in the first
line of the fdata file. Fix merge-fdata to recognize this header
string and preserve it into the output.

Reviewed By: Amir

Differential Revision: https://reviews.llvm.org/D125591
2022-05-13 19:41:55 -07:00
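The header handling described above can be sketched as follows. This is an illustration under assumptions, not the merge-fdata source: inputs are passed as in-memory strings, the tag must match the first line exactly, and rejecting a mix of tagged and untagged inputs is a policy chosen for this sketch. `consumeHeaderTag` and `mergeProfiles` are hypothetical names.

```cpp
#include <istream>
#include <optional>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Return true and consume the first line if it equals Tag; otherwise rewind
// so the first line is read again as ordinary profile data. Requires a
// seekable stream.
bool consumeHeaderTag(std::istream &In, const std::string &Tag) {
  const std::istream::pos_type Start = In.tellg();
  std::string FirstLine;
  if (std::getline(In, FirstLine) && FirstLine == Tag)
    return true;
  In.clear(); // reset eof/fail bits before seeking back
  In.seekg(Start);
  return false;
}

// Concatenate profile bodies and emit the tag once at the top, but only if
// the inputs carried it.
std::string mergeProfiles(const std::vector<std::string> &Inputs,
                          const std::string &Tag = "boltedcollection") {
  std::optional<bool> Tagged; // unset until the first input is seen
  std::string Body;
  for (const std::string &Text : Inputs) {
    std::istringstream In(Text);
    const bool HasTag = consumeHeaderTag(In, Tag);
    if (!Tagged)
      Tagged = HasTag;
    else if (*Tagged != HasTag)
      throw std::runtime_error("cannot mix tagged and untagged profiles");
    std::string Line;
    while (std::getline(In, Line))
      Body += Line + "\n";
  }
  return (Tagged.value_or(false) ? Tag + "\n" : std::string()) + Body;
}
```

The same shape applies to any single-line header tag: it is stripped from each input, tracked as a flag, and written back at most once.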
Amir Ayupov
bdba3d091c [BOLT][CMAKE] Fix DYLIB build
Move BOLT libraries out of `LLVM_LINK_COMPONENTS` to `target_link_libraries`.
Addresses issue #55432.

Reviewed By: rafauler

Differential Revision: https://reviews.llvm.org/D125568
2022-05-13 13:27:21 -07:00