If a jump table has entries at the end that are a result of
__builtin_unreachable() targets, BOLT can confuse them with function
pointers. In such a case, we should exclude these targets from the table,
as we risk incorrectly updating the function pointers. It is safe to
exclude them because branching to such targets is undefined behavior.
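
A minimal sketch of the trimming step, not BOLT's actual code. It assumes the
unreachable targets show up as trailing entries equal to the address just past
the end of the function; `trimUnreachableTail` and `FunctionEnd` are
hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: drops trailing jump table entries that point at
// FunctionEnd (assumed here to be where __builtin_unreachable() targets
// land). Entries in the middle of the table are left untouched.
std::vector<uint64_t> trimUnreachableTail(std::vector<uint64_t> Entries,
                                          uint64_t FunctionEnd) {
  while (!Entries.empty() && Entries.back() == FunctionEnd)
    Entries.pop_back();
  return Entries;
}
```

Only the tail is trimmed, since that is the case described above; an entry
equal to `FunctionEnd` followed by a regular target stays in the table.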
Call site information setting was conditioned on branch information
presence for a given block. However, a sampled profile may lack one or
the other for a given basic block.
Iterate over branch profiles and call profiles independently to cover
all recorded profile data.
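
The idea can be sketched as follows. The record layout and the names
`SampledProfile` and `blocksWithProfile` are assumptions made for this
example and do not mirror BOLT's data structures.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>

// Hypothetical, simplified profile records keyed by basic block offset.
struct SampledProfile {
  std::map<uint32_t, uint64_t> BranchCounts; // block offset -> branch count
  std::map<uint32_t, uint64_t> CallCounts;   // block offset -> call count
};

// Collects every block that has any recorded data. The two maps are walked
// separately, so a block with calls but no branches is still covered.
std::set<uint32_t> blocksWithProfile(const SampledProfile &P) {
  std::set<uint32_t> Blocks;
  for (const auto &Entry : P.BranchCounts)
    Blocks.insert(Entry.first);
  for (const auto &Entry : P.CallCounts)
    Blocks.insert(Entry.first);
  return Blocks;
}
```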
Depends on https://github.com/llvm/llvm-project/pull/87569
Test Plan: Updated bolt/test/X86/yaml-secondary-entry-discriminator.s
Reviewers: ayermolo, dcci, maksfb, rafaelauler
Reviewed By: maksfb
Pull Request: https://github.com/llvm/llvm-project/pull/87743
Move BAT parent function lookup outside `getLocationName`, to the
scope where we retrieve `FuncBranchData` linked with the function.
Previously DataAggregator would store branch profile recorded in the
split fragment in `FuncBranchData` associated with the fragment, and
perform name translation in `getLocationName` for symbol name only.
This works for fdata profile which is printed out as-is, but doesn't
work with BAT YAML profile writer which requires a combined profile.
The issue necessitated `fixupBATProfile` which partially addressed the
issue (reassigned inter-fragment calls back into intra-function
branches). However, `fixupBATProfile` fails to address disjoint
profiles (i.e. doesn't merge `FuncBranchData` for fragments back
into parent). This diff eliminates the need for `fixupBATProfile` by
removing the root cause of the issue.
Test Plan: NFC for existing tests
Reviewers: ayermolo, dcci, rafaelauler, maksfb
Reviewed By: maksfb
Pull Request: https://github.com/llvm/llvm-project/pull/87569
BAT writeMaps encoded the assumption that functions are only split into
two fragments (hot and cold). However, BOLT supports splitting into an
arbitrary number of fragments. Relax that assumption and look up the
primary (hot) fragment explicitly.
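
A rough sketch of an explicit lookup, under the assumption that every
non-primary fragment records the address of its parent and that the links
contain no cycles. `ParentMap` and `findPrimaryFragment` are hypothetical
names, not the BAT implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical model: fragment address -> parent fragment address.
// The primary (hot) fragment has no entry in the map.
using ParentMap = std::map<uint64_t, uint64_t>;

// Follows parent links until reaching a fragment without a parent.
// Works for any number of fragments, not just one hot and one cold.
uint64_t findPrimaryFragment(const ParentMap &Parents, uint64_t Fragment) {
  auto It = Parents.find(Fragment);
  while (It != Parents.end()) {
    Fragment = It->second;
    It = Parents.find(Fragment);
  }
  return Fragment;
}
```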
Depends on: https://github.com/llvm/llvm-project/pull/86219
Test Plan: Updated bolt/test/X86/yaml-secondary-entry-discriminator.s
Reviewers: ayermolo, rafaelauler, maksfb, dcci
Reviewed By: maksfb, dcci
Pull Request: https://github.com/llvm/llvm-project/pull/87123
Provide a mechanism to resolve call target information for calls from non-BAT
functions to BAT functions (`YAMLProfileWriter::convert`). Make it generic for
future use in BAT-to-BAT calls.
Test Plan: Updated bolt/test/X86/bolt-address-translation-yaml.test
Reviewers: ayermolo, maksfb, rafaelauler, dcci
Reviewed By: maksfb
Pull Request: https://github.com/llvm/llvm-project/pull/86219
Under normal circumstances, we terminate basic blocks on a trap
instruction. However, the Linux kernel may resume execution after hitting
a trap (ud2 on x86). Thus, we introduce a "--terminal-trap" option that
specifies whether the trap instruction should terminate the control flow.
The option is on by default, except in Linux kernel mode, where it is off.
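
An illustrative model of the option's effect. The `Opcode` enum, `Options`
struct, and both functions are made up for this sketch; only the default
behavior is taken from the description above.

```cpp
#include <cassert>

enum class Opcode { Add, Ret, Trap };

// Hypothetical stand-in for the "--terminal-trap" option value.
struct Options {
  bool TerminalTrap = true;
};

// Default as described above: on, except in Linux kernel mode.
Options makeOptions(bool LinuxKernelMode) {
  Options O;
  O.TerminalTrap = !LinuxKernelMode;
  return O;
}

// A trap ends the basic block only when the option is on.
bool terminatesBlock(Opcode Op, const Options &O) {
  if (Op == Opcode::Trap)
    return O.TerminalTrap;
  return Op == Opcode::Ret;
}
```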
Update instruction locations in the __bug_table section after new code
is emitted. If an instruction with an associated bug ID was deleted,
overwrite its location with zero.
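
A simplified sketch of the update. It treats locations as plain addresses and
uses a hypothetical `BugEntry` struct; the real table layout in the kernel
differs, so this only illustrates the update-or-zero rule.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical, simplified view of one __bug_table entry.
struct BugEntry {
  uint32_t BugID;
  uint64_t Location; // address of the instruction carrying this bug ID
};

// NewAddress maps a bug ID to the address of its instruction in the newly
// emitted code. IDs missing from the map belong to deleted instructions,
// whose location is overwritten with zero.
void updateBugTable(std::vector<BugEntry> &Table,
                    const std::map<uint32_t, uint64_t> &NewAddress) {
  for (BugEntry &E : Table) {
    auto It = NewAddress.find(E.BugID);
    E.Location = (It == NewAddress.end()) ? 0 : It->second;
  }
}
```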
Indirect call handling missed setting an `EntryDiscriminator`, although
it is set for direct calls and tail calls.
Improve YAML profile accuracy by unifying the destination setting
between direct and indirect calls into `setCSIDestination` method.
Depends on: https://github.com/llvm/llvm-project/pull/86848
Test Plan: Updated bolt/test/X86/yaml-secondary-entry-discriminator.s
Reviewers: ayermolo, maksfb, rafaelauler
Reviewed By: maksfb
Pull Request: https://github.com/llvm/llvm-project/pull/82128
Make secondary entry point discriminators start with 1 instead of 0
(reserved for the primary entry point).
Test Plan:
```
bin/llvm-lit -a tools/bolt/test/X86/yaml-secondary-entry-discriminator.s
```
Reviewers: rafaelauler, ayermolo, maksfb, dcci
Reviewed By: maksfb
Pull Request: https://github.com/llvm/llvm-project/pull/86848
Attach call counters to YAML profile, covering inter-function control
flow.
Depends on: https://github.com/llvm/llvm-project/pull/86218
Test Plan:
Updated bolt/test/X86/bolt-address-translation-yaml.test
Provide secondary entry points for `EntryDiscriminator` call info field
in YAML profile.
Increases BAT section size to:
- large binary: 39655300 bytes (1.03x the original),
- medium binary: 3834328 bytes (0.65x),
- small binary: 924 bytes (0.64x).
Depends on: https://github.com/llvm/llvm-project/pull/76911
Test Plan:
- Updated bolt-address-translation{,-yaml}.test
- Added openssl test: https://github.com/rafaelauler/bolt-tests/pull/30
Reviewers: dcci, rafaelauler, maksfb, ayermolo
Reviewed By: rafaelauler
Pull Request: https://github.com/llvm/llvm-project/pull/86218
Hide the implementations of `FuncHashes` and `BBHashMap` classes,
getting rid of `at` accessors that could throw an exception.
Test Plan: NFC
Reviewers: ayermolo, maksfb, dcci, rafaelauler
Reviewed By: rafaelauler
Pull Request: https://github.com/llvm/llvm-project/pull/86353
DW_AT_abstract_origin can be a cross-CU reference as a by-product of
LTO. At the IR level, an address is stored for absolute references, vs. a
DIE for relative references. Added a map to keep track of cross-CU
referenced DIEs to use when we add an Entry.
For DWARF5, BOLT was not retrieving the address and was instead setting
an index.
Changed so that an address is used, and added a DWARF4 test because it
was missing.
YAML profile reader checks the number of basic blocks in regular,
no-stale-matching mode. Add it to BAT.
This increases the size of BAT section to:
- large binary: 39583080 bytes (1.02x of the original),
- medium binary: 3816492 bytes (0.64x),
- small binary: 920 bytes (0.64x, no change due to alignment).
Test Plan: Updated bolt-address-translation-yaml.test
Reviewers: rafaelauler, ayermolo, maksfb, dcci
Reviewed By: rafaelauler
Pull Request: https://github.com/llvm/llvm-project/pull/86045
Add the input basic block index to BAT metadata. This addresses the case
where some basic blocks are eliminated and the output index is not equal
to the input block index. These indices are used in non-stale-matching
mode.
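
To see why the two indices diverge, consider this small sketch. The function
name and the `Kept` flag vector are assumptions for illustration and are not
the BAT encoding.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Kept[I] says whether input block I survives into the output.
// Returns, for each output block in order, the index it had in the input.
std::vector<uint32_t> inputIndices(const std::vector<bool> &Kept) {
  std::vector<uint32_t> Result;
  for (uint32_t I = 0; I < Kept.size(); ++I)
    if (Kept[I])
      Result.push_back(I);
  return Result;
}
```

With input block 1 eliminated out of four blocks, output block 1 corresponds
to input block 2, so the output position alone cannot be used to match the
input profile.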
Increases BAT section size to:
- large binary: 39521512 bytes (1.02x original),
- medium binary: 3799988 bytes (0.64x),
- small binary: 920 bytes (0.64x).
Test Plan:
Updated bolt-address-translation{,-yaml}.test
Pull Request: https://github.com/llvm/llvm-project/pull/86044
Relax assumptions that YAML output is not supported in BAT mode.
Set up basic infrastructure for emitting YAML for functions not covered
by BAT, such as from `.bolt.org.text` section (code identical to input binary
sans external refs), or non-rewritten functions in non-relocation mode (where
the function stays in the same section but BAT mapping is not emitted).
This diff only produces YAML profile for non-BAT functions (skipped,
non-simple). YAML profile for BAT functions is added in follow-up diffs:
- https://github.com/llvm/llvm-project/pull/76911 emits YAML profile with
internal control flow information only (branch profile),
- https://github.com/llvm/llvm-project/pull/76896 adds cross-function profile
(calls profile).
Test Plan: Added bolt/test/X86/bolt-address-translation-yaml.test
Reviewers: ayermolo, dcci, maksfb, rafaelauler
Reviewed By: rafaelauler
Pull Request: https://github.com/llvm/llvm-project/pull/76910
Runtime code modification used by static keys is the most ubiquitous
self-modifying feature of the Linux kernel. The idea is to eliminate
the condition check and associated conditional jump on a hot path if
that condition (based on a boolean value of a static key) does not
change often. Whenever the condition changes, the kernel runtime
modifies all code paths associated with that key, flipping the code
between nop and (unconditional) jump.
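
A toy model of the flipping behavior. It assumes an enabled key means a jump
at every site, which is a simplification (the kernel also tracks a per-site
default); `StaticKey`, `SiteCode`, and `setKey` are hypothetical names.

```cpp
#include <cassert>
#include <vector>

enum class SiteCode { Nop, Jump };

// Hypothetical model of a static key and the code sites patched with it.
struct StaticKey {
  bool Enabled = false;
  std::vector<SiteCode> Sites; // one element per patched code location
};

// Models the runtime patching: when the key value changes, every site tied
// to the key is rewritten. No condition is evaluated on the hot path.
void setKey(StaticKey &Key, bool Enabled) {
  if (Key.Enabled == Enabled)
    return; // value unchanged, nothing to patch
  Key.Enabled = Enabled;
  for (SiteCode &Site : Key.Sites)
    Site = Enabled ? SiteCode::Jump : SiteCode::Nop;
}
```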
Use `getAnnotationWithDefault` instead of testing if the annotation is
set. If the default value is used, and `CSI.Count` is set to zero, the
target is discarded by a check below.
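
The pattern can be sketched with a plain map. `getWithDefault` here is a
hypothetical analogue of `getAnnotationWithDefault`, and the "Count" key and
`keepTarget` are assumptions for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

using Annotations = std::map<std::string, uint64_t>;

// Hypothetical analogue of getAnnotationWithDefault: returns the stored
// value, or Default when the annotation is not set.
uint64_t getWithDefault(const Annotations &A, const std::string &Name,
                        uint64_t Default = 0) {
  auto It = A.find(Name);
  return It == A.end() ? Default : It->second;
}

// A target is kept only if its count is non-zero, so a missing annotation
// (default 0) is discarded by the same check, with no separate "is set" test.
bool keepTarget(const Annotations &A) {
  uint64_t Count = getWithDefault(A, "Count");
  return Count != 0;
}
```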
Test Plan: NFC
Reviewers: maksfb, dcci, rafaelauler, ayermolo
Reviewed By: ayermolo
Pull Request: https://github.com/llvm/llvm-project/pull/82129
The function is to be used by YAML profile emission in BAT mode for
BinaryFunctions not covered by BAT tables (same as in original binary).
Test Plan: NFC
Reviewers: rafaelauler, ayermolo, dcci, maksfb
Reviewed By: dcci
Pull Request: https://github.com/llvm/llvm-project/pull/76909
commit 43a2ec483fe08064b53a6293682e9bab97df61a0
Author: Jonas Devlieghere <jonas@devlieghere.com>
Date: Tue Mar 19 08:30:47 2024 -0700
removed parameter Translator from the constructor of DwarfStreamer.
This patch fixes the build by updating the constructor of DIEStreamer
accordingly.
According to the DWARF spec, a DIE that has DW_AT_specification or
DW_AT_abstract_origin can be part of .debug_names if the DIE those
attributes point to has DW_AT_name or DW_AT_linkage_name.
Refactor MCPlusBuilder's create{Instruction}() functions that used to
return bool. We almost never check the return value as we rely on
llvm_unreachable() to detect unimplemented functionality. There were a
couple of cases that checked the return value, but they would hit the
unreachable condition first (at least in debug builds) before the return
value gets checked.
Reset operand list whenever we create a new instruction via a parameter
passed by reference. Most functions were already doing this, but there
are several places missing the reset. Potentially, if we do not clear
the list, it could lead to invalid instruction operands. But the existing
code is unaffected.
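
A minimal illustration of the hazard, using a made-up `Inst` type rather than
`MCInst`; the opcode value is arbitrary.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for an instruction passed by reference.
struct Inst {
  unsigned Opcode = 0;
  std::vector<int> Operands;
  void clear() {
    Opcode = 0;
    Operands.clear();
  }
};

// Resets the instruction first, so operands left over from a previous use
// of the same object cannot leak into the new instruction.
void createNoop(Inst &I) {
  I.clear();
  I.Opcode = 0x90; // arbitrary opcode for the sketch
}
```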
The .pci_fixup section contains a table with entries that allow invoking a
fixup hook whenever a problem is encountered with a PCI device. The
hook typically points to the start of a function. As we are not
relocating functions in the kernel (at least not yet), verify this
assumption while reading the table and ignore any function with a fixup
hook pointing into its middle.
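
The check can be sketched like this. `FuncRange` and `functionsToIgnore` are
hypothetical, and functions are modeled as half-open address ranges.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical function extent: [Start, End).
struct FuncRange {
  uint64_t Start;
  uint64_t End;
};

// Returns the start addresses of functions that have a fixup hook pointing
// strictly inside them. Hooks at a function start satisfy the assumption.
std::vector<uint64_t> functionsToIgnore(const std::vector<FuncRange> &Funcs,
                                        const std::vector<uint64_t> &Hooks) {
  std::vector<uint64_t> Ignored;
  for (const FuncRange &F : Funcs) {
    for (uint64_t H : Hooks) {
      if (H > F.Start && H < F.End) {
        Ignored.push_back(F.Start);
        break; // one offending hook is enough
      }
    }
  }
  return Ignored;
}
```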
The foreign TU list immediately follows the local TU list and they both
use the same index, so that if there are N local TU entries, the index
for the first foreign TU is N.
Changed so that the size of local TU is accounted for when setting
foreign TU index.
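
In code, the index arithmetic amounts to the following; the function name is
a placeholder for this sketch.

```cpp
#include <cassert>
#include <cstdint>

// Local and foreign TUs share one index space: local TUs occupy
// [0, NumLocalTUs), and foreign TUs follow immediately after.
// ForeignPos is the zero-based position within the foreign TU list.
uint64_t foreignTUIndex(uint64_t NumLocalTUs, uint64_t ForeignPos) {
  return NumLocalTUs + ForeignPos;
}
```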
Read .altinstructions and annotate instructions that have alternative
sequences with "AltInst" annotation. Note that some instructions may
have more than one alternative, in which case they will have multiple
annotations in the form "AltInst", "AltInst2", "AltInst3", etc.
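
The naming scheme can be sketched as below. `altInstAnnotationName` is a
hypothetical helper, and `N` is assumed to be the 1-based ordinal of the
alternative.

```cpp
#include <cassert>
#include <string>

// Builds the annotation name for the N-th alternative (1-based): the first
// one is plain "AltInst", later ones get a numeric suffix.
std::string altInstAnnotationName(unsigned N) {
  std::string Name = "AltInst";
  if (N > 1)
    Name += std::to_string(N);
  return Name;
}
```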