If `LLVM_APPEND_VC_REV` is on, add the git revision to the `.file`
string. The revision can be set with `LLVM_FORCE_VC_REVISION`.
Before:
`.file "git_revision.cpp",,"LLVM version 19.0.0git"`
After:
`.file "git_revision.cpp",,"LLVM version 19.0.0git (LLVM_REVISION)"`
-fsanitize=function emits a signature and function hash before a
function. Similar to 7f6e2c9, these can be sheared off when
`.subsections_via_symbols` is used.
This change uses the same technique 7f6e2c9 introduced for prefixes:
emitting a symbol for the metadata, then marking the actual function
entry as an .alt_entry symbol.
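As a rough illustration, the emission order looks roughly like the following MC-layer sketch (the symbol creation and word values are illustrative, not taken from the patch):
```
// Sketch of the emission order described above; not the actual patch code.
#include "llvm/MC/MCContext.h"
#include "llvm/MC/MCStreamer.h"
#include <cstdint>

static void emitFuncSanitizeSketch(llvm::MCStreamer &OS, llvm::MCContext &Ctx,
                                   llvm::MCSymbol *FnSym, uint32_t Signature,
                                   uint32_t TypeHash) {
  llvm::MCSymbol *MetadataSym = Ctx.createLinkerPrivateTempSymbol();
  OS.emitLabel(MetadataSym);   // atom covering the metadata words
  OS.emitInt32(Signature);     // prolog signature
  OS.emitInt32(TypeHash);      // function type hash
  OS.emitSymbolAttribute(FnSym, llvm::MCSA_AltEntry); // keep entry attached to the atom
  OS.emitLabel(FnSym);         // actual function entry
}
```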
`clang -c -masm=intel` compiling a source file with file scope basic asm
incorrectly uses the AT&T dialect.
```
% cat a.c
asm("mov rax, rax");
% clang a.c -c -masm=intel
<inline asm>:1:1: error: unknown use of instruction mnemonic without a size suffix
mov rax, rax
^
```
Fix this by setting the assembler dialect from the MCAsmInfo object.
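A minimal sketch of that idea, assuming the inline-asm path has a handle on the target's MCAsmInfo (where exactly the patch does this is not asserted here):
```
// Sketch: pick up the assembler dialect from MCAsmInfo instead of defaulting
// to AT&T. For X86, dialect 0 is AT&T and 1 is Intel.
#include "llvm/MC/MCAsmInfo.h"
#include "llvm/MC/MCParser/MCAsmParser.h"

static void configureInlineAsmParserSketch(llvm::MCAsmParser &Parser,
                                           const llvm::MCAsmInfo &MAI) {
  Parser.setAssemblerDialect(MAI.getAssemblerDialect());
}
```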
Note: `clang -c -flto -masm=intel a.c` still fails because of
https://reviews.llvm.org/D82862 for #34830: it tried to support AT&T
syntax for clang-cl, but the forced AT&T syntax is not compatible with
the intended Intel syntax.
Pull Request: https://github.com/llvm/llvm-project/pull/85367
This is a small part of #70452, attempting to take a small, simpler piece
of it in isolation to simplify what remains. It changes the getSpillSize,
getFoldedSpillSize, getRestoreSize and getFoldedRestoreSize methods to return
optional<LocationSize> instead of unsigned. The code is intended to behave the
same, using the optional<> to indicate when no size was found, with some
minor adjustments to make sure that unknown (~UINT64_C(0)) sizes are handled
sensibly. Hopefully, as more unsigned values are converted to LocationSize,
the use of ~UINT64_C(0) can be cleaned up too.
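A hedged sketch of the conversion pattern (the function name and parameters here are illustrative, not the actual TargetInstrInfo signatures):
```
// Sketch: std::nullopt when no spill/restore is recognized, and the legacy
// "unknown" sentinel mapped to an unknown LocationSize.
#include "llvm/Analysis/MemoryLocation.h"
#include <cstdint>
#include <optional>

static std::optional<llvm::LocationSize> spillSizeSketch(bool IsSpill,
                                                         uint64_t LegacySize) {
  if (!IsSpill)
    return std::nullopt;                               // no size found
  if (LegacySize == ~UINT64_C(0))
    return llvm::LocationSize::beforeOrAfterPointer(); // unknown size
  return llvm::LocationSize::precise(LegacySize);
}
```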
The current implementation of aliases tries to remove all the aliases in
the module to prevent the generic version of `AsmPrinter` from emitting
them incorrectly. Unfortunately, if the aliases are used this will fail.
Instead let's override the function to print aliases directly.
In addition, the declarations of the alias functions must occur before
the uses. To fix this we emit alias declarations as part of
`emitDeclarations` and only emit the `.alias` directives at the end
(where we can assume the aliasee has also already been declared).
Today `-split-machine-functions` and `-fbasic-block-sections={all,list}`
cannot be combined with `-basic-block-sections=labels` (the labels
option will be ignored).
The inconsistency comes from the way the basic block address map -- the
underlying mechanism for basic block labels -- encodes basic block
addresses
(https://lists.llvm.org/pipermail/llvm-dev/2020-July/143512.html).
Specifically, basic block offsets are computed relative to the function
begin symbol. This relies on functions being contiguous which is not the
case for MFS and basic block section binaries. This means Propeller
cannot use binary profiles collected from these binaries, which limits
the applicability of Propeller for iterative optimization.
To make the `SHT_LLVM_BB_ADDR_MAP` feature work with basic block section
binaries, we propose modifying the encoding of this section as follows.
First let us review the current encoding which emits the address of each
function and its number of basic blocks, followed by basic block entries
for each basic block.
| | |
|--|--|
| Address of the function | Function Address |
| Number of basic blocks in this function | NumBlocks |
| BB entry 1 | |
| BB entry 2 | |
| ... | |
| BB entry #NumBlocks | |
To make this work for basic block sections, we treat each basic block
section similar to a function, except that basic block sections of the
same function must be encapsulated in the same structure so we can map
all of them to their single function.
We modify the encoding to first emit the number of basic block sections
(BB ranges) in the function. Then we emit the address map of each basic
block section as before: the base address of the section, its number of
blocks, and BB entries for its basic blocks. The first section in the BB
address map is always the function entry section.
| | |
|--|--|
| Number of sections for this function | NumBBRanges |
| Section 1 begin address | BaseAddress[1] |
| Number of basic blocks in section 1 | NumBlocks[1] |
| BB entries for Section 1 | |
| ... | |
| Section #NumBBRanges begin address | BaseAddress[NumBBRanges] |
| Number of basic blocks in section #NumBBRanges | NumBlocks[NumBBRanges] |
| BB entries for Section #NumBBRanges | |
The encoding of basic block entries remains as before with the minor
change that each basic block offset is now computed relative to the
begin symbol of its containing BB section.
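For illustration, the per-function record now has roughly this shape (a sketch only; the on-disk encoding uses ULEB128s and the object file's address size):
```
// Sketch of the per-function record layout in the multi-range encoding,
// following the tables above. Details of the on-disk form are elided.
#include <cstdint>
#include <vector>

struct BBEntrySketch {
  uint64_t ID, Offset, Size, Metadata; // Offset is relative to its range's base, not the function
};
struct BBRangeSketch {
  uint64_t BaseAddress;                // begin address of this basic block section
  std::vector<BBEntrySketch> Blocks;   // NumBlocks entries
};
struct FunctionBBAddrMapSketch {
  std::vector<BBRangeSketch> Ranges;   // NumBBRanges ranges; Ranges[0] is the function entry section
};
```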
This patch adds a new boolean codegen option `-basic-block-address-map`.
Correspondingly, the front-end flag `-fbasic-block-address-map` and LLD
flag `--lto-basic-block-address-map` are introduced.
Analogously, we add a new TargetOption field `BBAddrMap`. This means BB
address maps are either generated for all functions in the compilation
unit or for none (depending on `TargetOptions::BBAddrMap`).
This patch keeps the functionality of the old
`-fbasic-block-sections=labels` option but does not remove it. A
subsequent patch will remove the obsolete option.
We refactor the `BasicBlockSections` pass by separating the BB address
map and BB sections handling into their own functions (named
`handleBBAddrMap` and `handleBBSections`). `handleBBSections` renumbers
basic blocks and places them in their assigned sections.
`handleBBAddrMap` is invoked after `handleBBSections` (if requested) and
only renumbers the blocks.
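Conceptually, the pass body after the refactor looks roughly like this sketch (only `handleBBSections` and `handleBBAddrMap` are names from the patch; everything else, including the return values, is illustrative):
```
// Sketch of the control flow described above.
bool runOnMachineFunctionSketch(MachineFunction &MF, bool SectionsRequested,
                                bool AddrMapRequested) {
  bool Changed = false;
  if (SectionsRequested)
    Changed |= handleBBSections(MF); // renumber blocks and assign sections
  if (AddrMapRequested)
    Changed |= handleBBAddrMap(MF);  // only renumbers the blocks
  return Changed;
}
```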
- New tests added:
- Two tests basic-block-address-map-with-basic-block-sections.ll and
basic-block-address-map-with-mfs.ll to exercise the combination of
`-basic-block-address-map` with `-basic-block-sections=list` and
`-split-machine-functions`.
- A driver sanity test for the `-fbasic-block-address-map` option
(basic-block-address-map.c).
- An LLD test for testing the `--lto-basic-block-address-map` option.
This reuses the LLVM IR from `lld/test/ELF/lto/basic-block-sections.ll`.
- Renamed and modified the two existing codegen tests for basic block
address map (`basic-block-sections-labels-functions-sections.ll` and
`basic-block-sections-labels.ll`)
- Removed `SHT_LLVM_BB_ADDR_MAP_V0` tests. Full deprecation of
`SHT_LLVM_BB_ADDR_MAP_V0` and `SHT_LLVM_BB_ADDR_MAP` version less than 2
will happen in a separate PR in a few months.
Now that the work embedding PGO information in SHT_LLVM_BB_ADDR_MAP ELF
sections has landed, there is no longer a need to keep around the
mbb-profile-dump flag.
This combines the previously posted patches with some additional work
I've done to more closely match MSVC output.
Most of the important logic here is implemented in
AArch64Arm64ECCallLowering. The purpose of the
AArch64Arm64ECCallLowering pass is to take "normal" IR we'd generate for
other targets, and generate most of the Arm64EC-specific bits:
generating thunks, mangling symbols, generating aliases, and generating
the .hybmp$x table. This is all done late for a few reasons: to
consolidate the logic as much as possible, and to ensure the IR exposed
to optimization passes doesn't contain complex arm64ec-specific
constructs.
The other changes are supporting changes, to handle the new constructs
generated by that pass.
There's a global llvm.arm64ec.symbolmap representing the .hybmp$x
entries for the thunks. This gets handled directly by the AsmPrinter
because it needs symbol indexes that aren't available before that.
There are two new calling conventions used to represent calls to and
from thunks: ARM64EC_Thunk_X64 and ARM64EC_Thunk_Native. There are a few
changes to handle the associated exception-handling info,
SEH_SaveAnyRegQP and SEH_SaveAnyRegQPX.
I've intentionally left out handling for structs with small
non-power-of-two sizes, because that's easily separated out. The rest of
my current work is here. I squashed my current patches because they were
split in ways that didn't really make sense. Maybe I could split out
some bits, but it's hard to meaningfully test most of the parts
independently.
Thanks to @dpaoliello for extensive testing and suggestions.
(Originally posted as https://reviews.llvm.org/D157547 .)
Uses machine analyses to emit PGOAnalysisMap into the bb-addr-map ELF
section. Implements FileCheck tests to verify that the new fields are emitted.
This patch emits optional PGO related analyses into the bb-addr-map ELF
section during AsmPrinter. This currently supports Function Entry Count,
Machine Block Frequencies, and Machine Branch Probabilities. Each is
independently enabled via the `feature` byte of `bb-addr-map` for the given
function.
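A sketch of how such a feature byte can gate the analyses (bit positions here are illustrative, not the normative encoding):
```
// Sketch: each analysis is independently gated by a bit in the per-function
// feature byte of bb-addr-map. Bit positions are illustrative.
#include <cstdint>

struct PGOFeaturesSketch {
  bool FuncEntryCount = false; // emit the function entry count
  bool BBFreq = false;         // emit machine block frequencies
  bool BrProb = false;         // emit machine branch probabilities

  uint8_t encode() const {
    return (FuncEntryCount ? 1 : 0) | (BBFreq ? 2 : 0) | (BrProb ? 4 : 0);
  }
  static PGOFeaturesSketch decode(uint8_t Byte) {
    return {bool(Byte & 1), bool(Byte & 2), bool(Byte & 4)};
  }
};
```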
A part of [RFC - PGO Accuracy Metrics: Emitting and Evaluating Branch and Block Analysis](https://discourse.llvm.org/t/rfc-pgo-accuracy-metrics-emitting-and-evaluating-branch-and-block-analysis/73902).
... by lowering them as lazy resolve-on-first-use symbol resolvers. Note that this is subtly different timing than on ELF platforms, where ifunc resolution happens at load time.
Since ld64 and ld-prime don't support all the cases we need for these, we lower them manually in the AsmPrinter.
This works around an AIX assembler and linker bug. If the
-fno-integrated-as and -frecord-command-line options are used but
there's no actual code in the source file, the assembler creates an
object file with only an .info section. The AIX linker rejects such an
object file.
When generating the assembly code for AIX/XCOFF, the .file pseudo-op
needs to be emitted first, before any csects are generated. Otherwise,
information such as the embedded command line will be associated with
part of the object file rather than the entire object file.
28b9126879
introduced the path cloning format in the basic-block-sections profile.
This PR validates and applies path clonings.
A path cloning is valid if all of these conditions hold:
1. All bb ids in the path are mapped to existing blocks.
2. Each two consecutive bb ids in the path have a successor relationship
in the CFG.
3. The path does not include a block with indirect branches, except
possibly as the last block.
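For illustration, a sketch of these validity checks (the helper name and the ID-to-block lookup are illustrative, not taken from the patch):
```
// Sketch of validating one cloning path (a sequence of BB IDs), following the
// three conditions above. GetBlock stands in for the pass's ID -> block lookup.
#include "llvm/ADT/ArrayRef.h"
#include "llvm/ADT/STLExtras.h"
#include "llvm/CodeGen/MachineBasicBlock.h"
#include "llvm/CodeGen/MachineInstr.h"

static bool isValidCloningPathSketch(
    llvm::ArrayRef<unsigned> Path,
    llvm::function_ref<llvm::MachineBasicBlock *(unsigned)> GetBlock) {
  llvm::MachineBasicBlock *Prev = nullptr;
  for (size_t I = 0; I < Path.size(); ++I) {
    llvm::MachineBasicBlock *MBB = GetBlock(Path[I]);
    if (!MBB)
      return false; // 1. the BB id does not map to an existing block
    if (Prev && !Prev->isSuccessor(MBB))
      return false; // 2. consecutive BB ids must have a successor relationship
    bool HasIndirectBr = llvm::any_of(
        MBB->terminators(),
        [](const llvm::MachineInstr &MI) { return MI.isIndirectBranch(); });
    if (HasIndirectBr && I + 1 != Path.size())
      return false; // 3. an indirect branch is allowed only in the last block
    Prev = MBB;
  }
  return true;
}
```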
Applying a path cloning involves cloning all blocks in the path (except
the first one) and setting up their branches.
Once all clonings are applied, the cluster information is used to guide
block layout in the modified function.
The goal in #66818 was to capture function entry counts, but those are not the same as the frequency of the entry (machine) basic block. This fixes that, and adds explicit profiles to the test.
We also increase the precision of `MachineBlockFrequencyInfo::getBlockFreqRelativeToEntryBlock` to double. Existing code uses it as a float, so it should be unaffected.
We were losing the function entry count, which is useful to check profile quality. For the original cases where we want
entrypoint-relative MBB frequencies, the user would just need to divide these values by the entrypoint (first MBB, with ID=0) value.
With the large code model, the label difference may not fit into 32 bits.
Even if we assume that any individual function is no larger than 2^32
and use a difference from the function entry to the target destination,
things like BOLT can rearrange blocks (even if BOLT doesn't necessarily
work with the large code model right now).
`.set` directives avoid static relocations in some 32-bit entry cases, but
don't worry about set directives for 64-bit jump table entries (we can
do that later if somebody really cares about it).
check-llvm in a bootstrapped clang with the large code model passes.
Fixes #62894
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D159297
The CodeView `S_ARMSWITCHTABLE` debug symbol is used to describe the layout of a jump table; it contains the following information:
* The address of the branch instruction that uses the jump table.
* The address of the jump table.
* The "base" address that the values in the jump table are relative to.
* The type of each entry (absolute pointer, a relative integer, a relative integer that is shifted).
Together this information can be used by debuggers and binary analysis tools to understand what a jump table indirect branch is doing and where it might jump to.
Documentation for the symbol can be found in the Microsoft PDB library dumper: 0fe89a942f/cvdump/dumpsym7.cpp (L5518)
This change adds support to LLVM to emit the `S_ARMSWITCHTABLE` debug symbol as well as to dump it out (for testing purposes).
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D149367
This reverts commit 8d0c3db388143f4e058b5f513a70fd5d089d51c3.
Causes crashes, see comments in https://reviews.llvm.org/D149367.
Some follow-up fixes are also reverted:
This reverts commit 636269f4fca44693bfd787b0a37bb0328ffcc085.
This reverts commit 5966079cf4d4de0285004eef051784d0d9f7a3a6.
This reverts commit e7294dbc85d24a08c716d9babbe7f68390cf219b.
The CodeView `S_ARMSWITCHTABLE` debug symbol is used to describe the layout of a jump table; it contains the following information:
* The address of the branch instruction that uses the jump table.
* The address of the jump table.
* The "base" address that the values in the jump table are relative to.
* The type of each entry (absolute pointer, a relative integer, a relative integer that is shifted).
Together this information can be used by debuggers and binary analysis tools to understand what a jump table indirect branch is doing and where it might jump to.
Documentation for the symbol can be found in the Microsoft PDB library dumper: 0fe89a942f/cvdump/dumpsym7.cpp (L5518)
This change adds support to LLVM to emit the `S_ARMSWITCHTABLE` debug symbol as well as to dump it out (for testing purposes).
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D149367
1. In X86LowerAMXType.cpp, dyn_cast could leave UserI as nullptr, which could then be dereferenced in the IRBuilder constructor.
2. In AsmPrinter.cpp, doInitialization could leave MMI as nullptr if MMIWP->getMMI() is false, making the dereference that follows unexpected.
Reviewed By: skan
Differential Revision: https://reviews.llvm.org/D157948
The ELFObjectWriter::shouldRelocateWithSymbol change in D128958 is untested. Add
the testing.
Also, change a diagnostic to follow the convention (no capitalization or
trailing period). Test it.
This patch extends support of the option `-frecord-command-line` to XCOFF. XCOFF doesn’t have custom sections like ELF, so the command line data is emitted to a .info section instead. A C_INFO symbol is generated with the .info section to preserve the command line data past the link step. Multiple command lines are separated by newlines and null bytes. The command line data can be retrieved on AIX with command `what file_name`.
Reviewed By: scott.linder
Differential Revision: https://reviews.llvm.org/D153600
In preparation for moving the `#include "llvm/ADT/StringExtras.h"`
from the header of `llvm/Support/Error.h` to its source file, first add
all the missing includes that were previously pulled in transitively
through that header.
The `__DATA,xray_instr_map` section has label differences like
`.quad Lxray_sled_0-Ltmp0`, each represented as a pair of UNSIGNED and SUBTRACTOR relocations.
LLVM integrated assembler attempts to rewrite A-B into A-B'+offset where B' can
be included in the symbol table. B' is called an atom and should be a
non-temporary symbol in the same section. However, since `xray_instr_map` does
not define a non-temporary symbol, the SUBTRACTOR relocation will have no
associated symbol, and its `r_extern` value will be 0. Therefore, we will see
linker errors like:
error: SUBTRACTOR relocation must be extern at offset 0 of __DATA,xray_instr_map in a.o
To fix this issue, we need to define a non-temporary symbol in the section. We
can accomplish this by renaming `Lxray_sleds_start0` to `lxray_sleds_start0`
("L" to "l").
`lxray_sleds_start0` serves as the atom for this dead-strippable subsection.
With the `S_ATTR_LIVE_SUPPORT` attribute, `ld -dead_strip` will retain
subsections that reference live functions.
Special thanks to Oleksii Lozovskyi for reporting the issue and providing
initial analysis.
Differential Revision: https://reviews.llvm.org/D153239
As mentioned by commit c5d38924dc6688c15b3fa133abeb3626e8f0767c (Apr 2020),
PC-relative entries avoid dynamic relocations and can therefore make the
section read-only.
This is similar to D78082 and D78590. We cannot commit to support
compiler/runtime built at different versions, so just don't play with versions.
For Mach-O support (still incomplete), we use non-temporary `lxray_fn_idx[0-9]+`
symbols. Label differences are represented as a pair of UNSIGNED and SUBTRACTOR
relocations. The SUBTRACTOR external relocation requires r_extern==1 (needs to
reference a symbol table entry) which can be satisfied by `lxray_fn_idx[0-9]+`.
A `lxray_fn_idx[0-9]+` symbol also serves as the atom for this dead-strippable
section (follow-up to commit b9a134aa629de23a1dcf4be32e946e4e308fc64d).
Differential Revision: https://reviews.llvm.org/D152661
Add the `S_ATTR_LIVE_SUPPORT` attribute to the sections so that `ld -dead_strip`
will retain subsections that reference live functions, once we add linker
private "l" symbols as atoms.
Apply my post-commit comment on D81995. The negative name misguided commit
d8a8e5d6240a1db809cd95106910358e69bbf299 (`[clang][cli] Remove marshalling from
Opt{In,Out}FFlag`) to:
* accidentally flip the option to not emit the xray_fn_idx section.
* change -fno-xray-function-index (instead of -fxray-function-index) to emit xray_fn_idx
This patch renames XRayOmitFunctionIndex and makes -fxray-function-index emit
xray_fn_idx, but the default remains -fno-xray-function-index.
Consider only targets where `MCAsmInfo::ExceptionsType == ExceptionHandling::None`
and that support CFI (when `MCAsmInfo::UsesCFIForDebug` is set to true):
currently, only AMDGPU.
This patch enables the emission of CFI information in the .eh_frame
section when the uwtable attribute is present on a function.
Before, we could generate CFI information for debugging purposes only.
This patch prepares AMDGPU to support collecting GPU stack traces in the future.
I did a first implementation (https://reviews.llvm.org/D139024)
but at the time I had not realized that no other platform used
`UsesCFIForDebug`.
Reviewed By: scott.linder
Differential Revision: https://reviews.llvm.org/D151806
This information helps to avoid considering cloning for blocks with indirect branches.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D150611
Currently we use RTTI objects to check type compatibility. To support non-unique
RTTI objects, commit 5745eccef54ddd3caca278d1d292a88b2281528b added a
`checkTypeInfoEquality` string matching to the runtime.
The scheme is inefficient.
```
_Z1fv:
.long 846595819 # jmp
.long .L__llvm_rtti_proxy-_Z3funv
...
main:
...
# Load the second word (pointer to the RTTI object) and dereference it.
movslq 4(%rsi), %rax
movq (%rax,%rsi), %rdx
# Is it the desired typeinfo object?
leaq _ZTIFvvE(%rip), %rax
# If not, call __ubsan_handle_function_type_mismatch_v1, which may recover if checkTypeInfoEquality allows
cmpq %rax, %rdx
jne .LBB1_2
...
.section .data.rel.ro,"aw",@progbits
.p2align 3, 0x0
.L__llvm_rtti_proxy:
.quad _ZTIFvvE
```
Let's replace the indirect `_ZTI` pointer with a type hash similar to
`-fsanitize=kcfi`.
```
_Z1fv:
.long 3238382334
.long 2772461324 # type hash
main:
...
# Load the second word (callee type hash) and check whether it is expected
cmpl $-1522505972, -4(%rax)
# If not, fail: call __ubsan_handle_function_type_mismatch
jne .LBB2_2
```
The RTTI object derives its name from `clang::MangleContext::mangleCXXRTTI`,
which uses `mangleType`. `mangleTypeName` uses `mangleType` as well. So the
type compatibility change is high-fidelity.
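For illustration, a 32-bit type hash can be derived from the mangled type name roughly like this (whether the patch uses exactly this hash and truncation is not asserted here):
```
// Sketch: derive a 32-bit id from the mangled function type name, in the
// spirit of -fsanitize=kcfi. Hash choice and truncation are illustrative.
#include "llvm/ADT/StringRef.h"
#include "llvm/Support/xxhash.h"
#include <cstdint>

static uint32_t functionTypeHashSketch(llvm::StringRef MangledTypeName) {
  return static_cast<uint32_t>(llvm::xxHash64(MangledTypeName));
}
```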
Since we no longer need RTTI pointers in
`__ubsan::__ubsan_handle_function_type_mismatch_v1`, let's switch it back to
version 0, the original signature before
e215996a2932ed7c472f4e94dc4345b30fd0c373 (2019).
`__ubsan::__ubsan_handle_function_type_mismatch_abort` is not
recoverable, so we can revert some changes from
e215996a2932ed7c472f4e94dc4345b30fd0c373.
Reviewed By: samitolvanen
Differential Revision: https://reviews.llvm.org/D148785
The current implementation of -fsanitize=function places two words (the prolog
signature and the RTTI proxy) at the function entry, which makes the feature
incompatible with Intel Indirect Branch Tracking (IBT) that needs an ENDBR instruction
at the function entry. To allow the combination, move the two words before the
function entry, similar to -fsanitize=kcfi.
Armv8.5 Branch Target Identification (BTI) has a similar requirement.
Note: for IBT and BTI, whether a function gets a marker instruction at the entry
generally cannot be assumed (it can be disabled by a function attribute or
stronger LTO optimizations).
It is extremely unlikely for the two words preceding a function entry to be
inaccessible: that would require the function to be aligned at a page boundary
with the preceding page unmapped or unreadable, which is not reasonable for
application or library code.
(Think: the first text section has crt* code not instrumented by
-fsanitize=function.)
We use 0xc105cafe for all targets. .long 0xc105cafe disassembles to invalid
instructions on all architectures I have tested, except Power where it is
`lfs 8, -13570(5)` (Load Floating-Point with a weird offset, unlikely to be used in real code).
---
For the removed function in AsmPrinter.cpp, remove an assert: `mdconst::extract`
already asserts non-nullness.
For compiler-rt/test/ubsan/TestCases/TypeCheck/Function/function.cpp,
when the function doesn't have prolog/epilog (-O1 and above), after moving the two words,
the address of the function equals the address of the ret instruction,
so symbolizing the function will additionally get a non-zero column number.
Adjust the test to allow an optional column number.
```
.long 3238382334
.long .L__llvm_rtti_proxy-_Z1fv
_Z1fv: // symbolizing here retrieves the line table entry from the second .loc
.file 0 ...
.loc 0 1 0
.cfi_startproc
.loc 0 2 1 prologue_end
retq
```
Reviewed By: peter.smith
Differential Revision: https://reviews.llvm.org/D148665
This patch encapsulates the encoding and decoding logic of basic block metadata into the Metadata struct, and also reduces the decoded size of `SHT_LLVM_BB_ADDR_MAP` section.
The patch would've looked more readable if we could use designated initializers, but that is a C++20 feature.
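For illustration, the encapsulation idea looks roughly like this (field names approximate what the bb-addr-map metadata carries; the exact set and bit order are not asserted here):
```
// Sketch of a Metadata bitfield encoded into a single value; the real struct
// lives in the BBAddrMap support code and may differ in fields and order.
#include <cstdint>

struct MetadataSketch {
  bool HasReturn = false;
  bool HasTailCall = false;
  bool IsEHPad = false;
  bool CanFallThrough = false;

  uint32_t encode() const {
    return (HasReturn << 0) | (HasTailCall << 1) | (IsEHPad << 2) |
           (CanFallThrough << 3);
  }
  static MetadataSketch decode(uint32_t V) {
    return {bool(V & 1), bool(V & 2), bool(V & 4), bool(V & 8)};
  }
};
```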
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D148360
This change will allow putting code pointers in DWARF info fields that are larger than the actual pointer size, e.g. 16-bit pointers into 32-bit fields.
The need for this came up while creating support for MSP430 in LLDB. MSP430-GCC already generates DWARF info with 32-bit fields, so this change is necessary for LLDB to maintain compatibility with both GCC and LLVM binaries. Moreover, right now in LLDB there is no support for having DWARF pointer size different from ELF header type, e.g. 16-bit DWARF info within ELF32, and it seems there is no such thing as ELF16.
Since other mainline targets are made to have the same pointer size in both MCAsmInfo and DataLayout, there is no need to change anything there.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D148042
The placement is currently wrong in the presence of function entry related
instrumentations (prefixdata, -fpatchable-function-entry=, -fsanitize=kcfi,
etc).
Currently, setting -mbb-profile-dump dumps a CSV file with blocks
inside an individual function identified by their MBB numbers. This
patch changes the MBBs to be identified by their ID which is set at MBB
creation and not changed afterwards, making it inherently stable
throughout the backend. This alleviates concerns with the MBB IDs
changing between the profile dump and what ends up in the final object
file. The MBBs inside the SHT_LLVM_BB_ADDR_MAP sections are also
identified using their MBB ID rather than number, so if we want to match
them up we need to identify the MBBs here by ID as well.
Reviewed By: mtrofin, rahmanl
Differential Revision: https://reviews.llvm.org/D147366
For Big Endian, the function `emitGlobalConstantLargeInt` tries to right shift `Realigned` in place by an amount `ExtraBitSize`. However, if the constant to emit has a bit width less than 64 and the bit width is not a multiple of 8, the shift amount will be greater than the bit width of `Realigned`, which causes the assertion error described in issue #59055 (https://github.com/llvm/llvm-project/issues/59055).
This patch fixes the issue by skipping the right shift when the bit width is under 64, avoiding the assertion error.
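A sketch of the guard, using the names from the description above (the surrounding emission logic is elided):
```
// Sketch of the guard: only do the in-place right shift when the constant is
// at least 64 bits wide, so the shift amount can never exceed the bit width.
#include "llvm/ADT/APInt.h"

static void shiftSketch(llvm::APInt &Realigned, unsigned BitWidth,
                        unsigned ExtraBitSize) {
  if (BitWidth >= 64)
    Realigned.lshrInPlace(ExtraBitSize); // in-place logical right shift
}
```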
Reviewed By: Peter
Differential Revision: https://reviews.llvm.org/D138246
This reverts commit db6a979ae82410e42430e47afa488936ba8e3025.
Reland D102817 without any change. The previous revert was a mistake.
Differential Revision: https://reviews.llvm.org/D102817
This patch adds a basic block profile dump option within the AsmPrinter
and dumps basic block profile information so that cost models can use
the data for downstream analysis.
Differential Revision: https://reviews.llvm.org/D143311