Depends on #120010
Support `R_AARCH64_AUTH_TLSDESC_ADR_PAGE21`, `R_AARCH64_AUTH_TLSDESC_LD64_LO12`
and `R_AARCH64_AUTH_TLSDESC_ADD_LO12` static relocations and
`R_AARCH64_AUTH_TLSDESC` dynamic relocation. IE/LE optimization is not
currently supported for AUTH TLSDESC.
Depends on #114525
Support `R_AARCH64_AUTH_GOT_ADR_PREL_LO21` and `R_AARCH64_AUTH_GOT_LD_PREL19`
GOT-generating relocations. A corresponding `RE_AARCH64_AUTH_GOT_PC` member
of `RelExpr` is added, which is an AUTH-specific variant of `R_GOT_PC`.
Depends on #113811
Support `R_AARCH64_AUTH_ADR_GOT_PAGE`, `R_AARCH64_AUTH_GOT_LO12_NC` and
`R_AARCH64_AUTH_GOT_ADD_LO12_NC` GOT-generating relocations. For preemptible
symbols, the dynamic relocation `R_AARCH64_AUTH_GLOB_DAT` is emitted. Otherwise,
we unconditionally emit an `R_AARCH64_AUTH_RELATIVE` dynamic relocation, since
pointers in the signed GOT need to be signed at dynamic link time.
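A minimal sketch of that rule, with illustrative names rather than lld's actual code:
```
// Which dynamic relocation a signed GOT entry gets. Unlike the unsigned
// GOT, even entries for non-preemptible symbols need a dynamic
// relocation, because the dynamic loader must sign the pointer at load
// time.
enum class SignedGotDynReloc { AuthGlobDat, AuthRelative };

static SignedGotDynReloc signedGotDynReloc(bool isPreemptible) {
  return isPreemptible ? SignedGotDynReloc::AuthGlobDat
                       : SignedGotDynReloc::AuthRelative;
}
```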
RelExpr enumerators are named `R_*`, which can be confused with ELF
relocation type names. Rename the target-specific ones to `RE_*` to
avoid confusion.
For consistency, the target-independent ones can be renamed as well, but
that's not urgent. The relocation processing mechanism with RelExpr has
non-trivial overhead compared with mold's approach; we might move more
code into Arch/*.cpp files and reduce the number of enumerators.
Pull Request: https://github.com/llvm/llvm-project/pull/118424
Assume PAC instructions are supported when the PAuth core info differs from (0,0). A (0,0) value means that an ELF file is incompatible with PAuth - see https://github.com/ARM-software/abi-aa/blob/2024Q3/pauthabielf64/pauthabielf64.rst#core-information. With PAC non-hint instructions supported, `autia1716; br x17` can be replaced with `braa x17, x16; nop`, where `braa` is an authenticated branch instruction that uses the IA key, the discriminator from x16 and the signed target address from x17.
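A minimal sketch of the gating condition, assuming the PAuth core info has already been parsed into platform and version fields (illustrative, not lld's code):
```
#include <cstdint>

// Per pauthabielf64, core info of (0, 0) marks an ELF file as
// incompatible with PAuth; any other value lets the linker assume PAC
// non-hint instructions such as braa are available.
static bool canUsePacNonHintInsns(uint64_t platform, uint64_t version) {
  return platform != 0 || version != 0;
}
```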
Rename lld::toString (to lld::elf::toStr) to simplify name lookup (we
have many llvm::toString overloads and another lld::toString(const
llvm::opt::Arg &)), so that we can remove the global `ctx` from toString
implementations.
The PAC PLT header contains an adrp instruction whose immediate field
must be filled in. In https://reviews.llvm.org/D62609, the adrp
instruction's address was calculated incorrectly. This patch resolves
the issue.
The test is already present in test/ELF/aarch64-feature-pac.s
Also rename `TargetInfo *getXXXTargetInfo` to `void setXXXTargetInfo`
and change it to set `ctx.target`. This ensures that when `ctx` becomes
a local variable, two lld invocations will not reuse the function-local
static variable.
---
Reland after commit c35214c131c0bc7f54dc18ceb75c75cba89f58ee
([ELF] Initialize TargetInfo members).
Also rename `TargetInfo *getXXXTargetInfo` to `void setXXXTargetInfo`
and change it to set `ctx.target`. This ensures that when `ctx` becomes
a local variable, two lld invocations will not reuse the function-local
static variable.
When Branch Target Identification (BTI) is enabled, all indirect branches
must target a BTI instruction. A long branch thunk is a source of
indirect branches. To date, LLD has assumed that the object
producer is responsible for putting a BTI instruction at every place the
linker might generate an indirect branch to. This is true for clang, but
not for GCC. GCC will elide the BTI instruction when it can prove that
there are no indirect branches from outside the translation unit(s). GNU
ld was fixed to generate a landing pad stub (GNU ld's term for a thunk)
for the destination when a long range stub was needed [1].
This means that using GCC compiled objects with LLD may lead to LLD
generating an indirect branch to a location without a BTI. The ABI [2]
has also been clarified to say that it is a static linker's
responsibility to generate a landing pad when the target does not have a
BTI.
This patch implements the same mechanism as GNU ld. When the output ELF
file sets the GNU_PROPERTY_AARCH64_FEATURE_1_BTI property, we check the
destination to see if it has a BTI instruction. If it does not, we
generate a landing pad consisting of:
```
BTI c
B <destination>
```
The `B <destination>` can be elided if the thunk can be placed so that
control flow drops through. For example:
```
BTI c
<destination>:
```
This will be common when -ffunction-sections is used.
The landing pad thunks are effectively alternative entry points for the
function. Direct branches are unaffected but any linker generated
indirect branch needs to use the alternative. We place these as close as
possible to the destination section.
There is some further optimization possible. Consider the case:
```
.text
fn1
...
fn2
...
```
If we need landing pad thunks for both fn1 and fn2, we could order them
so that the thunk for fn1 immediately precedes fn1. This could save a
single branch. However, I didn't think that would be worth the
additional complexity.
[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106671
[2] https://github.com/ARM-software/abi-aa/issues/196
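A minimal sketch of the destination check, using HINT-space encodings from the Arm ARM (illustrative, not lld's actual code; treating PACIASP/PACIBSP as implicit BTI c is an assumption here):
```
#include <cstdint>

// Does the first instruction at the branch target already serve as a
// landing pad for an indirect branch through x16/x17?
static bool isBtiLandingPad(uint32_t insn) {
  switch (insn) {
  case 0xd503245f: // BTI c
  case 0xd50324df: // BTI jc
  case 0xd503233f: // PACIASP (implicit BTI c)
  case 0xd503237f: // PACIBSP (implicit BTI c)
    return true;
  default:
    return false;
  }
}
```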
Ctx was introduced in March 2022 as a more suitable place for such
singletons.
llvm/Support/thread.h includes <thread>, which transitively includes
sstream in libc++ and uses ios_base::in, so we cannot use `#define in ctx.sec`.
`symtab, config, ctx` are now the only variables using
LLVM_LIBRARY_VISIBILITY.
On AArch64, the endianness of instruction encodings is always little,
whereas the endianness of data depends on whether the output is LE or
BE. So getImplicitAddend must use the right one of read32() and
read32le(): read32() for data and read32le() for code. It was using
read32() throughout, causing instructions to be read as big-endian in
BE mode and producing the wrong addend.
Fixed, and updated the existing test to check both endiannesses. The
expected results for data must be byte-swapped, but the ones for code
need no adjustment.
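A minimal sketch of the distinction, with hand-rolled byte readers standing in for the read32()/read32le() helpers:
```
#include <cstdint>

static uint32_t read32le(const uint8_t *p) {
  return uint32_t(p[0]) | uint32_t(p[1]) << 8 | uint32_t(p[2]) << 16 |
         uint32_t(p[3]) << 24;
}
static uint32_t read32be(const uint8_t *p) {
  return uint32_t(p[0]) << 24 | uint32_t(p[1]) << 16 |
         uint32_t(p[2]) << 8 | uint32_t(p[3]);
}

// Instruction encodings are always little-endian on AArch64...
static uint32_t readInsn(const uint8_t *p) { return read32le(p); }
// ...whereas data follows the endianness of the output.
static uint32_t readData32(const uint8_t *p, bool bigEndian) {
  return bigEndian ? read32be(p) : read32le(p);
}
```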
Normally, AArch64 ELF objects use the SHT_RELA type of relocation
section, with addends stored in each relocation. But some legacy AArch64
object producers still use SHT_REL in some situations, storing the
addend in the initial value of the data item or instruction immediate
field that the relocation will modify. LLD was mishandling relocations
of this type in multiple ways.
Firstly, many of the cases in the `getImplicitAddend` switch statement
were apparently based on a misunderstanding. The relocation types that
operate on instructions should be expecting to find an instruction of
the appropriate type, and should extract its immediate field. But many
of them were instead behaving as if they expected to find a raw 64-, 32-
or 16-bit value, and wanted to extract the right range of bits. For
example, the relocation for R_AARCH64_ADD_ABS_LO12_NC read a 16-bit word
and extracted its bottom 12 bits, presumably on the thinking that the
relocation writes the low 12 bits of the value it computes. But the
input addend for SHT_REL purposes occupies the immediate field of an
AArch64 ADD instruction, which meant it should have been reading a
32-bit AArch64 instruction encoding, and extracting bits 10-21 where the
immediate field lives. Worse, the R_AARCH64_MOVW_UABS_G2 relocation was
reading 64 bits from the input section, and since it's only relocating a
32-bit instruction, the second half of those bits would have been
completely unrelated!
Adding to that confusion, most of the values being read were first
sign-extended, and _then_ had a range of bits extracted, which doesn't
make much sense. They should have first extracted some bits from the
instruction encoding, and then sign-extended that 12-, 19-, or 21-bit
result (or whatever else) to a full 64-bit value.
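For example, a sketch of the corrected order of operations for R_AARCH64_ADR_PREL_LO21 (illustrative, not the patch's exact code):
```
#include <cstdint>

// ADR splits its 21-bit immediate into immlo (bits 29-30) and immhi
// (bits 5-23). Extract the field from the encoding first, then
// sign-extend the 21-bit result to 64 bits.
static int64_t adrImplicitAddend(uint32_t insn) {
  uint64_t immlo = (insn >> 29) & 0x3;
  uint64_t immhi = (insn >> 5) & 0x7ffff;
  uint64_t imm21 = (immhi << 2) | immlo;
  return int64_t(imm21 << 43) >> 43; // sign-extend from bit 20
}
```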
Secondly, after the relocated value was computed, in most cases it was
being written into the target instruction field via a bitwise OR
operation. This meant that if the instruction field didn't initially
contain all zeroes, the wrong result would end up in it. That's not even
a 100% reliable strategy for SHT_RELA, which in some situations is used
for its repeatability (in the sense that applying the relocation twice
should cause the second answer to overwrite the first, so you can
relocate an image in advance to its most likely address, and then do it
again at load time if that turns out not to be available). But for
SHT_REL, when you expect nonzero immediate fields in normal use, it
couldn't possibly work. You could see the effect of this in the existing
test, which had a lot of FFFFFF in the expected output which there
wasn't any plausible justification for.
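A sketch of the overwrite approach for the ADD imm12 case (illustrative):
```
#include <cstdint>

// Clear the 12-bit immediate field of an AArch64 ADD (bits 10-21) before
// inserting the relocated value, rather than ORing into stale bits.
static uint32_t patchAddImm12(uint32_t insn, uint64_t value) {
  uint32_t imm12 = uint32_t(value) & 0xfff;
  return (insn & ~(0xfffu << 10)) | (imm12 << 10);
}
```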
Finally, one relocation type was actually missing: there was no support
for R_AARCH64_ADR_PREL_LO21 at all.
So I've rewritten most of the cases in `getImplicitAddend`; replaced the
bitwise ORs with overwrites; and replaced the previous test with a much
more thorough one, obtained by writing an input assembly file with
explicitly specified relocations on instructions that also have
carefully selected immediate fields, and then doing some yaml2obj
seddery to turn the RELA relocation section into a REL one.
The function getImplicitAddend() is incomplete, as it is possible to
cook up object files with other relocation addends. Although using
llvm-mc or the clang integrated assembler does not produce such object
files, a proprietary assembler known as armasm can:
https://developer.arm.com/documentation/101754/0622/armasm-Legacy-Assembler-Reference
armasm is in a frozen state, but it is still actively used in many
legacy codebases, as its directives, macros and operators are very
different from those of the clang integrated assembler. This makes
porting legacy code from armasm syntax impractical for many projects.
Some internal testing of projects using open-source clang and lld fell
over at link time when legacy armasm objects were included in the link.
The goal of this patch is to enable people with legacy armasm objects to
use lld as the linker. Sadly, armasm uses SHT_REL format relocations for
AArch64 rather than SHT_RELA, which causes lld to reject the objects. As
armasm is frozen, we know the small set of relocations that it
officially supports, and that set will not grow (outside the equivalent
of the .reloc directive, which I think we can rule out of scope as it is
not commonly used).
The benefit to lld is that it will ease migration from a proprietary to
an open-source toolchain. The drawback is the implementation of a small
number of SHT_REL relocations. Although this patch doesn't aim to
comprehensively cover all possible relocation addends, it does extend
lld to work with the relocation addends that armasm produces, using the
canonical aaelf64 document as a reference:
https://github.com/ARM-software/abi-aa/blob/main/aaelf64/aaelf64.rst
Florian pointed out that we're accidentally eliding the Android note for
static executables, as it's guarded behind the "can have memtag globals"
conditional. Of course, memtag globals are unsupported for static
executables, but we should still allow static binaries to produce the
Android note (as that's the only way they get MTE).
When the .eh_frame section is placed at a non-zero offset within its
output section, the relocation values within .eh_frame are computed
incorrectly. We had a similar issue in the .ARM.exidx section; it was
fixed in https://reviews.llvm.org/D148033.
While applying a relocation of the form S + A - P, the value of P (the
location of the relocation) was wrong: P = SecAddr + rel.offset, but
SecAddr pointed to the starting address of the output section rather
than the starting address of the .eh_frame section within that output
section.
This issue affects all targets that generate a .eh_frame section, so the
fix is applied to all the affected targets.
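A sketch of the corrected computation, with illustrative parameter names:
```
#include <cstdint>

// P is the address of the relocated location: the start of the output
// section, plus the offset of .eh_frame within it (the previously
// missing term), plus the relocation's offset.
static uint64_t relocPlace(uint64_t outSecAddr, uint64_t ehFrameOutSecOff,
                           uint64_t relOffset) {
  return (outSecAddr + ehFrameOutSecOff) + relOffset;
}
```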
As per the ABI at
https://github.com/ARM-software/abi-aa/blob/main/memtagabielf64/memtagabielf64.rst,
this patch interprets the SHT_AARCH64_MEMTAG_GLOBALS_STATIC section,
which contains R_NONE relocations to tagged globals, and emits a
SHT_AARCH64_MEMTAG_GLOBALS_DYNAMIC section, with the correct
DT_AARCH64_MEMTAG_GLOBALS and DT_AARCH64_MEMTAG_GLOBALSSZ dynamic
entries. This section describes, in a ULEB128-encoded stream, global
memory ranges that should be tagged with MTE.
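For reference, a minimal ULEB128 encoder of the kind such a stream relies on (the actual layout of the entries is specified by the memtag ABI document above):
```
#include <cstdint>
#include <vector>

static void encodeULEB128(uint64_t value, std::vector<uint8_t> &out) {
  do {
    uint8_t byte = value & 0x7f;
    value >>= 7;
    if (value)
      byte |= 0x80; // more bytes follow
    out.push_back(byte);
  } while (value);
}
```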
We are also out of bits to spare in the LLD Symbol class. As a result,
I've reused the 'needsTocRestore' bit, which is a PPC64 only feature.
Now, it's also used for 'isTagged' on AArch64.
An entry in SHT_AARCH64_MEMTAG_GLOBALS_STATIC is practically a guarantee
from an objfile that all references to the linked symbol are through the
GOT, and meet correct alignment requirements. As a result, we go through
all symbols and make sure that, for all symbols $SYM, all object files
that reference $SYM also have a SHT_AARCH64_MEMTAG_GLOBALS_STATIC entry
for $SYM. If this isn't the case, we demote the symbol to being
untagged. Symbols that are imported from other DSOs should always be
fine, as they're GOT-referenced (and thus the GOT entry either has the
correct tag or not, depending on whether it's tagged in the defining DSO
or not).
Additionally hand-tested by building {libc, libm, libc++, and libnetd}
on Android with some experimental MTE globals support in the
linker/libc.
Reviewed By: MaskRay, peter.smith
Differential Revision: https://reviews.llvm.org/D152921
With relative vtables, the caller jumps directly to the PLT entries in the shared object;
therefore a landing pad is needed for these entries.
Reproducer:
main.cpp
```
#include "v.hpp"
int main() {
  A* a = new B();
  a->do_something2();
  return 0;
}
```
v.hpp
```
struct A {
  virtual void do_something() = 0;
  virtual void do_something2();
};
struct B : public A {
  void do_something() override;
  void do_something2() override;
};
```
v.cpp
```
#include "v.hpp"
void A::do_something2() { }
void B::do_something() { }
void B::do_something2() { }
```
```
CC="clang++ --target=aarch64-unknown-linux-gnu -fuse-ld=lld -mbranch-protection=bti"
F=-fexperimental-relative-c++-abi-vtables
${=CC} $F -shared v.cpp -o v.so -z force-bti
${=CC} $F main.cpp -L./ v.so -Wl,-rpath=. -z force-bti
qemu-aarch64-static -L /usr/aarch64-linux-gnu -cpu max ./a.out
```
For v.so, the regular vtable entry is relocated by an R_AARCH64_ABS64 relocation referencing _ZN1B13do_something2Ev.
```
_ZTV1B:
.xword _ZN1B13do_something2Ev
```
Using relative vtable entries in a DSO has the downside of creating many PLT entries and making their addresses escape.
The relative vtable entry references a PLT entry _ZN1B13do_something2Ev@plt.
```
.L_ZTV1A.local:
.word (_ZN1A13do_something2Ev@PLT-.L_ZTV1A.local)-8
```
Fixes: #63580
Reviewed By: peter.smith, MaskRay
Differential Revision: https://reviews.llvm.org/D153264
Add BTI to those PLT entries which are accessed by a range extension thunk, since those perform an indirect call.
Fixes: #62140
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D148704
This prepares for changing `relocations` from a SmallVector to a pointer.
Also change the `isec` parameter in `addAddendOnlyRelocIfNonPreemptible` to `GotSection &`.
The target-specific code (AArch64, PPC64) does not fit into the generic code and
adds virtual function overhead. Move relocateAlloc into ELF/Arch/ instead. This
removes many virtual functions (relaxTls*). In addition, this helps get rid of
getRelocTargetVA dispatch and many RelExpr members in the future.