This allows NOCROSSREFS to be specified in OVERLAY linker script
descriptions. This is a particularly useful part of the OVERLAY syntax,
since it's very rarely possible for one overlay section to sensibly
reference another.
Closes#128790
This permits an AArch64AbsLongThunk to be used in an environment where
unaligned accesses are disabled.
The AArch64AbsLongThunk does a load of an 8-byte address. When unaligned
accesses are disabled this address must be 8-byte aligned.
The vast majority of AArch64 systems will have unaligned accesses
enabled in userspace. However, after a reset, before the MMU has been
enabled, all memory accesses are to "device" memory, which requires
aligned accesses. In systems with multi-stage boot loaders a thunk may
be required to a later stage before the MMU has been enabled.
As we only want to increase the alignment when the ldr is used we delay
the increase in thunk alignment until we know we are going to write an
ldr. We also need to account for the ThunkSection alignment increase
when this happens.
In some of the test updates, particularly those with shared CHECK lines
with position independent thunks it was easier to ensure that the thunks
started at an 8-byte aligned address in all cases.
This allows the contents of OVERLAYs to be attributed to memory regions.
This is the only clean way to overlap VMAs in linker scripts that choose
to primarily use memory regions to lay out addresses.
This also simplifies OVERLAY expansion to better match GNU LD.
Expressions for the first section's LMA and VMA are not generated if the
user did not provide them. This allows the LMA/VMA offset to be
preserved across multiple overlays in the same region, as with regular
sections.
Closes#129816
The value of an absolute relocation, like R_RISCV_HI20 or R_PPC64_LO16,
with a symbol index of 0, the resulting value should be treated as
absolute and permitted in both -pie and -shared links.
This change also resolves an absolute relocation referencing an
undefined symbol in statically-linked executables.
PPC64 has unfortunate exceptions:
* R_PPC64_TOCBASE uses symbol index 0 but it should be treated as
referencing the linker-defined .TOC.
* R_PPC64_PCREL_OPT (https://reviews.llvm.org/D84360) could no longer
rely on `isAbsoluteValue` return false.
This prints a stack of reasons that symbols that match the given glob(s)
survived GC. It has no effect unless section GC occurs.
This implementation does not require -ffunction-sections or
-fdata-sections to produce readable results, althought it does tend to
work better (as does GC).
Details about the semantics:
- Some chain of liveness reasons is reported; it isn't specified which
chain.
- A symbol or section may be live:
- Intrisically (e.g., entry point)
- Because needed by a live symbol or section
- (Symbols only) Because part of a section live for another reason
- (Sections only) Because they contain a live symbol
- Both global and local symbols (`STB_LOCAL`) are supported.
- References to symbol + offset are considered to point to:
- If the referenced symbol is a section (`STT_SECTION`):
- If a sized symbol encloses the referenced offset, the enclosing
symbol.
- Otherwise, the section itself, generically.
- Otherwise, the referenced symbol.
Set the default processor version to v68 when the user does not specify
one in the command line. This includes changes in the LLVM backed and
linker (lld). Since lld normally sets the version based on inputs, this
change will only affect cases when there are no inputs.
Fixes#127558
If we have an absolute address whose high bits are known to be a sign
extend of the low 12 bits, we can avoid emitting the LUI entirely. This
is implemented in an analogous manner to the gp relative relocations -
defining an internal usage relocation type.
Since 12 bits (really 11 since the high bit must be zero in user code)
is less than one page, all of these offsets fit in the null page. As
such, the only application of these is likely to be undefined weak
symbols except for embedded use cases. I'm mostly posting this for
completeness sake.
When GCS was introduced to LLD, the gcs-report option allowed for a user
to gain information relating to if their relocatable objects supported
the feature. For an executable or shared-library to support GCS, all
relocatable objects must declare that they support GCS.
The gcs-report checks were only done on relocatable object files,
however for a program to enable GCS, the executable and all shared
libraries that it loads must enable GCS. gcs-report-dynamic enables
checks to be performed on all shared objects loaded by LLD, and in cases
where GCS is not supported, a warning or error will be emitted.
It should be noted that only shared files directly passed to LLD are
checked for GCS support. Files that are noted in the `DT_NEEDED` tags
are assumed to have had their GCS support checked when they were
created.
The behaviour of the -zgcs-dynamic-report option matches that of GNU ld.
The behaviour is as follows unless the user explicitly sets the value:
* -zgcs-report=warning or -zgcs-report=error implies
-zgcs-report-dynamic=warning.
This approach avoids inheriting an error level if the user wishes to
continue building a module without rebuilding all the shared libraries.
The same approach was taken for the GNU ld linker, so behaviour is
identical across the toolchains.
This implementation matches the error message and command line interface
used within the GNU ld Linker. See here:
724a8341f6
To support this option being introduced, two other changes are included
as part of this PR. The first converts the -zgcs-report option to
utilise an Enum, opposed to StringRef values. This enables easier
tracking of the value the user defines when inheriting the value for the
gas-report-dynamic option. The second is to parse the Dynamic Objects
program headers to locate the GNU Attribute flag that shows GCS is
supported. This is needed so, when using the gcs-report-dynamic option,
LLD can correctly determine if a dynamic object supports GCS.
---------
Co-authored-by: Fangrui Song <i@maskray.me>
When attempting to add KEEP within an OVERLAY description, which the
Linux kernel would like to do for ARCH=arm to avoid dropping the
.vectors sections with '--gc-sections' [1], ld.lld errors with:
ld.lld: error: ./arch/arm/kernel/vmlinux.lds:37: section pattern is expected
>>> __vectors_lma = .; OVERLAY 0xffff0000 : AT(__vectors_lma) { .vectors { KEEP(*(.vectors)) } ...
>>> ^
readOverlaySectionDescription() does not handle all input section
description keywords, despite GNU ld's documentation stating that "The
section definitions within the OVERLAY construct are identical to those
within the general SECTIONS construct, except that no addresses and no
memory regions may be defined for sections within an OVERLAY."
Reuse the existing parsing in readInputSectionDescription(), which
handles KEEP, allowing the Linux kernel's use case to work properly.
[1]: https://lore.kernel.org/20250221125520.14035-1-ceggers@arri.de/
This prevents useless spills to the same memory region from causing
spilling to take too many passes to converge.
Handling this at spilling time allows us to relax the generation of
spill sections; specifically, multiple spills can now be generated per
output section. This should be fairly benign for performance, and it
would eventually allow linker scripts to express things like holes or
minimum addresses for parts of output sections. The linker could then
spill within an output section whenever address constraints are
violated.
In local-exec form, the code sequence is converted as follows:
```
From:
lu12i.w $rd, %le_hi20_r(sym)
R_LARCH_TLS_LE_HI20_R, R_LARCH_RELAX
add.w/d $rd, $rd, $tp, %le_add_r(sym)
R_LARCH_TLS_LE_ADD_R, R_LARCH_RELAX
addi/ld/st.w/d $rd, $rd, %le_lo12_r(sym)
R_LARCH_TLS_LE_LO12_R, R_LARCH_RELAX
To:
addi/ld/st.w/d $rd, $tp, %le_lo12_r(sym)
R_LARCH_TLS_LE_LO12_R
```
In global-dynamic or local-dynamic, the code sequence is converted as
follows:
```
From:
pcalau12i $a0, %ld_pc_hi20(sym) | %gd_pc_hi20(sym)
R_LARCH_TLS_GD_PC_HI20 | R_LARCH_TLS_LD_PC_HI20, R_LARCH_RELAX
addi.w/d $a0, $a0, %got_pc_lo12(sym) | %got_pc_lo12(sym)
R_LARCH_GOT_PC_LO12, R_LARCH_RELAX
To:
pcaddi $a0, %got_pc_lo12(sym) | %got_pc_lo12(sym)
R_LARCH_TLS_GD_PCREL20_S2 | R_LARCH_TLS_LD_PCREL20_S2
```
Note: For initial-exec form, since it involves the conversion from IE to
LE, we will implement it in a future patch.
Instructions with relocation `R_LARCH_CALL36` may be relax as follows:
```
From:
pcaddu18i $dest, %call36(foo)
R_LARCH_CALL36, R_LARCH_RELAX
jirl $r, $dest, 0
To:
b/bl foo # bl if r=$ra, b if r=$zero
R_LARCH_B26
```
This patch fixes the buildbots failuer of lld tests.
Changes: Modify test files: from `sym@plt` to `%plt(sym)`.
Instructions with relocation `R_LARCH_CALL36` may be relax as follows:
```
From:
pcaddu18i $dest, %call36(foo)
R_LARCH_CALL36, R_LARCH_RELAX
jirl $r, $dest, 0
To:
b/bl foo # bl if r=$ra, b if r=$zero
R_LARCH_B26
```
`-z execute-only-report` checks that all executable sections have either
the SHF_AARCH64_PURECODE or SHF_ARM_PURECODE section flag set on AArch64
and ARM respectively.
There are considerable number of changes done in the address assignment
fixed point loop, and errors in any of them could cause address
assignment not to converge. However, this is reported to the user as
either "thunk creation not converged" or "relaxation not converged".
We saw a confused bug about this in the wild when spilling failed to
converge. (I'm working on a fix for that.)
We may eventually want a complete reason system when reporting address
assignment taking too many passes, but in the interim it seems prudent
to generalize the error message to "address assignment did not
converge".
These relocations apply to 16-bit Thumb instructions, so reading 16 bits
rather than 32 bits ensures the correct bits are masked and written
back. This fixes the incorrect masking and aligns the relocation logic
with the instruction encoding.
Before this patch, 32 bits were read from the ELF object. This did not
align with the instruction size of 16 bits, but the masking incidentally
made it all work nonetheless. However, this was the case only in little
endian.
In big endian mode, the read 32-bit word had to have its bytes reversed.
With this byte reordering, the masking would be applied to the wrong
bits, hence causing the incorrect encoding to be produced as a result of
the relocation resolution.
The added test checks the result for both little and big endian modes.
--export-dynamic should be a no-op when ctx.hasDynsym is false.
* Drop unneeded ctx.hasDynsym checks.
* Static linking with --export-dynamic does not prevent devirtualization.
ICF runs before BPSectionOrderer. When a section is ICF'ed, it seems
that the original sections are marked as not live, but are still kept
around. Prior to this patch, those ICF'ed sections would be passed to BP
and ordered before being skipped when writing the output. Now, these
sections are no longer passed to BP, saving runtime and possibly
improving BP's output.
In a large binary, I found that the number of sections ordered using BP
decreased, while the number of duplicate sections drastically decreased
as expected.
```
Functions for startup: 50755 -> 50520
Functions for compression: 165734 -> 105328
Duplicate functions: 1827231 -> 55230
```
(This application-specific option is probably not appropriate as a
linker option (.o file offers more flexibility and decouples JSON
verification from linkers). However, the option has gained some traction
in Linux distributions, with support in GNU ld, gold, and mold.)
GNU ld has supported percent-encoded bytes and extensions like
`%[comma]` since November 2024. mold supports just percent-encoded
bytes. To prepare for potential adoption by Ubuntu, let's support
percent-encoded bytes.
Link: https://sourceware.org/bugzilla/show_bug.cgi?id=32003
Link: https://bugs.launchpad.net/ubuntu/+source/dpkg/+bug/2071468
Pull Request: https://github.com/llvm/llvm-project/pull/126396
Reland 994cea3f0a2d0caf4d66321ad5a06ab330144d89 after bolt tests no
longer rely on -pie --unresolved-symbols=ignore-all with no input DSO
generating PLT entries.
---
Commit f10441ad003236ef3b9e5415a571d2be0c0ce5ce , while dropping a
special case for isUndefWeak and --no-dynamic-linking, made
--export-dynamic ineffective when -pie is used without any input DSO.
This change restores --export-dynamic and unifies -pie and -pie
--no-dynamic-linker when there is no input DSO.
* -pie with no input DSO suppresses undefined symbols in .dynsym.
Previously this only appied to -pie --no-dynamic-linker.
* As a side effect, -pie with no input DSO suppresses PLT.
LLVM has started to emit AArch64 build attributes sections called
.ARM.attributes. LLD does not yet have support for these so they are
accumulating in the ELF output. As the first part of that support
discard all the .ARM.attributes sections. This can be built upon by the
full implementation in LLD.
The build attributes specification only defines build attributes for
relocatable objects. The intention for LLD is that files of type ET_EXEC
and ET_SHARED will not have a build attributes in the output. A
relocatable link with -r will need a merged build attributes, but until
the merge is implemented it is better to discard.
Reland #120514 after 2f6e3df08a8b7cd29273980e47310cf09c6fdbd8 fixed
iteration order issue and libstdc++/libc++ differences.
---
Both options instruct the linker to optimize section layout with the
following goals:
* `--bp-compression-sort=[data|function|both]`: Improve Lempel-Ziv
compression by grouping similar sections together, resulting in a
smaller compressed app size.
* `--bp-startup-sort=function --irpgo-profile=<file>`: Utilize a
temporal profile file to reduce page faults during program startup.
The linker determines the section order by considering three groups:
* Function sections ordered according to the temporal profile
(`--irpgo-profile=`), prioritizing early-accessed and frequently
accessed functions.
* Function sections. Sections containing similar functions are placed
together, maximizing compression opportunities.
* Data sections. Similar data sections are placed together.
Within each group, the sections are ordered using the Balanced
Partitioning algorithm.
The linker constructs a bipartite graph with two sets of vertices:
sections and utility vertices.
* For profile-guided function sections:
+ The number of utility vertices is determined by the symbol order
within the profile file.
+ If `--bp-compression-sort-startup-functions` is specified, extra
utility vertices are allocated to prioritize nearby function similarity.
* For sections ordered for compression: Utility vertices are determined
by analyzing k-mers of the section content and relocations.
The call graph profile is disabled during this optimization.
When `--symbol-ordering-file=` is specified, sections described in that
file are placed earlier.
Co-authored-by: Pengying Xu <xpy66swsry@gmail.com>
The ELF/bp-section-orderer.s test is failing on some buildbots due to
what seems like non-determinism issues, see comments on the original PR
and #125450
Reverting to green the build.
This reverts commit 0154dce8d39d2688b09f4e073fe601099a399365 and
follow-up commits 046dd4b28b9c1a75a96cf63465021ffa9fe1a979 and
c92f20416e6dbbde9790067b80e75ef1ef5d0fa4.
Add new ELF linker options for profile-guided section ordering
optimizations:
- `--irpgo-profile=<file>`: Read IRPGO profile data for use with startup
and compression optimizations
- `--bp-startup-sort={none,function}`: Order sections based on profile
data to improve star tup time
- `--bp-compression-sort={none,function,data,both}`: Order sections
using balanced partitioning to improve compressed size
- `--bp-compression-sort-startup-functions`: Additionally optimize
startup functions for compression
- `--verbose-bp-section-orderer`: Print statistics about balanced
partitioning section ordering
Thanks to the @ellishg, @thevinster, and their team's work.
---------
Co-authored-by: Fangrui Song <i@maskray.me>
Commit f10441ad003236ef3b9e5415a571d2be0c0ce5ce dropped a special case
for isUndefWeak and --no-dynamic-linking but also made --export-dynamic
ineffective for static PIE.
This change restores the --export-dynamic behavior and entirely drops
special handling of --no-dynamic-linker:
* -pie with no input DSO, similar to --no-dynamic-linker, suppresses
undefined symbols in .dynsym
The new behaviors resemble GNU ld more.
Commit 3733ed6f1c6b0eef1e13e175ac81ad309fc0b080 introduced isExported to
cache includeInDynsym. If we don't unnecessarily set isExported for
undefined symbols, exportDynamic/includeInDynsym can be replaced with
isExported.
Similar to the change to MarkLive.cpp when isExported was introduced.
includeInDynsym might return true even when isExported is false for
statically linked executables.
Commit 2a26292388fcab0c857c91b2d08074c33abd37e8 made `isExported`
accurate except a few linker-synthesized symbols in finalizeSections.
We can collect these linker-synthesized symbols into a vector
and avoid recomputation for other symbols.
This is reland of 1a4d6de1b532149b10522eae5dabce39e5f7c687 after
`isExported` has been made accurate by f10441ad003236ef3b9e5415a571d2be0c0ce5ce
`includeInDynsym` has a special case for isUndefWeak and
--no-dynamic-linker, which can be removed if we simplify disallow
dynamic symbols for static-pie.
The partition feature reports errors only when a symbol `isExported`.
We need to link in a DSO to trigger the mips error.
This reverts commit 048f35037779763963c4b4478a0884e828ea9538.
This reverts commit f7bbc40b0736cc417f57cd039b098b504cf6a71f.
Related to #95949. A developer with no prior lld contribution and very
little AMD contribution sneaked in these application-specific section
order rules we discourage.
Commit 2a26292388fcab0c857c91b2d08074c33abd37e8 made `isExported`
accurate except a few linker-synthesized symbols in finalizeSections.
We can collect these linker-synthesized symbols into a vector
and avoid recomputation for other symbols.
LTO compilation might define symbols not in the symbol table (e.g.
__emutls_v.x in test/ELF/lto/wrap-unreferenced-before-codegen.test).
These symbols have a false `isExported` until
`demoteSymbolsAndComputeIsPreemptible`. This is usually benign as we do
not reference `isExported` that early.
Ensure that `isExported` is correct early. This helps remove a redundant
`isExported` computation in `demoteSymbolsAndComputeIsPreemptible`.
In LLD_IN_TEST=2 mode, when a thread calls Fatal, there will be no
output even if the process exits with code 1. Change the Fatal to
ErrAlways (not-recoverable) as subsequent code assumes SHF_LINK_ORDER
sh_link is correct.