The Global Function Merger
(https://discourse.llvm.org/t/rfc-global-function-merging/82608) pass
optimistically creates merged instances of functions and suffixes their
names with `.Tgm`. Then in the linker, ICF will (hopefully) fold these
`.Tgm` functions. For example, a function `foo` might become a thunk
`foo` that calls a merged function `foo.Tgm`.
Since IRPGO runs before the global merger, we will only have a profile
for `foo`. We want to correlate this profile to both `foo` and `foo.Tgm`
so they can both be ordered to improve startup time.
I built a large binary and found that this change increased the number of
functions ordered for startup, as expected.
```
Functions for startup: 12049 -> 12697
Functions for compression: 34733 -> 34707
```
We don't see a larger improvement because some cases were already
working by accident:
`getRootSymbol("foo.llvm.5555.Tgm")` already returns `foo`.
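A minimal sketch of why that case already worked, assuming root-symbol extraction simply cuts the name at the last `.llvm.` marker (an assumption about the existing logic, not a copy of it). The names `root_symbol_before` and `root_symbol_after` are hypothetical and only illustrate the effect of additionally stripping a trailing `.Tgm`.

```python
def root_symbol_before(name: str) -> str:
    """Assumed earlier behavior: cut at the last '.llvm.' marker only."""
    head, sep, _ = name.rpartition(".llvm.")
    return head if sep else name


def root_symbol_after(name: str) -> str:
    """Assumed new behavior: drop a trailing '.Tgm' first, then cut as before."""
    if name.endswith(".Tgm"):
        name = name[: -len(".Tgm")]
    return root_symbol_before(name)


# The '.llvm.' cut happens to remove '.Tgm' as well, so this case worked already.
print(root_symbol_before("foo.llvm.5555.Tgm"))
# Without a '.llvm.' marker the suffix survives unless it is stripped explicitly.
print(root_symbol_before("foo.Tgm"))
print(root_symbol_after("foo.Tgm"))
```

Under these assumptions, only names without a `.llvm.` marker gain a profile match from the extra stripping step, which is consistent with the modest increase reported above.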
## Problem
The `safe_thunks` ICF optimization in `lld-macho` was creating thunks
that pointed to `InputSection`s instead of `Symbol`s. Branch relocations
can generally point to either symbols or input sections, but in this case
they must point to symbols because the branch extension algorithm that
runs afterwards expects branches to always point to `Symbol`s.
## Solution
This patch changes the ICF implementation so that safe thunks point to
`Symbol`s rather than `InputSection`s.
## Testing
The existing `arm64-thunks.s` test is modified to include
`--icf=safe_thunks` to explicitly verify the interaction between ICF and
branch range extension thunks. Two functions were added that will be
merged together via a thunk. Before this patch, this test would trigger
an assertion failure; now this scenario is handled correctly.
ICF runs before BPSectionOrderer. When a section is ICF'ed, it seems
that the original sections are marked as not live, but are still kept
around. Prior to this patch, those ICF'ed sections would be passed to BP
and ordered before being skipped when writing the output. Now, these
sections are no longer passed to BP, saving runtime and possibly
improving BP's output.
In a large binary, I found that the number of sections ordered using BP
decreased, while the number of duplicate sections drastically decreased
as expected.
```
Functions for startup: 50755 -> 50520
Functions for compression: 165734 -> 105328
Duplicate functions: 1827231 -> 55230
```
`%T` is not unique and is deprecated
[[1](https://llvm.org/docs/CommandGuide/lit.html#substitutions)].
This patch replaces all `%T` in `lld/test` with `%t.dir` (adding `mkdir`
where necessary).
---------
Signed-off-by: Peter Rong <PeterRong@meta.com>
(This application-specific option is probably not appropriate as a
linker option: a .o file offers more flexibility and decouples JSON
verification from linkers. However, the option has gained some traction
in Linux distributions, with support in GNU ld, gold, and mold.)
GNU ld has supported percent-encoded bytes and extensions like
`%[comma]` since November 2024. mold supports just percent-encoded
bytes. To prepare for potential adoption by Ubuntu, let's support
percent-encoded bytes.
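As an illustration only, percent-decoding of an option value could look like the sketch below. The helper name `decode_percent` and the choice to reject malformed `%` sequences are assumptions; the text above does not say how the linker treats invalid input, and extensions such as `%[comma]` are deliberately not handled.

```python
import string


def decode_percent(value: str) -> bytes:
    """Replace each %XX (two hex digits) with the byte it encodes."""
    raw = value.encode("utf-8")
    out = bytearray()
    i = 0
    while i < len(raw):
        if raw[i] == 0x25:  # '%'
            pair = raw[i + 1 : i + 3].decode("ascii", errors="replace")
            if len(pair) != 2 or any(c not in string.hexdigits for c in pair):
                # Assumption: treat a malformed escape as an error.
                raise ValueError(f"malformed percent escape at offset {i}")
            out.append(int(pair, 16))
            i += 3
        else:
            out.append(raw[i])
            i += 1
    return bytes(out)


# %2c encodes a comma, %25 encodes a literal percent sign.
print(decode_percent('{"type":"deb"%2c"os":"ubuntu"}'))
```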
Link: https://sourceware.org/bugzilla/show_bug.cgi?id=32003
Link: https://bugs.launchpad.net/ubuntu/+source/dpkg/+bug/2071468
Pull Request: https://github.com/llvm/llvm-project/pull/126396
Reland 994cea3f0a2d0caf4d66321ad5a06ab330144d89 after bolt tests no
longer rely on -pie --unresolved-symbols=ignore-all with no input DSO
generating PLT entries.
---
Commit f10441ad003236ef3b9e5415a571d2be0c0ce5ce, while dropping a
special case for isUndefWeak and --no-dynamic-linker, made
--export-dynamic ineffective when -pie is used without any input DSO.
This change restores --export-dynamic and unifies -pie and -pie
--no-dynamic-linker when there is no input DSO.
* -pie with no input DSO suppresses undefined symbols in .dynsym.
Previously this only applied to -pie --no-dynamic-linker.
* As a side effect, -pie with no input DSO suppresses PLT.
LLVM has started to emit AArch64 build attributes sections called
.ARM.attributes. LLD does not yet have support for these so they are
accumulating in the ELF output. As the first part of that support,
discard all the .ARM.attributes sections. The full implementation in LLD
can build on this.
The build attributes specification only defines build attributes for
relocatable objects. The intention for LLD is that files of type ET_EXEC
and ET_DYN will not have build attributes in the output. A
relocatable link with -r will need merged build attributes, but until
the merge is implemented it is better to discard them.
Reland #120514 after 2f6e3df08a8b7cd29273980e47310cf09c6fdbd8 fixed
iteration order issue and libstdc++/libc++ differences.
---
Both options instruct the linker to optimize section layout with the
following goals:
* `--bp-compression-sort=[data|function|both]`: Improve Lempel-Ziv
compression by grouping similar sections together, resulting in a
smaller compressed app size.
* `--bp-startup-sort=function --irpgo-profile=<file>`: Utilize a
temporal profile file to reduce page faults during program startup.
The linker determines the section order by considering three groups:
* Function sections ordered according to the temporal profile
(`--irpgo-profile=`), prioritizing early-accessed and frequently
accessed functions.
* Function sections. Sections containing similar functions are placed
together, maximizing compression opportunities.
* Data sections. Similar data sections are placed together.
Within each group, the sections are ordered using the Balanced
Partitioning algorithm.
The linker constructs a bipartite graph with two sets of vertices:
sections and utility vertices.
* For profile-guided function sections:
+ The number of utility vertices is determined by the symbol order
within the profile file.
+ If `--bp-compression-sort-startup-functions` is specified, extra
utility vertices are allocated to prioritize nearby function similarity.
* For sections ordered for compression: Utility vertices are determined
by analyzing k-mers of the section content and relocations.
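To make the compression case more concrete, here is a minimal sketch of how k-mers of section content could act as shared utility vertices. The window size `k=4`, the use of raw byte windows instead of hashes, and ignoring relocations are all simplifying assumptions; this is not lld's implementation.

```python
def kmers(content: bytes, k: int = 4) -> set:
    """Return the set of all k-byte windows in a section's content."""
    return {content[i : i + k] for i in range(len(content) - k + 1)}


def shared_utilities(a: bytes, b: bytes, k: int = 4) -> int:
    """Count the k-mers two sections have in common."""
    return len(kmers(a, k) & kmers(b, k))


sec_a = b"abcdefgh"
sec_b = b"abcdefzz"  # shares a prefix with sec_a
sec_c = b"qrstuvwx"  # unrelated content

# Sections with more shared k-mers are better candidates to sit next to
# each other, since a Lempel-Ziv compressor can reuse the repeated bytes.
print(shared_utilities(sec_a, sec_b), shared_utilities(sec_a, sec_c))
```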
The call graph profile is disabled during this optimization.
When `--symbol-ordering-file=` is specified, sections described in that
file are placed earlier.
Co-authored-by: Pengying Xu <xpy66swsry@gmail.com>
The ELF/bp-section-orderer.s test is failing on some buildbots due to
what seem to be non-determinism issues; see comments on the original PR
and #125450.
Reverting to get the build green again.
This reverts commit 0154dce8d39d2688b09f4e073fe601099a399365 and
follow-up commits 046dd4b28b9c1a75a96cf63465021ffa9fe1a979 and
c92f20416e6dbbde9790067b80e75ef1ef5d0fa4.
Add new ELF linker options for profile-guided section ordering
optimizations:
- `--irpgo-profile=<file>`: Read IRPGO profile data for use with startup
and compression optimizations
- `--bp-startup-sort={none,function}`: Order sections based on profile
data to improve startup time
- `--bp-compression-sort={none,function,data,both}`: Order sections
using balanced partitioning to improve compressed size
- `--bp-compression-sort-startup-functions`: Additionally optimize
startup functions for compression
- `--verbose-bp-section-orderer`: Print statistics about balanced
partitioning section ordering
Thanks to @ellishg, @thevinster, and their team for their work.
---------
Co-authored-by: Fangrui Song <i@maskray.me>
Commit f10441ad003236ef3b9e5415a571d2be0c0ce5ce dropped a special case
for isUndefWeak and --no-dynamic-linker but also made --export-dynamic
ineffective for static PIE.
This change restores the --export-dynamic behavior and entirely drops
special handling of --no-dynamic-linker:
* -pie with no input DSO, similar to --no-dynamic-linker, suppresses
undefined symbols in .dynsym
The new behaviors resemble GNU ld more.
Similar to the change to MarkLive.cpp when isExported was introduced.
includeInDynsym might return true even when isExported is false for
statically linked executables.
The rule here, which I'm copying from the ELF linker, is that shared
library symbols should take precedence, unless the symbol has already
been extracted from the archive. For example:
```
$ wasm-ld foo.a foo.so ref.o // .so wins
$ wasm-ld foo.a ref.o foo.so // .a wins
```
In the first case the shared library takes precedence because the lazy
symbol is replaced by the .so symbol before it is extracted from the
archive. In the second example the ref.o file causes the archive to be
extracted before the .so file is processed, so in that case the archive
file wins.
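The ordering rule can be sketched as a toy state machine for a single symbol `foo`. The state names and the `resolve` helper are hypothetical and do not mirror wasm-ld's data structures; the sketch only assumes that every `.a` and `.so` input defines `foo` and every other input references it.

```python
def resolve(inputs):
    """Return which kind of file ends up providing `foo` for one link line."""
    state = "unseen"
    for name in inputs:
        if name.endswith(".a"):
            if state == "unseen":
                state = "lazy"  # archive member is available but not extracted
            elif state == "undefined":
                state = "archive"  # a pending reference extracts the member
        elif name.endswith(".so"):
            # The shared library replaces a lazy or undefined symbol,
            # but not a member that has already been extracted.
            if state in ("unseen", "lazy", "undefined"):
                state = "shared"
        else:
            # A regular object that references foo.
            if state == "unseen":
                state = "undefined"
            elif state == "lazy":
                state = "archive"  # the reference triggers extraction
    return state


print(resolve(["foo.a", "foo.so", "ref.o"]))  # the .so replaces the lazy symbol
print(resolve(["foo.a", "ref.o", "foo.so"]))  # ref.o extracts the archive first
```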
Fixes: https://github.com/emscripten-core/emscripten/issues/23501
For each imported module, emit null-terminated native import entries,
followed by null-terminated EC entries. If a view lacks imports for a
given module, only terminators are emitted. Use ARM64X relocations to
skip native entries in the EC view.
Move `delayLoadHelper` and `tailMergeUnwindInfoChunk` to `SymbolTable`
since they are different for each symbol table.
The reverted
1a4d6de1b532149b10522eae5dabce39e5f7c687
("[ELF] Remove redundant isExported computation")
had incorrect
```
+ if (sym->includeInDynsym(ctx))
+ sym->isExported = true;
```
causing undefined weak symbols (defined in archives, demoted; e.g.
__cxa_finalize) to be exported for static-pie.
Add a regression test for this corner case. The issue actually exposed
another issue related to includeInDynsym, which has been fixed by
f10441ad003236ef3b9e5415a571d2be0c0ce5ce.
`includeInDynsym` has a special case for isUndefWeak and
--no-dynamic-linker, which can be removed if we simply disallow
dynamic symbols for static-pie.
The partition feature reports errors only when a symbol `isExported`.
We need to link in a DSO to trigger the mips error.
This mode does not retain definitions in GNU ld. While we do, it's not
consistent with the decision that there is no .dynsym. We will change
this and simplify some internal representations.
This enables the LLD_IN_TEST=2 testing mode for
```
path/to/llvm-lit -sv --param RUN_LLD_MAIN_TWICE=1 lld/test/ELF
```
When `Fatal` is called, `RunSafely` will return false.
For the first invocation in LLD_IN_TEST=2 mode, `inTestOutputDisabled`
is true and lld will not write to stdout/stderr, making many tests fail.
(This essentially discourages `Fatal` calls in the source code.)
Add XFAIL: main-run-twice to these tests, similar to
https://reviews.llvm.org/D112898 for Mach-O:
```
comment="This test intentionally checks for fatal errors, and fatal errors aren't supported for testing when main is run twice."
xargs </tmp/0 sed -Ei "1s/(;|#|\/\/) REQUIRES: .*/\0\n\1 "$comment"\n\1 XFAIL: main-run-twice/;t;1s/^/# "$comment"\n# XFAIL: main-run-twice\n/"
```
This reverts commit 048f35037779763963c4b4478a0884e828ea9538.
This reverts commit f7bbc40b0736cc417f57cd039b098b504cf6a71f.
Related to #95949. A developer with no prior lld contribution and very
little AMD contribution sneaked in these application-specific section
order rules we discourage.
In hybrid images, the PE header references a single IAT for both native
and EC views, merging entries where possible. When merging isn't
feasible, different imports are grouped together, and ARM64X relocations
are emitted as needed.
In LLD_IN_TEST=2 mode, when a thread calls Fatal, there will be no
output even if the process exits with code 1. Change the Fatal to
ErrAlways (not-recoverable) as subsequent code assumes SHF_LINK_ORDER
sh_link is correct.
In LLD_IN_TEST=2 mode, when a thread calls Fatal, there will be no
output even if the process exits with code 1. Change a few Fatal to
recoverable Err.
These diagnostics are mostly reported by a thread during writeSections.
In LLD_IN_TEST=2 mode, when a thread calls Fatal, there will be no
output even if the process exits with code 1.
`NullChunk` instances do write data, even if it's always zero. Setting
`hasData` to false causes `Writer::assignAddresses` to ignore them
when calculating `rawSize`. This typically isn't an issue, as null chunks
are usually positioned within a section, and later chunks adjust the
size accordingly.
However, on ARM64EC, the auxiliary IAT is placed at the end of the
`.rdata` section and terminates with a null chunk. As a result, `rawSize`
is never updated to account for it, and space for the null chunk is not
allocated. Consequently, when `NullChunk::writeTo` is called, it receives
an invalid pointer - either pointing to the next section or beyond the
allocated buffer.
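The size accounting described above can be modeled with a small sketch. It is a simplified stand-in for the logic in `Writer::assignAddresses`: alignment and padding are ignored, and the `Chunk` class and `section_sizes` helper are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    size: int
    has_data: bool


def section_sizes(chunks):
    """Return (virtual_size, raw_size) for a section's chunks."""
    virtual_size = 0
    raw_size = 0
    for chunk in chunks:
        virtual_size += chunk.size
        if chunk.has_data:
            # Only chunks with data extend the on-disk size.
            raw_size = virtual_size
    return virtual_size, raw_size


# Null chunk in the middle: the later data chunk covers it.
print(section_sizes([Chunk(16, True), Chunk(8, False), Chunk(16, True)]))
# Null chunk at the end: its 8 bytes are missing from raw_size.
print(section_sizes([Chunk(16, True), Chunk(8, False)]))
```

In this model, marking the trailing null chunk as having data makes `raw_size` match `virtual_size` again.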
This is a follow-up to #120452 in a way.
Since lld/COFF does not yet insert all defined symbols of an obj file
before all of its undefined symbols (ELF and MachO do this, see #67445
and things linked from there), it's possible that:
1. We add an obj file a.obj
2. a.obj contains an undefined that's in b.obj, causing b.obj to be
added
3. b.obj contains an undefined that's in a part of a.obj that's not yet
in the symbol table, causing a recursive load of a.obj, which adds the
symbols in there twice, leading to duplicate symbol errors.
For normal archives, `ArchiveFile::addMember()` has a `seen` check to
prevent this. For start-lib lazy objects, we can just check if the
archive is still lazy at the recursive call.
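The recursive load and the laziness check can be reproduced in a toy model. The `LazyObject` class, the `load` and `link` helpers, and the symbol names are hypothetical; the sketch only assumes that symbols are processed in file order and that an undefined symbol immediately loads the lazy object that provides it.

```python
class LazyObject:
    def __init__(self, name, symbols):
        self.name = name
        self.symbols = symbols  # ordered list of ("def" | "undef", symbol)
        self.lazy = True


def load(obj, providers, symtab, duplicates, guard):
    # The fix: skip the load if the object is no longer lazy.
    if guard and not obj.lazy:
        return
    obj.lazy = False
    for kind, sym in obj.symbols:
        if kind == "def":
            if sym in symtab:
                duplicates.append(sym)
            symtab.add(sym)
        elif sym not in symtab and sym in providers:
            load(providers[sym], providers, symtab, duplicates, guard)


def link(guard):
    a = LazyObject("a.obj", [("undef", "b_func"), ("def", "a_func")])
    b = LazyObject("b.obj", [("def", "b_func"), ("undef", "a_func")])
    providers = {"a_func": a, "b_func": b}
    symtab, duplicates = set(), []
    load(a, providers, symtab, duplicates, guard)
    return duplicates


print(link(guard=False))  # a.obj is loaded twice, so a_func is a duplicate
print(link(guard=True))   # the recursive load is skipped
```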
This bug is similar to issue #59162.
(Eventually, we'll probably want to do what the MachO and ELF ports do.)
Includes a test that caused duplicate symbol diagnostics before this
code change.
The symbol string table does not have deduplication.
Here we add code to deduplicate the symbol string table.
This has a rather large size impact (20-30%) on unstripped binaries
(typically debug binaries) but no size impact on stripped
binaries (typically release binaries).
We enable deduplication by default and add a flag to disable it
(`-no-deduplicate-symbol-strings`).
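A minimal sketch of string table deduplication, assuming a plain table of null-terminated names and a dictionary from name to offset. The `StringTable` class is hypothetical and omits any format-specific header bytes or alignment.

```python
class StringTable:
    def __init__(self, deduplicate=True):
        self.deduplicate = deduplicate
        self.data = bytearray()
        self.offsets = {}  # name -> offset of its first copy

    def add(self, name: str) -> int:
        """Append a name (or reuse an existing copy) and return its offset."""
        if self.deduplicate and name in self.offsets:
            return self.offsets[name]
        offset = len(self.data)
        self.data += name.encode("utf-8") + b"\0"
        self.offsets[name] = offset
        return offset


names = ["_main", "_foo", "_main"]

dedup = StringTable(deduplicate=True)
plain = StringTable(deduplicate=False)
for n in names:
    dedup.add(n)
    plain.add(n)

# The repeated name is stored once with deduplication and twice without.
print(len(dedup.data), len(plain.data))
```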