8732 Commits

Author SHA1 Message Date
Ellis Hoag
83632c039d
[lld][BP] Order .Tgm symbols for startup (#126328)
The Global Function Merger
(https://discourse.llvm.org/t/rfc-global-function-merging/82608) pass
optimistically creates merged instances of functions and suffixes their
names with `.Tgm`. Then in the linker, ICF will (hopefully) fold these
`.Tgm` functions. For example, a function `foo` might become a thunk
`foo` that calls a merged function `foo.Tgm`.

Since IRPGO runs before the global merger, we will only have a profile
for `foo`. We want to correlate this profile to both `foo` and `foo.Tgm`
so they can both be ordered to improve startup time.

I built a large binary and found that it increased the number of
functions ordered for startup, as expected.
```
Functions for startup: 12049 -> 12697
Functions for compression: 34733 -> 34707
```

The reason why we don't see a larger improvement is because there are
some cases where the code was accidentally working:
`getRootSymbol("foo.llvm.5555.Tgm")` already returns `foo`.
2025-02-13 12:10:58 -08:00
alx32
4ac79a8c98
[lld-macho] Use Symbols as branch target for safe_thunks ICF (#126835)
## Problem

The `safe_thunks` ICF optimization in `lld-macho` was creating thunks
that pointed to `InputSection`s instead of `Symbol`s. While, generally,
branch relocations can point to symbols or input sections, in this case
we need them to point to symbols as subsequently the branch extension
algorithm expects branches to always point to `Symbol`'s.

## Solution
This patch changes the ICF implementation so that safe thunks point to
`Symbol`'s rather than `InputSection`s.

## Testing
The existing `arm64-thunks.s` test is modified to include
`--icf=safe_thunks` to explicitly verify the interaction between ICF and
branch range extension thunks. Two functions were added that will be
merged together via a thunk. Before this patch, this test would generate
an assert - now this scenario is correctly handled.
2025-02-13 11:07:12 -08:00
Jacek Caban
8252e0ef82
[LLD][COFF] Emit ARM64X relocations for CHPE ExtraRFETable entries (#126713)
In the native view, ExtraEFRTable references the x86 exception table.
The EC view references the ARM exception table, as it did before this
change.
2025-02-13 19:22:57 +01:00
Jacek Caban
c52fbabc93
[LLD][COFF] Set __buildid symbol in both symbol tables on ARM64X (#126777) 2025-02-13 19:10:46 +01:00
Ellis Hoag
79fff6aa32
[lld][BP] Avoid ordering ICF'ed sections (#126327)
ICF runs before BPSectionOrderer. When a section is ICF'ed, it seems
that the original sections are marked as not live, but are still kept
around. Prior to this patch, those ICF'ed sections would be passed to BP
and ordered before being skipped when writing the output. Now, these
sections are no longer passed to BP, saving runtime and possibly
improving BP's output.

In a large binary, I found that the number of sections ordered using BP
decreased, while the number of duplicate sections drastically decreased
as expected.
```
Functions for startup: 50755 -> 50520
Functions for compression: 165734 -> 105328
Duplicate functions: 1827231 -> 55230
```
2025-02-13 08:57:44 -08:00
Jacek Caban
94d956367e
[LLD][MinGW] Add support for wrapped symbols on ARM64X (#126296)
Apply `-wrap` arguments to both symbol tables.
2025-02-10 22:52:11 +01:00
Jacek Caban
fdded8537d [LLD][COFF] Fix a typo in REQUIRES directive (NFC)
Fixes #126300.
2025-02-10 22:09:33 +01:00
Jacek Caban
6536579d80
[LLD][COFF] Add support for -includeoptional on ARM64X (#126300)
Include symbols from both symbol tables.
2025-02-10 22:01:53 +01:00
Peter Rong
839002dd2d
[lld] Remove usage of %T in lld/test (#126133)
`%T` is not unique and deprecated
[[1](https://llvm.org/docs/CommandGuide/lit.html#substitutions)].

This patch replaces all `%T` in `lld/test` with `%t.dir` (`mkdir` if
necessary)

---------

Signed-off-by: Peter Rong <PeterRong@meta.com>
2025-02-10 10:35:44 -08:00
Fangrui Song
0a470a9264
[ELF] --package-metadata: support %[0-9a-fA-F][0-9a-fA-F]
(This application-specific option is probably not appropriate as a
linker option (.o file offers more flexibility and decouples JSON
verification from linkers). However, the option has gained some traction
in Linux distributions, with support in GNU ld, gold, and mold.)

GNU ld has supported percent-encoded bytes and extensions like
`%[comma]` since November 2024.  mold supports just percent-encoded
bytes.  To prepare for potential adoption by Ubuntu, let's support
percent-encoded bytes.

Link: https://sourceware.org/bugzilla/show_bug.cgi?id=32003
Link: https://bugs.launchpad.net/ubuntu/+source/dpkg/+bug/2071468

Pull Request: https://github.com/llvm/llvm-project/pull/126396
2025-02-10 09:21:31 -08:00
Fangrui Song
10ed0e4065 [ELF] Reorder target-specific error messaes 2025-02-08 16:36:46 -08:00
Jacek Caban
abd31b48e3
[LLD][MinGW] Exclude load config symbols from auto-export (#126134) 2025-02-07 15:52:39 +01:00
Jacek Caban
c8b2ba722f
[LLD][COFF] Add test for hybrid patchable thunks on ARM64X (NFC) (#126135) 2025-02-07 15:51:38 +01:00
Jacek Caban
f729477657
[LLD][COFF] Add support for MinGW auto-export on ARM64X (#125862)
Export all symbols from both EC and native symbol tables. If an explicit
export is present in either symbol table, auto-export is disabled for
both.
2025-02-06 22:25:53 +01:00
Fangrui Song
52fc6ffcda [ELF] Refine isExported/isPreemptible condition
Reland 994cea3f0a2d0caf4d66321ad5a06ab330144d89 after bolt tests no
longer rely on -pie --unresolved-symbols=ignore-all with no input DSO
generating PLT entries.

---

Commit f10441ad003236ef3b9e5415a571d2be0c0ce5ce , while dropping a
special case for isUndefWeak and --no-dynamic-linking, made
--export-dynamic ineffective when -pie is used without any input DSO.

This change restores --export-dynamic and unifies -pie and -pie
--no-dynamic-linker when there is no input DSO.

* -pie with no input DSO suppresses undefined symbols in .dynsym.
  Previously this only appied to -pie --no-dynamic-linker.
* As a side effect, -pie with no input DSO suppresses PLT.
2025-02-05 21:16:00 -08:00
Peter Smith
ba476d0b83
[LLD][ELF][AArch64] Discard .ARM.attributes sections (#125838)
LLVM has started to emit AArch64 build attributes sections called
.ARM.attributes. LLD does not yet have support for these so they are
accumulating in the ELF output. As the first part of that support
discard all the .ARM.attributes sections. This can be built upon by the
full implementation in LLD.

The build attributes specification only defines build attributes for
relocatable objects. The intention for LLD is that files of type ET_EXEC
and ET_SHARED will not have a build attributes in the output. A
relocatable link with -r will need a merged build attributes, but until
the merge is implemented it is better to discard.
2025-02-05 14:04:21 -05:00
Jacek Caban
e596387ebe
[LLD][COFF] Use EC symbol table for output DEF file on ARM64X (#125531)
For consistency with input def handling.
2025-02-05 12:16:02 +01:00
Jacek Caban
8cb3d7b418
[LLD][COFF] Emit locally imported EC symbols for ARM64X (#125527) 2025-02-05 10:30:13 +01:00
Fangrui Song
6ab034b828
[ELF] Add BPSectionOrderer options (#125559)
Reland #120514 after 2f6e3df08a8b7cd29273980e47310cf09c6fdbd8 fixed
iteration order issue and libstdc++/libc++ differences.

---

Both options instruct the linker to optimize section layout with the
following goals:

* `--bp-compression-sort=[data|function|both]`: Improve Lempel-Ziv
compression by grouping similar sections together, resulting in a
smaller compressed app size.
* `--bp-startup-sort=function --irpgo-profile=<file>`: Utilize a
temporal profile file to reduce page faults during program startup.

The linker determines the section order by considering three groups:

* Function sections ordered according to the temporal profile
(`--irpgo-profile=`), prioritizing early-accessed and frequently
accessed functions.
* Function sections. Sections containing similar functions are placed
together, maximizing compression opportunities.
* Data sections. Similar data sections are placed together.

Within each group, the sections are ordered using the Balanced
Partitioning algorithm.

The linker constructs a bipartite graph with two sets of vertices:
sections and utility vertices.

* For profile-guided function sections:
  + The number of utility vertices is determined by the symbol order
within the profile file.
  + If `--bp-compression-sort-startup-functions` is specified, extra
utility vertices are allocated to prioritize nearby function similarity.
* For sections ordered for compression: Utility vertices are determined
by analyzing k-mers of the section content and relocations.

The call graph profile is disabled during this optimization.

When `--symbol-ordering-file=` is specified, sections described in that
file are placed earlier.

Co-authored-by: Pengying Xu <xpy66swsry@gmail.com>
2025-02-04 09:12:32 -08:00
Hans Wennborg
f3c4b58f4b Revert "[ELF] Add BPSectionOrderer options (#120514)"
The ELF/bp-section-orderer.s test is failing on some buildbots due to
what seems like non-determinism issues, see comments on the original PR
and #125450

Reverting to green the build.

This reverts commit 0154dce8d39d2688b09f4e073fe601099a399365 and
follow-up commits 046dd4b28b9c1a75a96cf63465021ffa9fe1a979 and
c92f20416e6dbbde9790067b80e75ef1ef5d0fa4.
2025-02-03 11:41:23 +01:00
Nikita Popov
b84f7d17f8 Revert "[ELF] Refine isExported/isPreemptible condition"
This reverts commit 994cea3f0a2d0caf4d66321ad5a06ab330144d89.

Try to fix the bolt test failures in pre-merge checks.
2025-02-03 10:01:53 +01:00
Pengying Xu
0154dce8d3
[ELF] Add BPSectionOrderer options (#120514)
Add new ELF linker options for profile-guided section ordering
optimizations:

- `--irpgo-profile=<file>`: Read IRPGO profile data for use with startup
and compression optimizations
- `--bp-startup-sort={none,function}`: Order sections based on profile
data to improve star tup time
- `--bp-compression-sort={none,function,data,both}`: Order sections
using balanced partitioning to improve compressed size
- `--bp-compression-sort-startup-functions`: Additionally optimize
startup functions for compression
- `--verbose-bp-section-orderer`: Print statistics about balanced
partitioning section ordering

Thanks to the @ellishg, @thevinster, and their team's work.

---------

Co-authored-by: Fangrui Song <i@maskray.me>
2025-02-02 17:33:19 -08:00
Fangrui Song
994cea3f0a [ELF] Refine isExported/isPreemptible condition
Commit f10441ad003236ef3b9e5415a571d2be0c0ce5ce dropped a special case
for isUndefWeak and --no-dynamic-linking but also made --export-dynamic
ineffective for static PIE.

This change restores the --export-dynamic behavior and entirely drops
special handling of --no-dynamic-linker:

* -pie with no input DSO, similar to --no-dynamic-linker, suppresses
  undefined symbols in .dynsym

The new behaviors resemble GNU ld more.
2025-01-31 20:37:18 -08:00
Fangrui Song
45f538ecba [ELF] ICF: replace includeInDynsym with isExported
Similar to the change to MarkLive.cpp when isExported was introduced.
includeInDynsym might return true even when isExported is false for
statically linked executables.
2025-01-30 19:03:38 -08:00
quic-areg
61ea63baaf
[Hexagon] Add support for decoding PLT symbols (#123425)
Describes PLT entries for hexagon.
2025-01-29 15:37:23 -06:00
Jacek Caban
e902cf2df1
[LLD][COFF] Write both native and EC export symbols to the import library on ARM64X (#124833) 2025-01-29 09:57:11 +01:00
Sam Clegg
617278e7b0
[lld][WebAssembly] Fix for shared library symbols WRT replacing lazy symbols (#124619)
The rule here, which I'm copying from the ELF linker, is that shared
library symbols should take presence, unless the symbol has already be
extracted from the archive. e.g:

```
$ wasm-ld foo.a foo.so ref.o  // .so wins
$ wasm-ld foo.a ref.o foo.so  // .a wins
```

In the first case the shared library takes precedence because the lazy
symbol is replaced by the .so symbol before it is extracted from the
archive. In the second example the ref.o file causes the archive to be
exracted before the .so file is processed, so in that case the archive
file wins.

Fixes: https://github.com/emscripten-core/emscripten/issues/23501
2025-01-28 16:06:01 -08:00
Jacek Caban
3a51466caf
[LLD][COFF] Add support for delay-load imports on ARM64X (#124600)
For each imported module, emit null-terminated native import entries,
followed by null-terminated EC entries. If a view lacks imports for a
given module, only terminators are emitted. Use ARM64X relocations to
skip native entries in the EC view.

Move `delayLoadHelper` and `tailMergeUnwindInfoChunk` to `SymbolTable`
since they are different for each symbol table.
2025-01-28 13:06:01 +01:00
Fangrui Song
5e43dd5bde [test] Add missing -triple=x86_64
Fixes 7109f521975e9cc2e8ba4f52ac2a8e1140bd49b5
2025-01-27 22:38:04 -08:00
Fangrui Song
8f8a640e9a [ELF,test] Test static-pie __global_pointer$ 2025-01-27 22:33:49 -08:00
Fangrui Song
952685a43d [ELF,test] Add static-pie test related to demoted lazy symbol
The reverted
1a4d6de1b532149b10522eae5dabce39e5f7c687
("[ELF] Remove redundant isExported computation")
had incorrect
```
+        if (sym->includeInDynsym(ctx))
+          sym->isExported = true;
```

causing undefined weak symbols (defined in archives, demoted; e.g.
__cxa_finalize) to be exported for static-pie.

Add a regression test for this corner case. The issue actually exposed
another issue related to includeInDynsym, which has been fixed by
f10441ad003236ef3b9e5415a571d2be0c0ce5ce.
2025-01-27 22:19:48 -08:00
Fangrui Song
f10441ad00 [ELF] Refine includeInDynsym condition
`includeInDynsym` has a special case for isUndefWeak and
--no-dynamic-linker, which can be removed if we simplify disallow
dynamic symbols for static-pie.

The partition feature reports errors only when a symbol `isExported`.
We need to link in a DSO to trigger the mips error.
2025-01-27 22:02:27 -08:00
Fangrui Song
7109f52197 [ELF,test] Don't rely on --export-dynamic --gc-sections behavior for non-pie static linking
This mode does not retain definitions in GNU ld. While we do, it's not
consistent with the decision that there is no .dynsym . We will change
this and simplify some internal representations.
2025-01-27 20:55:00 -08:00
Fangrui Song
754b94638e
[lld] Support RUN_LLD_MAIN_TWICE for the ELF port (#124441)
This enables the LLD_IN_TEST=2 testing mode for
```
path/to/llvm-lit -sv --param RUN_LLD_MAIN_TWICE=1 lld/test/ELF
```

When `Fatal` is called, `RunSafely` will return false.
For the first invocation in LLD_IN_TEST=2 mode, `inTestOutputDisabled`
is true and lld will not write to stdout/stderr, making many tests fail.
(This essentially discourages `Fatal` calls in the source code.)

Add XFAIL: main-run-twice to these tests similar to
https://reviews.llvm.org/D112898 for Mach-O

```
comment="This test intentionally checks for fatal errors, and fatal errors aren't supported for testing when main is run twice."
xargs </tmp/0 sed -Ei "1s/(;|#|\/\/) REQUIRES: .*/\0\n\1 "$comment"\n\1 XFAIL: main-run-twice/;t;1s/^/# "$comment"\n# XFAIL: main-run-twice\n/"
```
2025-01-27 10:04:57 -08:00
Fangrui Song
b9efbed468 Revert "Move HIP fatbin sections farther away from .text"
This reverts commit 048f35037779763963c4b4478a0884e828ea9538.
This reverts commit f7bbc40b0736cc417f57cd039b098b504cf6a71f.

Related to #95949. A developer with no prior lld contribution and very
little AMD contribution sneaked in these application-specific section
order rules we discourage.
2025-01-26 21:14:49 -08:00
Fangrui Song
0e6b58202c [ELF] Improve parseSymbolVersion tests in for compileBitcodeFiles
Otherwise, the tests won't catch a mistake that removes
`parseSymbolVersion`.
2025-01-26 20:20:51 -08:00
Jacek Caban
80ab237c11 [LLD][COFF] Add REQUIRE x86 to arm64x-import.test (NFC)
This ensures the disassembler can handle ARM64X binaries correctly. Fixes #124189.
2025-01-26 22:36:01 +01:00
Jacek Caban
fb01a28903
[LLD][COFF] Implement support for hybrid IAT on ARM64X (#124189)
In hybrid images, the PE header references a single IAT for both native
and EC views, merging entries where possible. When merging isn't
feasible, different imports are grouped together, and ARM64X relocations
are emitted as needed.
2025-01-26 22:11:40 +01:00
Fangrui Song
c1f10ef0a5 [ELF] SHF_LINK_ORDER: replace Fatal with ErrAlways
In LLD_IN_TEST=2 mode, when a thread calls Fatal, there will be no
output even if the process exits with code 1. Change the Fatal to
ErrAlways (not-recoverable) as subsequent code assumes SHF_LINK_ORDER
sh_link is correct.
2025-01-25 18:13:42 -08:00
Fangrui Song
988978f964 [ELF,test] Add env LLD_IN_TEST=1 to make some tests work if RUN_LLD_MAIN_TWICE 2025-01-25 17:51:24 -08:00
Fangrui Song
f21c35d54f [ELF] Replace some Fatal with Err
In LLD_IN_TEST=2 mode, when a thread calls Fatal, there will be no
output even if the process exits with code 1. Change a few Fatal to
recoverable Err.
2025-01-25 17:29:28 -08:00
Fangrui Song
b7195e8e04 [ELF,test] Add env LLD_IN_TEST=1 to make some tests work if RUN_LLD_MAIN_TWICE 2025-01-25 16:58:52 -08:00
Fangrui Song
6b87f01aaa [ELF] MergeInputSection: replace Fatal with Err
In LLD_IN_TEST=2 mode, when a thread calls Fatal, there will be no
output even if the process exits with code 1. Change a few Fatal to
recoverable Err.
2025-01-25 16:20:27 -08:00
Fangrui Song
7db789b570 [ELF] Replace a few Fatal with Err
In LLD_IN_TEST=2 mode, when a thread calls Fatal, there will be no
output even if the process exits with code 1. Change a few Fatal to
recoverable Err.
2025-01-25 16:00:51 -08:00
Fangrui Song
c7579bfba5 [ELF] -o -: suppress output if disableOutput
So that LLD_IN_TEST=2 ld.lld -o - a.o only writes the output once.
2025-01-25 15:50:29 -08:00
Fangrui Song
4f48048171 [ELF] SHF_MERGE: avoid Fatal
In LLD_IN_TEST=2 mode, when a thread calls Fatal, there will be no
output even if the process exits with code 1.
2025-01-25 15:28:17 -08:00
Fangrui Song
19a6ac18ef [ELF] EhFrame: replace failOn with errOn
These diagnostics are mostly reported by a thread during writeSections.
In LLD_IN_TEST=2 mode, when a thread calls Fatal, there will be no
output even if the process exits with code 1.
2025-01-25 15:18:13 -08:00
Jacek Caban
77c325b646
[LLD][COFF] Keep hasData true in NullChunk constructor (#124368)
`NullChunk` instances do write data, even if it's always zero. Setting
`hasData` to false causes `Writer::assignAddresses` to ignore them
when calculating `rawSize`. This typically isn't an issue, as null chunks
are usually positioned within a section, and later chunks adjust the
size accordingly.

However, on ARM64EC, the auxiliary IAT is placed at the end of the
`.rdata` section and terminates with a null chunk. As a result, `rawSize`
is never updated to account for it, and space for the null chunk is not
allocated. Consequently, when `NullChunk::writeTo` is called, it receives
an invalid pointer - either pointing to the next section or beyond the
allocated buffer.
2025-01-25 22:20:34 +01:00
Nico Weber
d9b8120259
[lld/COFF] Fix -start-lib / -end-lib more after reviews.llvm.org/D116434 (#124294)
This is a follow-up to #120452 in a way.

Since lld/COFF does not yet insert all defined in an obj file before all
undefineds (ELF and MachO do this, see #67445 and things linked from
there), it's possible that:

1. We add an obj file a.obj
2. a.obj contains an undefined that's in b.obj, causing b.obj to be
added
3. b.obj contains an undefined that's in a part of a.obj that's not yet
in the symbol table, causing a recursive load of a.obj, which adds the
symbols in there twice, leading to duplicate symbol errors.

For normal archives, `ArchiveFile::addMember()` has a `seen` check to
prevent this. For start-lib lazy objects, we can just check if the
archive is still lazy at the recursive call.

This bug is similar to issue #59162.

(Eventually, we'll probably want to do what the MachO and ELF ports do.)

Includes a test that caused duplicate symbol diagnostics before this
code change.
2025-01-24 13:14:21 -05:00
alx32
c676104875
[lld-macho] Implement symbol string deduplication (#123874)
The symbol string table does not have deduplication. 
Here we add code to deduplicate the symbol string table. 
This has a rather large size impact (20-30%) on unstripped binaries
(typically debug binaries) but no size impact on stripped
binaries(typically release binaries).

We enable deduplication by default and add a flag to disable it
(`-no-deduplicate-symbol-strings`).
2025-01-23 15:48:11 -08:00