425 Commits

Author SHA1 Message Date
Alexey Karyakin
c0b2c10e9f
[hexagon] Bump the default version to v68 (#132304)
Set the default processor version to v68 when the user does not specify
one in the command line. This includes changes in the LLVM backed and
linker (lld). Since lld normally sets the version based on inputs, this
change will only affect cases when there are no inputs.

Fixes #127558
2025-03-21 20:08:45 -05:00
Jack Styles
4286f4dcce
[AArch64][GCS][LLD] Introduce -zgcs-report-dynamic Command Line Option (#127787)
When GCS was introduced to LLD, the gcs-report option allowed for a user
to gain information relating to if their relocatable objects supported
the feature. For an executable or shared-library to support GCS, all
relocatable objects must declare that they support GCS.

The gcs-report checks were only done on relocatable object files,
however for a program to enable GCS, the executable and all shared
libraries that it loads must enable GCS. gcs-report-dynamic enables
checks to be performed on all shared objects loaded by LLD, and in cases
where GCS is not supported, a warning or error will be emitted.

It should be noted that only shared files directly passed to LLD are
checked for GCS support. Files that are noted in the `DT_NEEDED` tags
are assumed to have had their GCS support checked when they were
created.

The behaviour of the -zgcs-dynamic-report option matches that of GNU ld.
The behaviour is as follows unless the user explicitly sets the value:
* -zgcs-report=warning or -zgcs-report=error implies
-zgcs-report-dynamic=warning.

This approach avoids inheriting an error level if the user wishes to
continue building a module without rebuilding all the shared libraries.
The same approach was taken for the GNU ld linker, so behaviour is
identical across the toolchains.

This implementation matches the error message and command line interface
used within the GNU ld Linker. See here:

724a8341f6

To support this option being introduced, two other changes are included
as part of this PR. The first converts the -zgcs-report option to
utilise an Enum, opposed to StringRef values. This enables easier
tracking of the value the user defines when inheriting the value for the
gas-report-dynamic option. The second is to parse the Dynamic Objects
program headers to locate the GNU Attribute flag that shows GCS is
supported. This is needed so, when using the gcs-report-dynamic option,
LLD can correctly determine if a dynamic object supports GCS.

---------

Co-authored-by: Fangrui Song <i@maskray.me>
2025-03-15 18:15:05 -07:00
Fangrui Song
0a470a9264
[ELF] --package-metadata: support %[0-9a-fA-F][0-9a-fA-F]
(This application-specific option is probably not appropriate as a
linker option (.o file offers more flexibility and decouples JSON
verification from linkers). However, the option has gained some traction
in Linux distributions, with support in GNU ld, gold, and mold.)

GNU ld has supported percent-encoded bytes and extensions like
`%[comma]` since November 2024.  mold supports just percent-encoded
bytes.  To prepare for potential adoption by Ubuntu, let's support
percent-encoded bytes.

Link: https://sourceware.org/bugzilla/show_bug.cgi?id=32003
Link: https://bugs.launchpad.net/ubuntu/+source/dpkg/+bug/2071468

Pull Request: https://github.com/llvm/llvm-project/pull/126396
2025-02-10 09:21:31 -08:00
Tom Stellard
3bd3e06f3f
Bump version to 21.0.0git (#124870)
Also clear the release notes.
2025-01-28 19:48:43 -08:00
Fangrui Song
ad697b28f4
ReleaseNotes: add lld/ELF notes
Pull Request: https://github.com/llvm/llvm-project/pull/124508
2025-01-27 18:08:18 -08:00
Weining Lu
aa273fd83e [LoongArch] Update lld20 release notes 2025-01-23 12:38:04 +08:00
Brad Smith
52574b5f40
[ELF] Add support for PT_OPENBSD_NOBTCFI (#120005) 2024-12-19 19:41:42 -05:00
Ivan G.
e6ced4da44
Typo fix in large_sections.rst (#120101)
Remove duplicate word.
2024-12-17 14:01:03 +00:00
Peter Collingbourne
64da33a589
ELF: Introduce --randomize-section-padding option.
The --randomize-section-padding option randomly inserts padding between
input sections using the given seed. It is intended to be used in A/B
experiments to determine the average effect of a change on program
performance, while controlling for effects such as false sharing in
the cache which may introduce measurement bias. For more details,
see the RFC:

https://discourse.llvm.org/t/rfc-lld-feature-for-controlling-for-code-size-dependent-measurement-bias/83334

Reviewers: smithp35, MaskRay

Reviewed By: MaskRay, smithp35

Pull Request: https://github.com/llvm/llvm-project/pull/117653
2024-12-13 11:52:09 -08:00
Feng Zou
94c6dd62fa
[docs] Update release notes for APX relocation types (#118575) 2024-12-07 21:27:10 +08:00
Sam Elliott
65ced158e9
[RISCV] Remove R_RISCV_RVC_LUI Relocation (#118714)
This was removed from the ABI in riscv-non-isa/riscv-elf-psabi-doc#398.
It is not emitted by LLVM, and seems to have been an internal
implementation detail in binutils.

This is a follow-up to 26ec5da744b8 which removed previous binutils
internal relocations when they were removed from the ABI.

The LLD implementation was not tested when it was added in
https://reviews.llvm.org/D39322
2024-12-05 10:10:27 +00:00
Jacek Caban
e6cc58922f
[LLD] Add ARM64EC release note (#116282) 2024-11-15 16:19:01 +01:00
Miguel A. Arroyo
5cd6e21bdd
[LLD][COFF] allow saving intermediate files with /lldsavetemps (#115131)
* Parity with the `-save-temps=` flag in the `ELF` `lld` driver.
2024-11-12 22:30:48 +02:00
Miguel A. Arroyo
99cd4cb123
[LLD][MINGW] Add --undefined-glob flag support (#109866) 2024-09-25 11:29:40 +03:00
Miguel A. Arroyo
b9bd8ca24e
[LLD][COFF] Adds /includeglob flag (#109721)
This implements parity with the `--undefined-glob` flag on
[ELF](https://reviews.llvm.org/D63244), but for COFF.
2024-09-24 23:57:01 +03:00
Daniel Thornburgh
7e8a9020b1
[LLD] Add CLASS syntax to SECTIONS (#95323)
This allows the input section matching algorithm to be separated from
output section descriptions. This allows a group of sections to be
assigned to multiple output sections, providing an explicit version of
--enable-non-contiguous-regions's spilling that doesn't require altering
global linker script matching behavior with a flag. It also makes the
linker script language more expressive even if spilling is not intended,
since input section matching can be done in a different order than
sections are placed in an output section.

The implementation reuses the backend mechanism provided by
--enable-non-contiguous-regions, so it has roughly similar semantics and
limitations. In particular, sections cannot be spilled into or out of
INSERT, OVERWRITE_SECTIONS, or /DISCARD/. The former two aren't
intrinsic, so it may be possible to relax those restrictions later.
2024-08-05 13:06:45 -07:00
Fangrui Song
5d972c582a
[ELF] Add -z nosectionheader
GNU ld since 2.41 supports this option, which is mildly useful. It omits
the section header table and non-ALLOC sections (including
.symtab/.strtab (--strip-all)).

This option is simple to implement and might be used by LLDB to test
program headers parsing without the section header table (#100900).

-z sectionheader, which is the default, is also added.

Pull Request: https://github.com/llvm/llvm-project/pull/101286
2024-07-31 12:57:23 -07:00
Tobias Hieta
10c6d6349e
Clear release notes for upcoming LLVM 20 dev cycle 2024-07-23 11:04:06 +02:00
Daniel Bertalan
9a8b0407fc
Reapply "[lld] enable fixup chains by default (#79894)" (#99255)
This reverts commit f55b79f59a77b4be586d649e9ced9f8667265011.

The known issues with chained fixups have been addressed by #98913,
#98305, #97156 and #95171.

Compared to the original commit, support for xrOS (which postdates
chained fixups' introduction) was added and an unnecessary test change
was removed.

----------
Original commit message:

Enable chained fixups in lld when all platform and version criteria are
met. This is an attempt at simplifying the logic used in ld 907:


93d74eafc3/src/ld/Options.cpp (L5458-L5549)

Some changes were made to simplify the logic:
- only enable chained fixups for macOS from 13.0 to avoid the arch check
- only enable chained fixups for iphonesimulator from 16.0 to avoid the
arch check
- don't enable chained fixups for not specifically listed platforms
- don't enable chained fixups for arm64_32
2024-07-22 22:03:32 +02:00
Fangrui Song
0778f5c1f1
[ELF] Support NOCROSSREFS and NOCROSSERFS_TO
Implement the two commands described by
https://sourceware.org/binutils/docs/ld/Miscellaneous-Commands.html

After `outputSections` is available, check each output section described
by at least one `NOCROSSREFS`/`NOCROSSERFS_TO` command. For each checked
output section, scan relocations from its input sections.
This step is slow, therefore utilize `parallelForEach(isd->sections, ...)`.

To support non SHF_ALLOC sections, `InputSectionBase::relocations`
(empty) cannot be used. In addition, we may explore eliminating this
member to speed up relocation scanning.

Some parse code is adapted from #95714.

Close #41825

Pull Request: https://github.com/llvm/llvm-project/pull/98773
2024-07-17 10:45:59 -07:00
Fangrui Song
6464dd21b5
[ELF] OUTPUT_FORMAT: support "binary" and ignore extra OUTPUT_FORMAT commands
This patch improves GNU ld compatibility.

Close #87891: Support `OUTPUT_FORMAT(binary)`, which is like
--oformat=binary. --oformat=binary takes precedence over an ELF
`OUTPUT_FORMAT`.

In addition, if more than one OUTPUT_FORMAT command is specified, only
check the first one.

Pull Request: https://github.com/llvm/llvm-project/pull/98837
2024-07-16 10:28:09 -07:00
Brad Smith
14b9d12039
[docs] Remove the History section (#98715)
This does not really serve any purpose nowadays.
2024-07-13 01:38:39 -04:00
Fangrui Song
8a41327948 ReleaseNotes: add lld/ELF notes 2024-07-06 18:03:39 -07:00
Tatsuyuki Ishi
1d96e4bc2d
[ELF] Change build-id default to sha1 (#93943)
The current default, build-id=fast, is only 8 bytes due to the usage of
64-bit XXH3. This is incompatible with RPM packaging tools which
requires >=16 bytes [1].

In Clang the ENABLE_LINKER_BUILD_ID define makes it pass --build-id
without a specific hash type. When also defaulting to LLD, this provides
a pretty broken default out-of-box.

Using XXH3 was a considerable performance advantage when build-id was
first implemented, because sha1 was really sha1 and rather slow.
Nowadays sha1 is just 160-bit BLAKE3 which is decently fast and not
cryptographically broken, so it should be a good default.

Note that the default remains "fast" for wasm because sha1 for wasm is
still real sha1.

Close https://github.com/llvm/llvm-project/issues/43483.

[1]:
b7d427728b/build/files.c (L1883)
2024-06-10 10:14:44 -07:00
Fangrui Song
4d9020ca0b
[ELF] Implement --force-group-allocation
GNU ld's relocatable linking behaviors:

* Sections with the `SHF_GROUP` flag are handled like sections matched
  by the `--unique=pattern` option. They are processed like orphan
  sections and ignored by input section descriptions.
* Section groups' (usually named `.group`) content is updated as the
  section indexes are updated. Section groups can be discarded with
  `/DISCARD/ : { *(.group) }`.

`-r --force-group-allocation` discards section groups and allows
sections with the `SHF_GROUP` flag to be matched like normal sections.
If two section group members are placed into the same output section,
their relocation sections (if present) are combined as well.
This behavior can be useful when -r output is used as a pseudo shared
object (e.g., FreeBSD's amd64 kernel modules, CHERIoT compartments).

This patch implements --force-group-allocation:

* Input SHT_GROUP sections are discarded.
* Input sections do not get the SHF_GROUP flag, so `addInputSec`
  will combine relocation sections if their relocated section group
  members are combined.

The default behavior is:

* Input SHT_GROUP sections are retained.
* Input SHF_GROUP sections can be matched (unlike GNU ld)
* Input SHF_GROUP sections keep the SHF_GROUP flag, so `addInputSec`
  will create different OutputDesc copies.

GNU ld provides the `FORCE_GROUP_ALLOCATION` command, which is not
implemented.

Pull Request: https://github.com/llvm/llvm-project/pull/94704
2024-06-07 14:19:06 -07:00
Fangrui Song
cfa97699f7 [ELF] Retain uncompressed if compressed content is larger
--compress-debug-sections in GNU ld, gas, and LLVM integrated assembler
retain the uncompressed content if the compressed content is larger.

This patch also updates the manpage (-O2 does not enable zlib level 6)
and fixes a crash of --compress-sections when the uncompressed section
is empty.
2024-05-22 15:55:21 -07:00
Daniel Thornburgh
66466ff151 Reland: [LLD] Implement --enable-non-contiguous-regions (#90007)
When enabled, input sections that would otherwise overflow a memory
region are instead spilled to the next matching output section.

This feature parallels the one in GNU LD, but there are some differences
from its documented behavior:

- /DISCARD/ only matches previously-unmatched sections (i.e., the flag
does not affect it).

- If a section fails to fit at any of its matches, the link fails
instead of discarding the section.

- The flag --enable-non-contiguous-regions-warnings is not implemented,
as it exists to warn about such occurrences.

The implementation places stubs at possible spill locations, and
replaces them with the original input section when effecting spills.
Spilling decisions occur after address assignment. Sections are spilled
in reverse order of assignment, with each spill naively decreasing the
size of the affected memory regions. This continues until the memory
regions are brought back under size. Spilling anything causes another
pass of address assignment, and this continues to fixed point.

Spilling after rather than during assignment allows the algorithm to
consider the size effects of unspillable input sections that appear
later in the assignment. Otherwise, such sections (e.g. thunks) may
force an overflow, even if spilling something earlier could have avoided
it.

A few notable feature interactions occur:

- Stubs affect alignment, ONLY_IF_RO, etc, broadly as if a copy of the
input section were actually placed there.

- SHF_MERGE synthetic sections use the spill list of their first
contained input section (the one that gives the section its name).

- ICF occurs oblivious to spill sections; spill lists for merged-away
sections become inert and are removed after assignment.

- SHF_LINK_ORDER and .ARM.exidx are ordered according to the final
section ordering, after all spilling has completed.

- INSERT BEFORE/AFTER and OVERWRITE_SECTIONS are explicitly disallowed.
2024-05-13 11:06:54 -07:00
Daniel Thornburgh
81f34afa5c
Revert "[LLD] Implement --enable-non-contiguous-regions" (#92005)
Reverts llvm/llvm-project#90007

Broke in merging I think.
2024-05-13 10:38:40 -07:00
Daniel Thornburgh
673114447b
[LLD] Implement --enable-non-contiguous-regions (#90007)
When enabled, input sections that would otherwise overflow a memory
region are instead spilled to the next matching output section.

This feature parallels the one in GNU LD, but there are some differences
from its documented behavior:

- /DISCARD/ only matches previously-unmatched sections (i.e., the flag
does not affect it).

- If a section fails to fit at any of its matches, the link fails
instead of discarding the section.

- The flag --enable-non-contiguous-regions-warnings is not implemented,
as it exists to warn about such occurrences.

The implementation places stubs at possible spill locations, and
replaces them with the original input section when effecting spills.
Spilling decisions occur after address assignment. Sections are spilled
in reverse order of assignment, with each spill naively decreasing the
size of the affected memory regions. This continues until the memory
regions are brought back under size. Spilling anything causes another
pass of address assignment, and this continues to fixed point.

Spilling after rather than during assignment allows the algorithm to
consider the size effects of unspillable input sections that appear
later in the assignment. Otherwise, such sections (e.g. thunks) may
force an overflow, even if spilling something earlier could have avoided
it.

A few notable feature interactions occur:

- Stubs affect alignment, ONLY_IF_RO, etc, broadly as if a copy of the
input section were actually placed there.

- SHF_MERGE synthetic sections use the spill list of their first
contained input section (the one that gives the section its name).

- ICF occurs oblivious to spill sections; spill lists for merged-away
sections become inert and are removed after assignment.

- SHF_LINK_ORDER and .ARM.exidx are ordered according to the final
section ordering, after all spilling has completed.

- INSERT BEFORE/AFTER and OVERWRITE_SECTIONS are explicitly disallowed.
2024-05-13 10:30:50 -07:00
Fangrui Song
6d44a1ef55
[ELF] Adjust --compress-sections to support compression level
zstd excels at scaling from low-ratio-very-fast to
high-ratio-pretty-slow. Some users prioritize speed and prefer disk read
speed, while others focus on achieving the highest compression ratio
possible, similar to traditional high-ratio codecs like LZMA.

Add an optional `level` to `--compress-sections` (#84855) to cater to
these diverse needs. While we initially aimed for a one-size-fits-all
approach, this no longer seems to work.
(https://richg42.blogspot.com/2015/11/the-lossless-decompression-pareto.html)

When --compress-debug-sections is used together, make
--compress-sections take precedence since --compress-sections is usually
more specific.

Remove the level distinction between -O/-O1 and -O2 for
--compress-debug-sections=zlib for a more consistent user experience.

Pull Request: https://github.com/llvm/llvm-project/pull/90567
2024-05-01 11:40:46 -07:00
Fangrui Song
f02a27df2f
[ELF] Add --default-script/-dT
GNU ld added --default-script (alias: -dT) in 2007. The option specifies
a default script that is processed if --script/-T is not specified. -dT
can be used to override GNU ld's internal linker script, but only when
the application does not specify -T.
In addition, dynamorio's CMakeLists.txt may use -dT.

The implementation is simple and the feature can be useful to dabble
with different section layouts.

Pull Request: https://github.com/llvm/llvm-project/pull/89327
2024-04-19 09:09:41 -07:00
cmtice
16711b431b
[lld][ELF] Add --debug-names to create merged .debug_names. (#86508)
`clang -g -gpubnames` (with optional -gsplit-dwarf) creates the
`.debug_names` section ("per-CU" index). By default lld concatenates
input `.debug_names` sections into an output `.debug_names` section.
LLDB can consume the concatenated section but the lookup performance is
not good.

This patch adds --debug-names to create a per-module index by combining
the per-CU indexes into a single index that covers the entire load
module. The produced `.debug_names` is a replacement for `.gdb_index`.

Type units (-fdebug-types-section) are not handled yet.

Co-authored-by: Fangrui Song <i@maskray.me>

---------

Co-authored-by: Fangrui Song <i@maskray.me>
2024-04-18 14:41:14 -07:00
Daniil Kovalev
cca9115b1c
[lld][AArch64][ELF][PAC] Support AUTH relocations and AUTH ELF marking (#72714)
This patch adds lld support for:

- Dynamic R_AARCH64_AUTH_* relocations (without including RELR compressed AUTH
relocations) as described here:
https://github.com/ARM-software/abi-aa/blob/main/pauthabielf64/pauthabielf64.rst#auth-variant-dynamic-relocations

- .note.AARCH64-PAUTH-ABI-tag section as defined here
https://github.com/ARM-software/abi-aa/blob/main/pauthabielf64/pauthabielf64.rst#elf-marking

Depends on #72713 and #85231

---------

Co-authored-by: Peter Collingbourne <peter@pcc.me.uk>
Co-authored-by: Fangrui Song <i@maskray.me>
2024-04-04 12:38:09 +03:00
Fangrui Song
e115c00565
[ELF] Reject certain unknown section types (#85173)
Unknown section sections may require special linking rules, and
rejecting such sections for older linkers may be desired. For example,
if we introduce a new section type to replace a control structure (e.g.
relocations), it would be nice for older linkers to reject the new
section type. GNU ld allows certain unknown section types:

* [SHT_LOUSER,SHT_HIUSER] and non-SHF_ALLOC
* [SHT_LOOS,SHT_HIOS] and non-SHF_OS_NONCONFORMING

but reports errors and stops linking for others (unless
--no-warn-mismatch is specified). Port its behavior. For convenience, we
additionally allow all [SHT_LOPROC,SHT_HIPROC] types so that we don't
have to hard code all known types for each processor.

Close https://github.com/llvm/llvm-project/issues/84812
2024-03-15 09:50:23 -07:00
Fangrui Song
f1ca2a0967
[ELF] Add --compress-section to compress matched non-SHF_ALLOC sections
--compress-sections <section-glib>=[none|zlib|zstd] is similar to
--compress-debug-sections but applies to broader sections without the
SHF_ALLOC flag. lld will report an error if a SHF_ALLOC section is
matched. An interesting use case is to compress `.strtab`/`.symtab`,
which consume a significant portion of the file size (15.1% for a
release build of Clang).

An older revision is available at https://reviews.llvm.org/D154641 .
This patch focuses on non-allocated sections for safety. Moving
`maybeCompress` as D154641 does not handle STT_SECTION symbols for
`-r --compress-debug-sections=zlib` (see `relocatable-section-symbol.s`
from #66804).

Since different output sections may use different compression
algorithms, we need CompressedData::type to generalize
config->compressDebugSections.

GNU ld feature request: https://sourceware.org/bugzilla/show_bug.cgi?id=27452

Link: https://discourse.llvm.org/t/rfc-compress-arbitrary-sections-with-ld-lld-compress-sections/71674

Pull Request: https://github.com/llvm/llvm-project/pull/84855
2024-03-12 10:56:14 -07:00
Arthur Eubanks
4bc3b3501f
[lld/ELF] Add documentation on large sections (#82560)
Fixes #82438
2024-02-26 09:11:57 -08:00
SingleAccretion
cb4f94db83
[lld][WebAssembly] Add --no-growable-memory (#82890)
We recently added `--initial-heap` - an option that allows one to up the
initial memory size without the burden of having to know exactly how
much is needed.

However, in the process of implementing support for this in Emscripten
(https://github.com/emscripten-core/emscripten/pull/21071), we have
realized that `--initial-heap` cannot support the use-case of
non-growable memories by itself, since with it we don't know what to set
`--max-memory` to.

We have thus agreed to move the above work forward by introducing
another option to the linker (see
https://github.com/emscripten-core/emscripten/pull/21071#discussion_r1491755616),
one that would allow users to explicitly specify they want a
non-growable memory.

This change does this by introducing `--no-growable-memory`: an option
that is mutally exclusive with `--max-memory` (for simplicity - we can
also decide that it should override or be overridable by `--max-memory`.
In Emscripten a similar mix of options results in `--no-growable-memory`
taking precedence). The option specifies that the maximum memory size
should be set to the initial memory size, effectively disallowing memory
growth.

Closes #81932.
2024-02-25 08:43:11 -08:00
Fangrui Song
78762357d4 [ELF] Support placing .lbss/.lrodata/.ldata after .bss
https://reviews.llvm.org/D150510 places .lrodata before .rodata to
minimize the number of permission transitions in the memory image.
However, this layout is less ideal for -fno-pic code (which is still
important).

Small code model -fno-pic code has R_X86_64_32S relocations with a range
of `[0,2**31)` (if we ignore the negative area). Placing `.lrodata`
earlier exerts relocation pressure on such code. Non-x86 64-bit
architectures generally have a similar `[0,2**31)` limitation if they
don't use PC-relative relocations.

If we place .lrodata later, we will need one extra PT_LOAD. Two layouts
are appealing:

* .bss/.lbss/.lrodata/.ldata (GNU ld)
* .bss/.ldata/.lbss/.lrodata

The GNU ld layout has the nice property that there is only one BSS
(except .tbss/.relro_padding). Add -z lrodata-after-bss to support
this layout.

Since a read-only PT_LOAD segment (for large data sections) may appear
after RW PT_LOAD segments. The placement of `_etext` has to be adjusted.

Pull Request: https://github.com/llvm/llvm-project/pull/81224
2024-02-20 13:59:49 -08:00
Tom Stellard
987087df90 Bump trunk version to 19.0.0git 2024-01-23 19:00:11 -08:00
Sam Clegg
bcc9b9d80c
[lld][WebAssembly] Match the ELF linker in transitioning away from archive indexes. (#78658)
The ELF linker transitioned away from archive indexes in
https://reviews.llvm.org/D117284.

This paves the way for supporting `--start-lib`/`--end-lib` (See #77960)

The ELF linker unified library handling with `--start-lib`/`--end-lib` and removed
the ArchiveFile class in https://reviews.llvm.org/D119074.
2024-01-19 16:20:29 -08:00
SingleAccretion
b2cdf3cc4c
[lld][WebAssembly] Add an --initial-heap option (#75594)
It is beneficial to preallocate a certain number of pages in the linear
memory (i. e. use the "minimum" field of WASM memories) so that fewer
"memory.grow"s are needed at startup.

So far, the way to do that has been to pass the "--initial-memory"
option to the linker. It works, but has the very significant downside of
requiring the user to know the size of static data beforehand, as it
must not exceed the number of bytes passed-in as "--initial-memory".

The new "--initial-heap" option avoids this downside by simply appending
the specified number of pages to static data (and stack), regardless of
how large they already are.

Ref: https://github.com/emscripten-core/emscripten/issues/20888.
2023-12-15 10:16:38 -08:00
Zequan Wu
47b4bbfe52
[LLD][COFF] add __buildid symbol. (#74652)
After #71433, lld-link is able to always generate build id even when PDB
is not generated.

This adds the `__buildid` symbol to points to the start of 16 bytes guid
(which is after `RSDS`) and allows profile runtime to access it and dump
it to raw profile.
2023-12-14 17:43:10 -05:00
spupyrev
b53c04a8da Reapply [ELF] Making cdsort default for function reordering (#68638)
Edited lld/ELF/Options.td to cdsort as well

CDSort function reordering outperforms the existing default heuristic (
hfsort/C^3) in terms of the performance of generated binaries while
being (almost) as fast. Thus, the suggestion is to change the default.
The speedup is up to 1.5% perf for large front-end binaries, and can be
moderate/neutral for "small" benchmarks.

High-level **perf impact** on two selected binaries:
clang-10 binary (built with LTO+AutoFDO/CSSPGO): wins on top of C^3 in
[0.3%..0.8%]
rocksDB-8 binary (built with LTO+CSSPGO): wins on top of C^3 in
[0.8%..1.5%]

More detailed measurements on the clang binary is at
[here](https://reviews.llvm.org/D152834#4445042)
2023-11-03 16:03:06 -07:00
Fangrui Song
a40f651a06
[ELF] adjustOutputSections: don't copy SHF_EXECINSTR when an output does not contain input sections (#70911)
For an output section with no input section, GNU ld eliminates the
output section when there are only symbol assignments (e.g.
`.foo : { symbol = 42; }`) but not for `.foo : { . += 42; }`
(`SHF_ALLOC|SHF_WRITE`).

We choose to retain such an output section with a symbol assignment
(unless unreferenced `PROVIDE`). We copy the previous section flag (see
https://reviews.llvm.org/D37736) to hopefully make the current PT_LOAD
segment extend to the current output section:

* decrease the number of PT_LOAD segments
* If a new PT_LOAD segment is introduced without a page-size
  alignment as a separator, there may be a run-time crash.

However, this `flags` copying behavior is not suitable for
`.foo : { . += 42; }` when `flags` contains `SHF_EXECINSTR`. The
executable bit is surprising

(https://discourse.llvm.org/t/lld-output-section-flag-assignment-behavior/74359).

I think we should drop SHF_EXECINSTR when copying `flags`. The risk is a
code section followed by `.foo : { symbol = 42; }` will be broken, which
I believe is unrelated as such uses are almost always related to data
sections.

For data-command-only output sections (e.g. `.foo : { QUAD(42) }`), we
keep allowing copyable SHF_WRITE.

Some tests are updated to drop the SHF_EXECINSTR flag. GNU ld doesn't
set SHF_EXECINSTR as well, though it sets SHF_WRITE for some tests while
we don't.
2023-11-01 22:35:28 -07:00
Fangrui Song
60b3e05967 [ELF] Restore the --call-graph-profile-sort=hfsort default before #68638
The high time complexity of cache-directed sort is a real issue and is not
appropriate as the default, at least for now
(https://github.com/llvm/llvm-project/pull/68638#issuecomment-1760918891).
2023-10-12 22:58:42 -07:00
spupyrev
d5c1d735ad
[ELF] Making cdsort default for function reordering (#68638)
CDSort function reordering outperforms the existing default heuristic (
hfsort/C^3) in terms of the performance of generated binaries while
being (almost) as fast. Thus, the suggestion is to change the default.
The speedup is up to 1.5% perf for large front-end binaries, and can be
moderate/neutral for "small" benchmarks.

High-level **perf impact** on two selected binaries:
clang-10 binary (built with LTO+AutoFDO/CSSPGO): wins on top of C^3 in
[0.3%..0.8%]
rocksDB-8 binary (built with LTO+CSSPGO): wins on top of C^3 in
[0.8%..1.5%]

More detailed measurements on the clang binary is at
[here](https://reviews.llvm.org/D152834#4445042)
2023-10-10 09:06:31 -07:00
Alexandre Ganea
356139bd02
[LLD][COFF] Add support for --time-trace (#68236)
This adds support for generating Chrome-tracing .json profile traces in
the LLD COFF driver.

Also add the necessary time scopes, so that the profile trace shows in
great detail which tasks are executed.

As an example, this is what we see when linking a Unreal Engine
executable:

![image](https://github.com/llvm/llvm-project/assets/37383324/b2e26eb4-9d37-4cf9-b002-48b604e7dcb7)
2023-10-05 22:33:58 -04:00
spupyrev
904b3f66f5 [ELF] A new code layout algorithm for function reordering [3a/3]
We are brining a new algorithm for function layout (reordering) based on the
call graph (extracted from a profile data). The algorithm is an improvement of
top of a known heuristic, C^3. It tries to co-locate hot and frequently executed
together functions in the resulting ordering. Unlike C^3, it explores a larger
search space and have an objective closely tied to the performance of
instruction and i-TLB caches. Hence, the name CDS = Cache-Directed Sort.
The algorithm can be used at the linking or post-linking (e.g., BOLT) stage.
Refer to https://reviews.llvm.org/D152834 for the actual implementation of the
reordering algorithm.

This diff adds a linker option to replace the existing C^3 heuristic with CDS.
The new behavior can be turned on by passing "--use-cache-directed-sort".
(the plan is to make it default in a next diff)

**Perf-impact**
clang-10 binary (built with LTO+AutoFDO/CSSPGO): wins on top of C^3 in [0.3%..0.8%]
rocksDB-8 binary (built with LTO+CSSPGO): wins on top of C^3 in [0.8%..1.5%]

Note that function layout affects the perf the most on older machines (with
smaller instruction/iTLB caches) and when huge pages are not enabled. The impact
on newer processors with huge pages enabled is likely neutral/minor.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D152840
2023-09-26 06:24:34 -07:00
Fangrui Song
8c556b7e2b [ELF] Change --call-graph-profile-sort to accept an argument
Change the FF form --call-graph-profile-sort to --call-graph-profile-sort={none,hfsort}.
This will be extended to support llvm/lib/Transforms/Utils/CodeLayout.cpp.

--call-graph-profile-sort is not used in the wild but
--no-call-graph-profile-sort is (Chromium). Make --no-call-graph-profile-sort an
alias for --call-graph-profile-sort=none.

Reviewed By: rahmanl

Differential Revision: https://reviews.llvm.org/D159544
2023-09-25 09:49:40 -07:00
Aaron Ballman
cf3b12c795 Fix lld Sphinx build
This addresses issues found by:
https://lab.llvm.org/buildbot/#/builders/69/builds/41718
2023-09-15 15:15:52 -04:00