1163 Commits

Author SHA1 Message Date
alx32
4ac79a8c98
[lld-macho] Use Symbols as branch target for safe_thunks ICF (#126835)
## Problem

The `safe_thunks` ICF optimization in `lld-macho` was creating thunks
that pointed to `InputSection`s instead of `Symbol`s. While, generally,
branch relocations can point to symbols or input sections, in this case
we need them to point to symbols as subsequently the branch extension
algorithm expects branches to always point to `Symbol`s.

## Solution
This patch changes the ICF implementation so that safe thunks point to
`Symbol`s rather than `InputSection`s.

## Testing
The existing `arm64-thunks.s` test is modified to include
`--icf=safe_thunks` to explicitly verify the interaction between ICF and
branch range extension thunks. Two functions were added that will be
merged together via a thunk. Before this patch, this test would trigger
an assertion; now this scenario is handled correctly.
2025-02-13 11:07:12 -08:00
Ellis Hoag
79fff6aa32
[lld][BP] Avoid ordering ICF'ed sections (#126327)
ICF runs before BPSectionOrderer. When a section is ICF'ed, it seems
that the original sections are marked as not live, but are still kept
around. Prior to this patch, those ICF'ed sections would be passed to BP
and ordered before being skipped when writing the output. Now, these
sections are no longer passed to BP, saving runtime and possibly
improving BP's output.

In a large binary, I found that the number of sections ordered using BP
decreased, while the number of duplicate sections drastically decreased
as expected.
```
Functions for startup: 50755 -> 50520
Functions for compression: 165734 -> 105328
Duplicate functions: 1827231 -> 55230
```
2025-02-13 08:57:44 -08:00
Kazu Hirata
8686e677ff
[lld] Migrate away from PointerUnion::dyn_cast (NFC) (#125458)
Note that PointerUnion::dyn_cast has been soft deprecated in
PointerUnion.h:

  // FIXME: Replace the uses of is(), get() and dyn_cast() with
  //        isa<T>, cast<T> and the llvm::dyn_cast<T>

Literal migration would result in dyn_cast_if_present (see the
definition of PointerUnion::dyn_cast), but this patch uses dyn_cast
because we expect referent to be nonnull.
2025-02-03 12:28:14 -08:00
Fangrui Song
2f6e3df08a BPSectionOrderer: stabilize iteration order and node order
Exposed by the test added in the reverted #120514

* Fix libstdc++/libc++ differences due to nth_element. https://github.com/llvm/llvm-project/pull/125450#issuecomment-2631404178
* Fix LLVM_ENABLE_REVERSE_ITERATION=1 differences
* Fix potential issue in `currentSize += D::getSize(*sections[*sectionIdxs.begin()])` where DenseSet was used, though not covered by a test
2025-02-03 10:36:51 -08:00
Hans Wennborg
f3c4b58f4b Revert "[ELF] Add BPSectionOrderer options (#120514)"
The ELF/bp-section-orderer.s test is failing on some buildbots due to
what seems like non-determinism issues, see comments on the original PR
and #125450

Reverting to green the build.

This reverts commit 0154dce8d39d2688b09f4e073fe601099a399365 and
follow-up commits 046dd4b28b9c1a75a96cf63465021ffa9fe1a979 and
c92f20416e6dbbde9790067b80e75ef1ef5d0fa4.
2025-02-03 11:41:23 +01:00
Fangrui Song
046dd4b28b [lld] BPSectionOrderer: stabilize iteration order 2025-02-02 21:58:29 -08:00
Fangrui Song
115bb87ad0 [lld] BPSectionOrderer: replace Symbol with Defined and optimize getSymbols. NFC 2025-02-02 15:43:01 -08:00
Fangrui Song
e0c7f081f1
[lld-macho] Refactor BPSectionOrderer with CRTP. NFC
PR #117514 refactored BPSectionOrderer to be used by the ELF port
but introduced some inefficiency:

* BPSectionBase/BPSymbol are wrappers around a single pointer.
  The numbers of sections and symbols could be huge, and the extra
  allocations are memory inefficient.
* Reconstructing the returned DenseMap (since BPSectionBase != InputSection)
  is wasteful.

This patch refactors BPSectionOrderer with Curiously Recurring Template
Pattern and eliminates the inefficiency. In addition,
`symbolToSectionIdxs` is removed and `rootSymbolToSectionIdxs` building
is moved to lld/MachO: while getting sections for symbols is cheap in
Mach-O, it is awkward and inefficient in the ELF port.

While here, add a file-level comment and replace some `StringMap<*>`
(which copies strings) with `DenseMap<CachedHashStringRef, *>`.

Pull Request: https://github.com/llvm/llvm-project/pull/124482
2025-01-27 18:24:59 -08:00
Kazu Hirata
5d24341667
[lld] Migrate away from PointerUnion::dyn_cast (NFC) (#124504)
Note that PointerUnion::dyn_cast has been soft deprecated in
PointerUnion.h:

  // FIXME: Replace the uses of is(), get() and dyn_cast() with
  //        isa<T>, cast<T> and the llvm::dyn_cast<T>

This patch migrates uses of PointerUnion::dyn_cast to
dyn_cast_if_present (see the definition of PointerUnion::dyn_cast).
Note that we cannot use dyn_cast in any of the migrations in this
patch; placing

  assert(!X.isNull());

just before any of dyn_cast_if_present in this patch triggers some
failure in check-lld.
2025-01-27 10:34:54 -08:00
Fangrui Song
e8e75e08c9 [lld-macho] Remove unneeded functions from BPSectionOrderer. NFC 2025-01-26 09:46:38 -08:00
alx32
c676104875
[lld-macho] Implement symbol string deduplication (#123874)
The symbol string table does not have deduplication. 
Here we add code to deduplicate the symbol string table. 
This has a rather large size impact (20-30%) on unstripped binaries
(typically debug binaries) but no size impact on stripped
binaries (typically release binaries).

We enable deduplication by default and add a flag to disable it
(`-no-deduplicate-symbol-strings`).
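
As a rough illustration of the idea (not the actual lld-macho code), a deduplicating string table can be sketched as a map from each string to the offset where it was first written. The `StringTable` class and its members are hypothetical names chosen for this sketch.

```cpp
#include <cstdint>
#include <string>
#include <string_view>
#include <unordered_map>

// Hypothetical sketch of a deduplicating string table; not lld's real class.
class StringTable {
public:
  // Returns the offset of `name` in the table, appending it only if it has
  // not been seen before.
  uint32_t add(std::string_view name) {
    auto [it, inserted] = offsets.try_emplace(
        std::string(name), static_cast<uint32_t>(data.size()));
    if (inserted) {
      data.append(name);
      data.push_back('\0'); // entries are NUL-terminated
    }
    return it->second;
  }

  // Raw bytes that would be written to the output.
  const std::string &bytes() const { return data; }

private:
  std::string data;
  std::unordered_map<std::string, uint32_t> offsets;
};
```

With this shape, adding the same symbol name twice returns the same offset and the table grows only once, which is where the size savings on unstripped binaries would come from.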
2025-01-23 15:48:11 -08:00
Kazu Hirata
aaf0643dd5
[lld] Migrate away from PointerUnion::dyn_cast (NFC) (#123891)
Note that PointerUnion::dyn_cast has been soft deprecated in
PointerUnion.h:

  // FIXME: Replace the uses of is(), get() and dyn_cast() with
  //        isa<T>, cast<T> and the llvm::dyn_cast<T>

Literal migration would result in dyn_cast_if_present (see the
definition of PointerUnion::dyn_cast), but this patch uses cast
because we expect isa<Symbol *>(rel.referent) to be true.
2025-01-22 00:17:04 -08:00
Kazu Hirata
a0ec385873
[lld] Migrate away from PointerUnion::dyn_cast (NFC) (#123721)
Note that PointerUnion::dyn_cast has been soft deprecated in
PointerUnion.h:

  // FIXME: Replace the uses of is(), get() and dyn_cast() with
  //        isa<T>, cast<T> and the llvm::dyn_cast<T>

Literal migration would result in dyn_cast_if_present (see the
definition of PointerUnion::dyn_cast), but this patch uses cast
because we expect isa<InputSection *>(reloc.referent) to be true.
2025-01-21 11:57:53 -08:00
Ellis Hoag
8b0c774f8a
[lld][InstrProf][NFC] Fix typo in help message (#123390) 2025-01-17 13:04:28 -08:00
alx32
1c3c65590d
[lld-macho] Document '-icf' flag options (#123372)
Adding the `safe_thunks` option in `Options.td` as it was missing there
- mentioned by @Colibrow in
https://github.com/llvm/llvm-project/pull/106573
Also documenting what the various options mean. 

Help now looks like this:
```
..........
  --error-limit=<value>   Maximum number of errors to print before exiting (default: 20)
  --help-hidden           Display help for hidden options
  --icf=[none,safe,safe_thunks,all]
                          Set level for identical code folding (default: none). Possible values:
                            none        - Disable ICF
                            safe        - Only folds non-address significant functions (as described by `__addrsig` section)
                            safe_thunks - Like safe, but replaces address-significant functions with thunks
                            all         - Fold all identical functions
  --ignore-auto-link-option=<value>
                          Ignore a single auto-linked library or framework. Useful to ignore invalid options that ld64 ignores
  --irpgo-profile-sort=<profile>
                          Deprecated. Please use --irpgo-profile and --bp-startup-sort=function
..........
```
2025-01-17 10:48:32 -08:00
Fangrui Song
60e4d24963
[lld-macho,BalancedPartition] Simplify relocation hash and avoid xxHash
xxHash, inferior to xxh3, is discouraged. We try not to use xxhash in
lld.

Switch to read32le for content hash and xxh3/stable_hash_combine for
relocation hash. Remove the intermediate std::string for relocation
hash.

Change the tail hashing scheme to consider individual bytes instead.
This helps group 0102 and 0201 together. The benefit is negligible,
though.

Pull Request: https://github.com/llvm/llvm-project/pull/121729
2025-01-16 09:31:42 -08:00
alx32
95d21f6015
[lld-macho] Reduce memory usage of printing thunks in map file (#122785)
This commit improves the memory efficiency of the lld-macho linker by
optimizing how thunks are printed in the map file. Previously, merging
vectors of input sections required creating a temporary vector, which
increased memory usage and in some cases caused the linker to run out of
memory as reported in comments on
https://github.com/llvm/llvm-project/pull/120496. The new approach
interleaves the printing of two arrays of ConcatInputSection in sorted
order without allocating additional memory for a merged array.
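
The interleaving can be pictured as a two-cursor walk over both sorted arrays. The sketch below is a simplified model under stated assumptions: `Sec` is a hypothetical stand-in for `ConcatInputSection`, and `visitInterleaved` is an illustrative name, not a function from lld.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a section with an address.
struct Sec {
  uint64_t addr;
  const char *name;
};

// Visits the elements of two address-sorted arrays in overall sorted order
// without building a merged temporary array.
template <typename Fn>
void visitInterleaved(const std::vector<Sec> &a, const std::vector<Sec> &b,
                      Fn visit) {
  size_t i = 0, j = 0;
  while (i < a.size() || j < b.size()) {
    // Take from `a` when `b` is exhausted or `a`'s next address is not larger.
    bool takeA = j == b.size() || (i < a.size() && a[i].addr <= b[j].addr);
    visit(takeA ? a[i++] : b[j++]);
  }
}
```

The only extra state is the two indices, so memory use stays constant regardless of how many sections and thunks are printed.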
2025-01-15 22:58:24 -08:00
Fangrui Song
cc88a5e615
[lld-macho,NFC] Switch to increasing priorities
--order_file, call graph profile, and BalancedPartitioning currently
build the section order vector by decreasing priority (from SIZE_MAX to
0). However, it's conventional to use an increasing key (see
OutputSection::inputOrder).

Switch to increasing priorities, remove the global variable
highestAvailablePriority, and remove the highestAvailablePriority
parameter from BPSectionOrderer. Change size_t to int.

This improves consistency with the ELF and COFF ports. The ELF port
utilizes negative priorities for --symbol-ordering-file and call graph
profile, and non-negative priorities for --shuffle-sections (no Mach-O
counterpart yet).

Pull Request: https://github.com/llvm/llvm-project/pull/121727
2025-01-10 09:32:03 -08:00
alx32
156e605163
[lld-macho] Fix branch extension thunk estimation logic (#120529)
This patch improves the linker’s ability to estimate stub reachability
in the `TextOutputSection::estimateStubsInRangeVA` function. It does so
by including thunks that have already been placed ahead of the current
call site address when calculating the threshold for direct stub calls.

Before this fix, the estimation process overlooked existing forward
thunks. This could result in some thunks not being inserted where
needed. In rare situations, particularly with large and specially
arranged codebases, this might lead to branch instructions being out of
range, causing linking errors.

Although this patch successfully addresses the problem, it is not
feasible to create a test for this issue. The specific layout and order
of thunk creation required to reproduce the corner case are too complex,
making test creation impractical.

Example error messages the issue could generate:
```
ld64.lld: error: banana.o:(symbol OUTLINED_FUNCTION_24949_3875): relocation BRANCH26 is out of range: 134547892 is not in [-134217728, 134217727]; references objc_autoreleaseReturnValue
ld64.lld: error: main.o:(symbol _main+0xc): relocation BRANCH26 is out of range: 134544132 is not in [-134217728, 134217727]; references objc_release
```
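
For reference, the bounds in these messages correspond to a signed 28-bit byte displacement (a 26-bit word offset scaled by 4). A minimal sketch of such a range check follows; `inBranchRange` and the constant names are hypothetical and are not taken from lld.

```cpp
#include <cstdint>

// Byte displacement bounds matching the range printed in the errors above:
// [-2^27, 2^27 - 1].
constexpr int64_t kBranchMin = -(int64_t{1} << 27);
constexpr int64_t kBranchMax = (int64_t{1} << 27) - 1;

// Hypothetical helper: true if a branch at `from` can reach `to` directly,
// i.e. without going through a thunk.
constexpr bool inBranchRange(uint64_t from, uint64_t to) {
  int64_t delta = static_cast<int64_t>(to - from);
  return delta >= kBranchMin && delta <= kBranchMax;
}
```

A displacement such as 134547892 from the first message falls just outside this window, which is the situation a correctly placed thunk is meant to prevent.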
2025-01-09 14:14:13 -08:00
alx32
162814a7ec
[lld-macho] Include branch extension thunks in linker map (#120496)
This patch extends the MachO linker's map file generation to include
branch extension thunk symbols. Previously, thunks were omitted from the
map file, making it difficult to understand the final layout of the
binary, especially when debugging issues related to long branch thunks.
This change ensures thunks are included and correctly interleaved with
other symbols based on their address, providing an accurate
representation of the linked output.
2025-01-07 21:07:51 -08:00
Fangrui Song
c2f7745b4e [lld-macho] Remove redundant hasValidData. NFC
lld::macho::runBalancedPartitioning ensures that all sections satisfy
`hasValidData`.
2025-01-05 15:59:17 -08:00
Anutosh Bhat
ba93eccded
[lld][MachO] Fix warning while building for wasm (#120889)
While building clang & lld against emscripten for wasm, I see the
following
```
/home/runner/work/recipes/recipes/output/bld/rattler-build_llvm_1734801187/work/lld/MachO/SyntheticSections.cpp:2075:25: warning: comparison of integers of different signs: 'long' and 'const uint32_t' (aka 'const unsigned int') [-Wsign-compare]
 2075 |   assert(buf - bufStart == sectionSize &&
      |          ~~~~~~~~~~~~~~ ^  ~~~~~~~~~~~
$BUILD_PREFIX/opt/emsdk/upstream/emscripten/cache/sysroot/include/assert.h:8:28: note: expanded from macro 'assert'
    8 | #define assert(x) ((void)((x) || (__assert_fail(#x, __FILE__, __LINE__, __func__),0)))
      |                            ^
```

Casting `sectionSize` should be enough I think
2025-01-05 17:49:50 +08:00
Ellis Hoag
40e734e041
[lld][MachO] Allow separate --irpgo-profile flag (#121354) 2024-12-31 09:07:00 -08:00
Carlo Cabrera
a0f0a69b62
[lld][MachO] Fix symbol insertion in transplantSymbolsAtOffset (#120737)
The existing comparison does not insert symbols in the intended place.

Closes #120559.

---------

Co-authored-by: Bjorn Pettersson <bjorn.a.pettersson@ericsson.com>
2024-12-22 21:50:15 +08:00
Max
79e859e049
[lld] Move BPSectionOrderer from MachO to Common for reuse in ELF (#117514)
Add lld/Common/BPSectionOrdererBase from MachO for reuse in ELF
2024-12-18 09:24:25 -08:00
Kazu Hirata
e04fde193b
[lld] Migrate away from PointerUnion::{is,get} (NFC) (#119993)
Note that PointerUnion::{is,get} have been soft deprecated in
PointerUnion.h:

  // FIXME: Replace the uses of is(), get() and dyn_cast() with
  //        isa<T>, cast<T> and the llvm::dyn_cast<T>

I'm not touching PointerUnion::dyn_cast for now because it's a bit
complicated; we could blindly migrate it to dyn_cast_if_present, but
we should probably use dyn_cast when the operand is known to be
non-null.
2024-12-14 20:07:08 -08:00
Chandler Carruth
dd647e3e60
Rework the Option library to reduce dynamic relocations (#119198)
Apologies for the large change, I looked for ways to break this up and
all of the ones I saw added real complexity. This change focuses on the
option's prefixed names and the array of prefixes. These are present in
every option and the dominant source of dynamic relocations for PIE or
PIC users of LLVM and Clang tooling. In some cases, 100s or 1000s of
them for the Clang driver which has a huge number of options.

This PR addresses this by building a string table and a prefixes table
that can be referenced with indices rather than pointers that require
dynamic relocations. This removes almost 7k dynamic relocations from the
`clang` binary, roughly 8% of the remaining dynamic relocations outside
of vtables. For busy-boxing use cases where many different option tables
are linked into the same binary, the savings add up a bit more.

The string table is a straightforward mechanism, but the prefixes
required some subtlety. They are encoded in a Pascal-string fashion with
a size followed by a sequence of offsets. This works relatively well for
the small realistic prefixes arrays in use.
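
A minimal sketch of this kind of layout, under stated assumptions: the table contents, `kStrTable`, `kPrefixTable`, `strAt`, and `prefixesAt` are all hypothetical and do not reflect the actual LLVM Option library definitions.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// All strings live in one NUL-separated blob; offset 0 is the empty string.
// Offsets: "" = 0, "-" = 1, "--" = 3, "help" = 6.
constexpr char kStrTable[] = "\0-\0--\0help\0";

// Prefix sets are stored Pascal-style: a count followed by that many
// string-table offsets. Here a single set {"-", "--"} starts at index 0.
constexpr unsigned kPrefixTable[] = {2, 1, 3};

// Looks up a string by offset instead of by pointer.
std::string_view strAt(unsigned offset) {
  return std::string_view(&kStrTable[offset]);
}

// Decodes the prefix set that starts at `index` in the prefixes table.
std::vector<std::string_view> prefixesAt(unsigned index) {
  unsigned count = kPrefixTable[index];
  std::vector<std::string_view> out;
  for (unsigned i = 1; i <= count; ++i)
    out.push_back(strAt(kPrefixTable[index + i]));
  return out;
}
```

Because each option would store only small integers (an offset and an index), the static data contains no pointers and so needs no dynamic relocations when loaded at a different base address.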

Lots of code has to change in order to land this, though: all the
option library code has to be updated to use the string table and
prefixes table, and all the users of the options library have to be
updated to correctly instantiate the objects.

Some follow-up patches are in the works to provide an abstraction for this
style of code, and to start using the same technique for some of the
other strings here now that the infrastructure is in place.
2024-12-11 15:44:44 -08:00
Max
a2959071be
[lld][MachO] Rename to bp-* options for SectionOrderer (#118594)
Rename options related to profile guided function order (#96268) to
prepare for the addition to the ELF port.
2024-12-09 22:50:21 -08:00
Carlo Cabrera
d668304998
[lld][MachO] Support -allowable_client (#117155)
Closes #117113.

Follow-up to #114638.
2024-11-27 11:23:49 -05:00
Tom Lin
b4e000e600
[LLD][MachO] Enable plugin support for LTO (#115690)
Add new CLI options for feature parity with ELF w.r.t pass plugins.
Most of the changes are ported directly from
0c86198b27.
With this change, it is now possible to load and run external pass
plugins during the LTO phase.
2024-11-22 15:01:59 -08:00
Carlo Cabrera
1de9bc1a27
[lld][MachO] Respect dylibs linked with -allowable_client (#114638)
ld64.lld would previously allow you to link against dylibs linked with
`-allowable_client`, even if the client's name does not match any
allowed client.

This change fixes that. See #114146 for related discussion. 

The test binary `liballowable_client.dylib` was created on macOS with:

echo | clang -xc - -dynamiclib -mmacosx-version-min=10.11 -arch x86_64
-Wl,-allowable_client,allowed -o lib/liballowable_client.dylib
2024-11-20 20:02:17 -05:00
alx32
7404685598
[lld-macho] Fix compatibility between --icf=safe_thunks and --keep-icf-stabs (#116687)
Currently when `--icf=safe_thunks` is used, `STABS` entries cannot be
generated for ICF'ed functions. This is because if ICF converts a full
function into a thunk and then we generate a `STABS` entry for the
thunk, `dsymutil` will expect to find the entire function body at the
location of the thunk. Because only a thunk will be present at the
location of the `STABS` entry, `dsymutil` will generate invalid debug
info for such scenarios.

With this change, if `--icf=safe_thunks` is used and `--keep-icf-stabs`
is also specified, STABS entries will be created for all functions, even
merged ones. However, the STABS entries will point at the actual (full)
function body while having the name of the thunk. This way we still get
program correctness as well as correct DWARF data. When doing this, the
debug data will be identical to the scenario where we're using
`--icf=all` and `--keep-icf-stabs`, but the actual program will also
contain thunks, which won't show up in the DWARF data.
2024-11-20 09:36:52 -08:00
Fangrui Song
fcb6b132fa [lld] Use context-aware outs() and errs()
For COFF and ELF that are mostly free of global state, lld::errs() and
lld::outs() should not be used. This migration change allows us to
remove lld::errs, which uses the global errorHandler().
2024-11-16 21:37:34 -08:00
Kyungwoo Lee
ab27253ad3
[CGData][lld-macho] Merge CG Data by LLD (#112674)
LLD now processes raw CG data for stable functions, similar to how it
handles raw CG data for the outliner's hash tree. This data is encoded
in the custom section (`__llvm_merge`) within object files. LLD merges
this information into the indexed CG data file specified by the
`-codegen-data-generate-path={path}` option. For linkers that do
not support this feature, the `llvm-cgdata` tool can be used instead; see
https://github.com/llvm/llvm-project/blob/main/llvm/docs/CommandGuide/llvm-cgdata.rst.

Depends on #115750.
This is a patch for
https://discourse.llvm.org/t/rfc-global-function-merging/82608.
2024-11-15 17:24:35 -08:00
SharonXSharon
6827a00d4d
[lld][InstrProf] Do not use cstring offset hashes in function order for compression (#113606) 2024-10-28 09:47:21 -07:00
alx32
f9d3e98207
[lld-macho] Improve robustness of ObjC category merging (#112618)
This patch enhances the robustness of lld's Objective-C category
merging. Currently, the category merger assumes it can fully parse and
understand the format of all categories in the input, triggering an
assert if any invalid category data is encountered.

This will end up causing asserts in certain rare corner cases that are
difficult to reproduce in small test cases. The proposed changes modify
the behavior so that if invalid category data is detected, category
merging is skipped for that specific class and all other categories
sharing the same base class. This approach allows the linker to continue
processing other categories without failing entirely due to a single
problematic input.

We also add a LIT test where we corrupt category data and check that
category merging for that class was skipped but the link was successful.
2024-10-18 11:03:16 -07:00
alx32
97a4324224
[lld-macho] Fix ICF differentiation of safe_thunks relocs (#111811)
In `--icf=safe_thunks` mode, the linker differentiates `keepUnique`
functions by creating thunks during a post-processing step after
Identical Code Folding (ICF). While this ensures that `keepUnique`
functions themselves are not incorrectly merged, it overlooks functions
that reference these `keepUnique` symbols.

If two functions are identical except for references to different
`keepUnique` functions, the current ICF algorithm incorrectly considers
them identical because it doesn't account for the future differentiation
introduced by thunks. This leads to incorrect deduplication of functions
that should remain distinct.

To address this issue, we modify the ICF comparison to explicitly check
for references to `keepUnique` functions during deduplication. By doing
so, functions that reference different `keepUnique` symbols are
correctly identified as distinct, preventing erroneous merging and
ensuring the correctness of the linked output.
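
The comparison rule can be modelled roughly as follows. This is a simplified sketch, not lld's implementation: `Func`, `Body`, `sameTarget`, and `foldable` are hypothetical names, and the real ICF pass works on relocations and per-section equivalence classes.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, simplified model of a function as seen by ICF.
struct Func {
  int icfClass;    // equivalence class assigned by ICF so far
  bool keepUnique; // address-significant: will be given its own thunk
};

struct Body {
  std::vector<unsigned char> code;
  std::vector<const Func *> callees; // targets referenced by relocations
};

// Two referenced targets are interchangeable only if neither must stay
// unique; a keepUnique target matches only itself.
bool sameTarget(const Func *a, const Func *b) {
  if (a == b)
    return true;
  if (a->keepUnique || b->keepUnique)
    return false;
  return a->icfClass == b->icfClass;
}

// Two bodies may be folded only if their bytes match and every pair of
// corresponding referenced targets is interchangeable.
bool foldable(const Body &x, const Body &y) {
  if (x.code != y.code || x.callees.size() != y.callees.size())
    return false;
  for (size_t i = 0; i < x.callees.size(); ++i)
    if (!sameTarget(x.callees[i], y.callees[i]))
      return false;
  return true;
}
```

In this model, two callers with identical bytes that reference different `keepUnique` functions stay distinct even when those functions share an equivalence class, which is the case the patch description says was previously merged by mistake.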
2024-10-10 08:22:48 -07:00
Nuri Amari
2edd897a42
Make WriteIndexesThinBackend multi threaded (#109847)
We've noticed that for large builds executing thin-link can take on the
order of tens of minutes. We are only using a single thread to write the
sharded indices and import files for each input bitcode file. While we
need to ensure the index file produced lists modules in a deterministic
order, that doesn't prevent us from executing the rest of the work in
parallel.

In this change we use a thread pool to execute as much of the backend's
work as possible in parallel. In local testing on a machine with 80
cores, this change makes a thin-link for ~100,000 input files run in ~2
minutes. Without this change it takes upwards of 10 minutes.

---------

Co-authored-by: Nuri Amari <nuriamari@fb.com>
2024-10-07 08:16:46 -07:00
alx32
9e862ae321
[lld-macho] Fix invalid DWARF with --icf=safe_thunks (#111097)
There is a bug in the current implementation of `--icf=safe_thunks`
where a STABS entry is emitted for generated thunks. This is problematic
as we end up generating invalid DWARF as dsymutil will think the entire
function body is at the thunk location, when in actuality there will
only be a single branch present. This will end up causing overlapping
DWARF entries.

To fix this we never generate STABS entries for such thunks. 
The existing `--icf=safe_thunks` test is updated to also generate debug
info and we add a check that no corrupt DWARF is generated.

As a future TODO we need to make `--keep-icf-stabs` compatible with
`--icf=safe_thunks`.
2024-10-04 21:48:09 -07:00
Kazu Hirata
9ed46fbe9f
[lld] Use StringRef idioms (NFC) (#109584) 2024-09-22 20:45:25 -07:00
Kyungwoo Lee
f4763b3d24
Reland [CGData] LLD for MachO #90166 (#108733)
It reads raw CG data encoded in the custom section (__llvm_outline) in
object files and merges them into the indexed codegen data file
specified by -codegen-data-generate-path={path}.

This depends on https://github.com/llvm/llvm-project/pull/90074.
This is a patch for
https://discourse.llvm.org/t/rfc-enhanced-machine-outliner-part-2-thinlto-nolto/78753.
2024-09-15 06:39:06 -07:00
JOE1994
4b27b5800f [lld] Nits on uses of raw_string_ostream (NFC)
* Don't call raw_string_ostream::flush(), which is essentially a no-op.
* Strip calls to raw_string_ostream::str(), to avoid excess layer of indirection.
2024-09-15 04:23:11 -04:00
Kyungwoo Lee
9de260364b Revert "[CGData] LLD for MachO (#90166)"
This reverts commit 00c0b1ae20358a9e55ff8eda20c4e0546ee81b5b.
2024-09-14 21:15:53 -07:00
Kyungwoo Lee
00c0b1ae20
[CGData] LLD for MachO (#90166)
It reads raw CG data encoded in the custom section (__llvm_outline) in
object files and merges them into the indexed codegen data file
specified by `-codegen-data-generate-path={path}`.

This depends on https://github.com/llvm/llvm-project/pull/90074.
This is a patch for
https://discourse.llvm.org/t/rfc-enhanced-machine-outliner-part-2-thinlto-nolto/78753.
2024-09-14 20:38:14 -07:00
Ellis Hoag
ce91e2153f
[lld][InstrProf] Sort startup functions for compression (#107348) 2024-09-06 09:22:03 -07:00
Daniel Bertalan
691e3c64d0
[lld-macho] Fix Defined size increase with -mms-bitfields (#107545)
Under the Microsoft ABI, only those bit fields can be merged whose
underlying types have the same size.

d175616 (`[lld-macho][arm64] Enhance safe ICF with thunk-based
deduplication`) added an enum field (`identicalCodeFoldingKind`) next to
booleans in the `Defined` class, which increased the size under the MS
ABI. On MinGW targets, this triggered the `static_assert` which checks
the size of `Defined` (for MSVC targets, the check is disabled due to
another problem). Let's store it as a `uint8_t` to allow merging to take
place.
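
The pattern can be sketched as below. The names `FoldKind`, `Flags`, and the members are hypothetical and are not the real fields of `Defined`; the sketch only shows an enum value stored in a `uint8_t` bit-field next to `bool` bit-fields.

```cpp
#include <cstdint>

// Hypothetical enum with the default (int-sized) underlying type.
enum class FoldKind { None, Body, Thunk };

struct Flags {
  bool isExternal : 1;
  bool isWeak : 1;
  // Declared as uint8_t rather than as the enum type, so that all three
  // bit-fields have a declared type of the same size (1 byte).
  uint8_t foldKindBits : 2;

  FoldKind foldKind() const { return static_cast<FoldKind>(foldKindBits); }
  void setFoldKind(FoldKind k) { foldKindBits = static_cast<uint8_t>(k); }
};

static_assert(sizeof(Flags) == 1, "bit-fields should share one byte");
```

The accessor pair keeps the typed enum at the interface while the storage uses a type whose size matches the neighbouring `bool` fields.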

Fixes #107511
2024-09-06 10:58:19 +02:00
alx32
d1756165a9
[lld-macho][arm64] Enhance safe ICF with thunk-based deduplication (#106573)
Currently, our `safe` ICF mode only merges non-address-significant code,
leaving duplicate address-significant functions in the output. This
patch introduces `safe_thunks` ICF mode, which keeps a single master
copy of each function and replaces address-significant duplicates with
thunks that branch to the master copy.
Currently `--icf=safe_thunks` is only supported for `arm64`
architectures.

**Perf stats for a large binary:**
| ICF Option | Total Size | __text Size | __unwind_info | % total |
|-------------------|------------|-------------|---------------------|---------------------------|
| `--icf=none` | 91.738 MB | 55.220 MB | 1.424 MB | 0% |
| `--icf=safe` | 85.042 MB | 49.572 MB | 1.168 MB | 7.30% |
| `--icf=safe_thunks` | 84.650 MB | 49.219 MB | 1.143 MB | 7.72% |
| `--icf=all` | 82.060 MB | 48.726 MB | 1.111 MB | 10.55% |

So overall we can expect a `~0.45%` binary size reduction for a typical
large binary compared to the `--icf=safe` option.

**Runtime:**
Linking the above binary took ~10 seconds. Comparing the link
performance of --icf=safe_thunks vs --icf=safe, a ~2% slowdown was
observed.
2024-09-05 16:36:21 -07:00
Ellis Hoag
3380dae2f0
[lld][InstrProf] Refactor BPSectionOrderer.cpp (#107347)
Refactor some code in `BPSectionOrderer.cpp` in preparation for
https://github.com/llvm/llvm-project/pull/107348.

* Rename `constructNodesForCompression()` -> `getUnsForCompression()`
and return a `SmallVector` directly rather than populating a vector
alias
* Pass `duplicateSectionIdxs` as a pointer to make it possible to skip
finding (nearly) duplicate sections
* Combine `duplicate{Function,Data}SectionIdxs` into one variable
* Compute all `BPFunctionNode` vectors at the end (like
`nodesForStartup`)

There should be no functional change.
2024-09-05 14:55:05 -07:00
Nico Weber
62e6c1ead7
[lld/mac] Allow -segprot having stricter initprot than maxprot on mac (#107269)
...including for catalyst.

The usecase for this is to put certain security-critical variables into
a special segment/section that's mapped as read-only most of the time,
and that temporary gets remapped as writeable when these variables are
written to be the program. This protects against them being written to
by heap spraying attacks. This special section should be mapped as
read-only at program start, so using

`-segprot MY_PROTECTED_MEMORY_THINGER rw r`

to mark that segment as rw maxprot and r initprot is exactly what we
want.

lld has so far rejected mismatching initprot and maxprot.

ld64 doesn't reject this, but silently writes initprot into both fields
(!). It looks like this might not be fully intentional, see
https://crbug.com/41495919#comment5 and
http://crbug.com/41495919#comment8.

In any case, when postprocessing ld64's output to have different values
for initprot and maxprot, the dynamic loader seems to do the right thing
(see also the previous two links).

The same technique also works on Windows, using both link.exe and
lld-link.exe using `/SECTION:myprotsect,R`.

So, since this is useful, allow it when targeting macOS, and make it do
what you'd expect.

Since loader support for this on iOS is less clear, keep disallowing it
there for now.

See the PR for the program I used to check that this seems to work. (I
only checked on arm64 macOS 14.5 so far; will run this on many more
systems on bots once this is merged and rolled in.)
2024-09-05 12:29:46 -04:00
Daniel Bertalan
b24a304435
[lld-macho] Always store symbol name length eagerly (NFC) (#106906)
The only instance where we weren't already passing a `StringRef` with a
known length to `Symbol`'s constructor is where the argument is a string
literal. Even in that case, lazy `strlen` calls don't make sense, as the
compiler can constant-evaluate the `StringRef(const char*)` constructor.

For symbols that go into the symbol table we need the length when
calculating the hash anyway. We could get away with not calling
`getName()` for local symbols, but the total contribution of `strlen` to
the run time is already below 1%, so that would just complicate the code
for a negligible benefit.
2024-09-04 01:05:15 +02:00