When the script has executed `cd %t`, it is fine to to use the output
file `a.out`.
(We don't want to rely on lit's default PWD to support lit compatible
runners. Therefore -o /dev/null is used when PWD has not been changed
to a %t derived path.)
The standard libcalls for half to float and float to half conversion are
__extendhfsf2 and __truncsfhf2. However, LLVM currently uses
__gnu_h2f_ieee and __gnu_f2h_ieee instead. As far as I can tell, these
libcalls are an ARM-ism and only provided by libgcc on that platform.
compiler-rt always provides both libcalls.
Use the standard libcalls by default, and only use the __gnu libcalls on
ARM.
This can happen when using a LTO build of compiler-rt for ARM and the
program uses 64-bit division.
The 64-bit division function in compiler-rt (__aeabi_ldivmod) is written
in assembly and calls the C function __divmoddi4, which works fine in
non-LTO links. However, when building with LTO the call inside
__aeabi_ldivmod is replaced with a jump to address zero, which then
crashes the program.
Building with -pie generates an error instead of a jump to address zero,
and surprisingly just declaring the __aeabi_ldivmod function (but not
calling it) in the input IR also avoids this issue.
Reported as https://github.com/llvm/llvm-project/issues/127284
Co-authored-by: Fangrui Song <i@maskray.me>
Reviewed By: MaskRay
Pull Request: https://github.com/llvm/llvm-project/pull/127286
These relocations apply to 16-bit Thumb instructions, so reading 16 bits
rather than 32 bits ensures the correct bits are masked and written
back. This fixes the incorrect masking and aligns the relocation logic
with the instruction encoding.
Before this patch, 32 bits were read from the ELF object. This did not
align with the instruction size of 16 bits, but the masking incidentally
made it all work nonetheless. However, this was the case only in little
endian.
In big endian mode, the read 32-bit word had to have its bytes reversed.
With this byte reordering, the masking would be applied to the wrong
bits, hence causing the incorrect encoding to be produced as a result of
the relocation resolution.
The added test checks the result for both little and big endian modes.
This patch allows `lld-headers` and `lld-libraries` in
`LLVM_DISTRIBUTION_COMPONENTS` to be specified and thus enable piecewise
installation of `lld/**/*.h` headers and/or lld libraries (both in
shared and static builds).
This is similar to use cases such as
`clang;clang-headers;clang-libraries`. Note when `lld-libraries` is
present, `llvm-libraries` must be present as well because various lld
libraries depend on various llvm libraries.
--export-dynamic should be a no-op when ctx.hasDynsym is false.
* Drop unneeded ctx.hasDynsym checks.
* Static linking with --export-dynamic does not prevent devirtualization.
The Global Function Merger
(https://discourse.llvm.org/t/rfc-global-function-merging/82608) pass
optimistically creates merged instances of functions and suffixes their
names with `.Tgm`. Then in the linker, ICF will (hopefully) fold these
`.Tgm` functions. For example, a function `foo` might become a thunk
`foo` that calls a merged function `foo.Tgm`.
Since IRPGO runs before the global merger, we will only have a profile
for `foo`. We want to correlate this profile to both `foo` and `foo.Tgm`
so they can both be ordered to improve startup time.
I built a large binary and found that it increased the number of
functions ordered for startup, as expected.
```
Functions for startup: 12049 -> 12697
Functions for compression: 34733 -> 34707
```
The reason why we don't see a larger improvement is because there are
some cases where the code was accidentally working:
`getRootSymbol("foo.llvm.5555.Tgm")` already returns `foo`.
## Problem
The `safe_thunks` ICF optimization in `lld-macho` was creating thunks
that pointed to `InputSection`s instead of `Symbol`s. While, generally,
branch relocations can point to symbols or input sections, in this case
we need them to point to symbols as subsequently the branch extension
algorithm expects branches to always point to `Symbol`'s.
## Solution
This patch changes the ICF implementation so that safe thunks point to
`Symbol`'s rather than `InputSection`s.
## Testing
The existing `arm64-thunks.s` test is modified to include
`--icf=safe_thunks` to explicitly verify the interaction between ICF and
branch range extension thunks. Two functions were added that will be
merged together via a thunk. Before this patch, this test would generate
an assert - now this scenario is correctly handled.
ICF runs before BPSectionOrderer. When a section is ICF'ed, it seems
that the original sections are marked as not live, but are still kept
around. Prior to this patch, those ICF'ed sections would be passed to BP
and ordered before being skipped when writing the output. Now, these
sections are no longer passed to BP, saving runtime and possibly
improving BP's output.
In a large binary, I found that the number of sections ordered using BP
decreased, while the number of duplicate sections drastically decreased
as expected.
```
Functions for startup: 50755 -> 50520
Functions for compression: 165734 -> 105328
Duplicate functions: 1827231 -> 55230
```
`%T` is not unique and deprecated
[[1](https://llvm.org/docs/CommandGuide/lit.html#substitutions)].
This patch replaces all `%T` in `lld/test` with `%t.dir` (`mkdir` if
necessary)
---------
Signed-off-by: Peter Rong <PeterRong@meta.com>
(This application-specific option is probably not appropriate as a
linker option (.o file offers more flexibility and decouples JSON
verification from linkers). However, the option has gained some traction
in Linux distributions, with support in GNU ld, gold, and mold.)
GNU ld has supported percent-encoded bytes and extensions like
`%[comma]` since November 2024. mold supports just percent-encoded
bytes. To prepare for potential adoption by Ubuntu, let's support
percent-encoded bytes.
Link: https://sourceware.org/bugzilla/show_bug.cgi?id=32003
Link: https://bugs.launchpad.net/ubuntu/+source/dpkg/+bug/2071468
Pull Request: https://github.com/llvm/llvm-project/pull/126396
Following the conclusion of the
[RFC](https://discourse.llvm.org/t/rfc-names-for-flang-rt-libraries/84321),
rename Flang's runtime libraries as follows:
* libFortranRuntime.(a|so) to libflang_rt.runtime.(a|so)
* libFortranFloat128Math.a to libflang_rt.quadmath.a
* libCufRuntime_cuda_${CUDAToolkit_VERSION_MAJOR}.(a|so) to
libflang_rt.cuda_${CUDAToolkit_VERSION_MAJOR}.(a|so)
This follows the same naming scheme as Compiler-RT libraries
(`libclang_rt.${component}.(a|so)`). It provides some consistency
between Flang's runtime libraries for current and potential future
library components.
Avoid using the same library for runtime and compiler. `FortranDecimal`
was used in two ways:
1. As an auxiliary library needed for `libFortranRuntime.a`. This patch
adds the two source files of FortranDecimal directly into
FortranRuntime, so `FortranRuntime` is not used anymore.
2. As a library used by the Flang compiler. As the only remaining use of
the library, extra CMake code to make it compatible with the runtime can
be removed.
Before this PR, `enable_cuda_compilation` is applied to `FortranDecimal`
which causes everything that links to it, including flang (the
compiler), to depend on libcudart when CUDA support is enabled.
Having two runtime library just makes everything more complicated while
the user ideally should not be concerned with how the runtime is
structured internally. Some logic was copied for FortranDecimal because
of this, such as the ability to be compiled out-of tree
(b75a3c9f31c1ffdc9856aee32991d8129b372ee7) which is undocumented, the
logic to link against the various versions of Microsofts runtime library
(#70833), and avoiding dependency on the C++ runtime
(7783bba22c7add678d796741d30669c73159b3d8).
Reland 994cea3f0a2d0caf4d66321ad5a06ab330144d89 after bolt tests no
longer rely on -pie --unresolved-symbols=ignore-all with no input DSO
generating PLT entries.
---
Commit f10441ad003236ef3b9e5415a571d2be0c0ce5ce , while dropping a
special case for isUndefWeak and --no-dynamic-linking, made
--export-dynamic ineffective when -pie is used without any input DSO.
This change restores --export-dynamic and unifies -pie and -pie
--no-dynamic-linker when there is no input DSO.
* -pie with no input DSO suppresses undefined symbols in .dynsym.
Previously this only appied to -pie --no-dynamic-linker.
* As a side effect, -pie with no input DSO suppresses PLT.
LLVM has started to emit AArch64 build attributes sections called
.ARM.attributes. LLD does not yet have support for these so they are
accumulating in the ELF output. As the first part of that support
discard all the .ARM.attributes sections. This can be built upon by the
full implementation in LLD.
The build attributes specification only defines build attributes for
relocatable objects. The intention for LLD is that files of type ET_EXEC
and ET_SHARED will not have a build attributes in the output. A
relocatable link with -r will need a merged build attributes, but until
the merge is implemented it is better to discard.
Reland #120514 after 2f6e3df08a8b7cd29273980e47310cf09c6fdbd8 fixed
iteration order issue and libstdc++/libc++ differences.
---
Both options instruct the linker to optimize section layout with the
following goals:
* `--bp-compression-sort=[data|function|both]`: Improve Lempel-Ziv
compression by grouping similar sections together, resulting in a
smaller compressed app size.
* `--bp-startup-sort=function --irpgo-profile=<file>`: Utilize a
temporal profile file to reduce page faults during program startup.
The linker determines the section order by considering three groups:
* Function sections ordered according to the temporal profile
(`--irpgo-profile=`), prioritizing early-accessed and frequently
accessed functions.
* Function sections. Sections containing similar functions are placed
together, maximizing compression opportunities.
* Data sections. Similar data sections are placed together.
Within each group, the sections are ordered using the Balanced
Partitioning algorithm.
The linker constructs a bipartite graph with two sets of vertices:
sections and utility vertices.
* For profile-guided function sections:
+ The number of utility vertices is determined by the symbol order
within the profile file.
+ If `--bp-compression-sort-startup-functions` is specified, extra
utility vertices are allocated to prioritize nearby function similarity.
* For sections ordered for compression: Utility vertices are determined
by analyzing k-mers of the section content and relocations.
The call graph profile is disabled during this optimization.
When `--symbol-ordering-file=` is specified, sections described in that
file are placed earlier.
Co-authored-by: Pengying Xu <xpy66swsry@gmail.com>
Note that PointerUnion::dyn_cast has been soft deprecated in
PointerUnion.h:
// FIXME: Replace the uses of is(), get() and dyn_cast() with
// isa<T>, cast<T> and the llvm::dyn_cast<T>
Literal migration would result in dyn_cast_if_present (see the
definition of PointerUnion::dyn_cast), but this patch uses dyn_cast
because we expect referent to be nonnull.
Exposed by the test added in the reverted #120514
* Fix libstdc++/libc++ differences due to nth_element. https://github.com/llvm/llvm-project/pull/125450#issuecomment-2631404178
* Fix LLVM_ENABLE_REVERSE_ITERATION=1 differences
* Fix potential issue in `currentSize += D::getSize(*sections[*sectionIdxs.begin()])` where DenseSet was used, though not covered by a test
The ELF/bp-section-orderer.s test is failing on some buildbots due to
what seems like non-determinism issues, see comments on the original PR
and #125450
Reverting to green the build.
This reverts commit 0154dce8d39d2688b09f4e073fe601099a399365 and
follow-up commits 046dd4b28b9c1a75a96cf63465021ffa9fe1a979 and
c92f20416e6dbbde9790067b80e75ef1ef5d0fa4.
Add new ELF linker options for profile-guided section ordering
optimizations:
- `--irpgo-profile=<file>`: Read IRPGO profile data for use with startup
and compression optimizations
- `--bp-startup-sort={none,function}`: Order sections based on profile
data to improve star tup time
- `--bp-compression-sort={none,function,data,both}`: Order sections
using balanced partitioning to improve compressed size
- `--bp-compression-sort-startup-functions`: Additionally optimize
startup functions for compression
- `--verbose-bp-section-orderer`: Print statistics about balanced
partitioning section ordering
Thanks to the @ellishg, @thevinster, and their team's work.
---------
Co-authored-by: Fangrui Song <i@maskray.me>
Commit f10441ad003236ef3b9e5415a571d2be0c0ce5ce dropped a special case
for isUndefWeak and --no-dynamic-linking but also made --export-dynamic
ineffective for static PIE.
This change restores the --export-dynamic behavior and entirely drops
special handling of --no-dynamic-linker:
* -pie with no input DSO, similar to --no-dynamic-linker, suppresses
undefined symbols in .dynsym
The new behaviors resemble GNU ld more.
Commit 3733ed6f1c6b0eef1e13e175ac81ad309fc0b080 introduced isExported to
cache includeInDynsym. If we don't unnecessarily set isExported for
undefined symbols, exportDynamic/includeInDynsym can be replaced with
isExported.
Similar to the change to MarkLive.cpp when isExported was introduced.
includeInDynsym might return true even when isExported is false for
statically linked executables.
The rule here, which I'm copying from the ELF linker, is that shared
library symbols should take presence, unless the symbol has already be
extracted from the archive. e.g:
```
$ wasm-ld foo.a foo.so ref.o // .so wins
$ wasm-ld foo.a ref.o foo.so // .a wins
```
In the first case the shared library takes precedence because the lazy
symbol is replaced by the .so symbol before it is extracted from the
archive. In the second example the ref.o file causes the archive to be
exracted before the .so file is processed, so in that case the archive
file wins.
Fixes: https://github.com/emscripten-core/emscripten/issues/23501