llvm-project

mirror of https://github.com/llvm/llvm-project.git synced 2025-04-25 13:36:08 +00:00

Author	SHA1	Message	Date
Yaxun (Sam) Liu	3594769f20	[ELF] Define NOMINMAX to fix zlib.h caused build failure on Windows (#70368) On Windows when zlib is enabled, the zlib header includes some Windows headers that define max as a macro. Since OutputSections.cpp uses std::max with a template argument, this causes a compilation error. Define the macro NOMINMAX to avoid this.	2023-11-02 08:59:54 -04:00
Fangrui Song	0cbe49eade	[ELF] Implement getImplicitAddend and enable checkDynamicRelocsDefault for PPC32	2023-09-15 22:49:18 -07:00
Fangrui Song	1b65b159da	[ELF] Enable checkDynamicRelocsDefault for PPC64 .plt and .branch_lt have the type of SHT_NOBITS and may be relocated by dynamic relocations with non-zero addends. They should be skipped for the --check-dynamic-relocations check, as --apply-dynamic-relocs does not apply. A side effect is that -z rel does not work for the two sections. Added two --apply-dynamic-relocs --check-dynamic-relocations tests. Also checked linking a PPC64 clang.	2023-09-15 22:38:18 -07:00
Simi Pallipurath	f146763e07	Revert "Revert "[lld][Arm] Big Endian - Byte invariant support."" This reverts commit d8851384c6ac2a1cea15e05228dbde5f13654e23. Reason: Applied the fix for the Asan buildbot failures.	2023-06-22 16:10:18 +01:00
Simi Pallipurath	d8851384c6	Revert "[lld][Arm] Big Endian - Byte invariant support." This reverts commit 8cf8956897ce9bca3176c6339077b1ca17b27abc.	2023-06-20 17:27:44 +01:00
Simi Pallipurath	8cf8956897	[lld][Arm] Big Endian - Byte invariant support. Arm has a BE8 big-endian configuration called byte-invariant (every byte has the same address on little- and big-endian systems). When in BE8 mode: 1. Instructions are big-endian in relocatable objects but little-endian in executables and shared objects. 2. Data is big-endian. 3. The data encoding of the ELF file is ELFDATA2MSB. To support BE8 without an ABI break for relocatable objects, the linker takes on the responsibility of changing the endianness of instructions. At a high level the only difference between BE32 and BE8 in the linker is that for BE8: 1. The linker sets the flag EF_ARM_BE8 in the ELF header. 2. The linker endian reverses the instructions, but not data. This patch adds BE8 big endian support for Arm. To endian reverse the instructions we'll need access to the mapping symbols. Code sections can contain a mix of Arm, Thumb and literal data. We need to endian reverse Arm instructions as words, Thumb instructions as half-words and ignore literal data. The only way to find these transitions precisely is by using mapping symbols. The instruction reversal will need to take place after relocation. For Arm BE8 code sections (sections with the SHF_EXECINSTR flag) we inserted a step after relocation to endian reverse the instructions. The implementation strategy I have used here is to write all sections BE32 including SyntheticSections then endian reverse all code in InputSections via mapping symbols. Reviewed By: peter.smith Differential Revision: https://reviews.llvm.org/D150870	2023-06-20 14:08:21 +01:00
Fangrui Song	8d85c96e0e	[lld] StringRef::{starts,ends}with => {starts,ends}_with. NFC The latter form is now preferred to be similar to C++20 starts_with. This replacement also removes one function call when startswith is not inlined.	2023-06-05 14:36:19 -07:00
Leonard Chan	b9249a69cc	[lld][ELF] Do not emit warning for NOLOAD output sections Much of NOLOAD's intended use is to explicitly change the type of an output section, so we shouldn't flag these as warnings. Differential Revision: https://reviews.llvm.org/D151144	2023-05-23 20:41:20 +00:00
Fangrui Song	1408504564	[ELF] Name MergeSyntheticSection using an input section instead of the output section In a link map, the input section name gives more information. See the updated merge-entsize.s for an example. The output file is unchanged. Compiler generated input sections with the SHF_MERGE flag have names such as .rodata.str1.1 and .rodata.cstN, and are not affected by -fdata-sections. Reviewed By: peter.smith Differential Revision: https://reviews.llvm.org/D149466	2023-05-02 09:35:00 -07:00
Alexey Lapshin	fea8c07356	[Support][Parallel] Add sequential mode to TaskGroup::spawn(). This patch allows specifying that some tasks should be done in sequential order. It makes it possible to avoid a conditional for separating sequential tasks: in `TaskGroup tg; for () { if(condition) fn(); else tg.spawn([](){fn();}); }`, the if/else can be replaced with `tg.spawn([](){fn();}, condition)`. It also prevents execution on the main thread, which allows adding checks for the getThreadIndex() function discussed in D142318. The patch also replaces std::stack with std::deque in the ThreadPoolExecutor to have natural execution order in case (parallel::strategy.ThreadsRequested == 1). Differential Revision: https://reviews.llvm.org/D148728	2023-04-26 13:52:26 +02:00
Jez Ng	3df4c5a92f	[NFC] Optimize vector usage in lld By using emplace_back, as well as converting some loops to for-each, we can do more efficient vectorization. Make copy constructor for TemporaryFile noexcept. Reviewed By: #lld-macho, int3 Differential Revision: https://reviews.llvm.org/D139552	2023-01-26 20:31:42 -05:00
serge-sans-paille	984b800a03	Move from llvm::makeArrayRef to ArrayRef deduction guides - last part This is a follow-up to https://reviews.llvm.org/D140896, split into several parts as it touches a lot of files. Differential Revision: https://reviews.llvm.org/D141298	2023-01-10 11:47:43 +01:00
Guillaume Chatelet	08e2a76381	[lld][NFC] rename ELF alignment into addralign	2022-12-01 16:20:12 +00:00
Fangrui Song	1a50213ce7	[ELF] --compress-debug-sections=zstd: ignore error if zstd was not built with ZSTD_MULTITHREAD	2022-09-22 13:16:50 -07:00
Alex Brachet	38b20a02fe	[ELF] Fix std::min error on macOS	2022-09-22 19:03:13 +00:00
Dmitri Gribenko	eda9fdc493	Fix -Wunused-local-typedef warning in some build configurations	2022-09-22 17:10:17 +02:00
Fangrui Song	fa74144c64	[ELF] Parallelize --compress-debug-sections=zstd See D117853: compressing debug sections is a bottleneck, so parallelizing the step is of large value. zstd provides a multi-threading API and the output is deterministic even with different numbers of threads (see https://github.com/facebook/zstd/issues/2238). Therefore we can leverage it instead of using the pigz-style sharding approach. Also, switch to the default compression level 3. The current level 5 is significantly slower without providing a justifying size benefit. ``` 'dash b.sh 1' ran 1.05 ± 0.01 times faster than 'dash b.sh 3' 1.18 ± 0.01 times faster than 'dash b.sh 4' 1.29 ± 0.02 times faster than 'dash b.sh 5' level=1 size: 358946945 level=3 size: 309002145 level=4 size: 307693204 level=5 size: 297828315 ``` Reviewed By: andrewng, peter.smith Differential Revision: https://reviews.llvm.org/D133679	2022-09-21 11:13:03 -07:00
Fangrui Song	449f2ca146	[ELF] Add --compress-debug-sections=zstd `clang -gz=zstd a.o` passes this option to the linker. This option compresses output debug sections with zstd and sets ch_type to ELFCOMPRESS_ZSTD. As of today, very few DWARF consumers recognize ELFCOMPRESS_ZSTD. Use the llvm::zstd::compress API with level llvm::zstd::DefaultCompression (5), which we may tune after we have more experience with zstd output. zstd has built-in parallel compression support (so we don't need to do D117853 for zlib), which is not leveraged yet. Reviewed By: peter.smith Differential Revision: https://reviews.llvm.org/D133548	2022-09-09 10:30:18 -07:00
Fangrui Song	3b4d800911	[ELF] Parallelize writes of different OutputSections We currently process one OutputSection at a time and for each OutputSection write contained input sections in parallel. This strategy does not leverage multi-threading well. Instead, parallelize writes of different OutputSections. The default TaskSize for parallelFor often leads to inferior sharding. We prepare the task in the caller instead. * Move llvm::parallel::detail::TaskGroup to llvm::parallel::TaskGroup * Add llvm::parallel::TaskGroup::execute. * Change writeSections to declare TaskGroup and pass it to writeTo. Speed-up with --threads=8: * clang -DCMAKE_BUILD_TYPE=Release: 1.11x as fast * clang -DCMAKE_BUILD_TYPE=Debug: 1.10x as fast * chrome -DCMAKE_BUILD_TYPE=Release: 1.04x as fast * scylladb build/release: 1.09x as fast On M1, many benchmarks are a small fraction of a percentage faster. Mozilla showed the largest difference with the patch being about 1.03x as fast. Differential Revision: https://reviews.llvm.org/D131247	2022-08-24 09:40:03 -07:00
Fangrui Song	e0612c91cd	[ELF] Optimize getInputSections. NFC In the majority of cases (e.g. orphan sections), an OutputSection has at most one InputSectionDescription (isd). By changing the return type to ArrayRef<InputSection *> we can just reference the isd->sections. For OutputSections with more than one InputSectionDescription we use a caller provided SmallVector to copy the elements as before. Reviewed By: peter.smith Differential Revision: https://reviews.llvm.org/D129111	2022-07-05 23:31:09 -07:00
Nico Weber	7effcbda49	Rename parallelForEachN to just parallelFor Patch created by running: rg -l parallelForEachN | xargs sed -i '' -c 's/parallelForEachN/parallelFor/' No behavior change. Differential Revision: https://reviews.llvm.org/D128140	2022-06-19 17:49:00 -04:00
Fangrui Song	b3d5bb3b30	[ELF] Change (NOLOAD) type mismatch to use SHT_NOBITS instead of SHT_PROGBITS Placing a non-SHT_NOBITS input section in an output section specified with (NOLOAD) is fishy but used by some projects. D118840 changed the output type to SHT_PROGBITS, but using the specified type seems to make more sense and improve GNU ld compatibility: `(NOLOAD)` seems to change the output section type regardless of input. I think we should keep the current type mismatch warning as it does indicate an error-prone usage. Reviewed By: peter.smith Differential Revision: https://reviews.llvm.org/D125074	2022-05-06 07:49:42 -07:00
Fangrui Song	6c814931bc	[ELF] Don't use multiple inheritance for OutputSection. NFC Add an OutputDesc class inheriting from SectionCommand. An OutputDesc wraps an OutputSection. This change allows InputSection::getParent to be inlined. Differential Revision: https://reviews.llvm.org/D120650	2022-03-08 11:23:42 -08:00
Fangrui Song	4976d1fe58	[ELF] Move SyntheticSection check from InputSection::writeTo to OutputSection::writeTo. NFC Simplify code and move the heavyweight operation to the call site so that it is clearer how to improve the inefficient scheduling in the future.	2022-02-27 23:28:52 -08:00
Fangrui Song	b01430a04f	[ELF] Don't rely on Symbols.h's transitive inclusion of InputFiles.h. NFC	2022-02-23 19:18:24 -08:00
Fangrui Song	cb0a4bb5be	[ELF] Change (NOLOAD) section type mismatch error to warning Making a (NOLOAD) section SHT_PROGBITS is fishy (the user may expect all-zero content, but the linker does not check that), but some projects (e.g. Linux kernel https://github.com/ClangBuiltLinux/linux/issues/1597) traditionally rely on the behavior. Issue a warning to not break them.	2022-02-18 11:20:36 -08:00
Fangrui Song	66f8ac8d36	[ELF] Support (TYPE=<value>) to customize the output section type The current output section type syntax allows setting the ELF section type to SHT_PROGBITS or SHT_NOBITS (via NOLOAD). This patch allows an arbitrary section type value to be specified. Some common SHT_* literal names are supported as well. ``` SECTIONS { note (TYPE=SHT_NOTE) : { BYTE(8) *(note) } init_array ( TYPE=14 ) : { QUAD(14) } fini_array (TYPE = SHT_FINI_ARRAY) : { QUAD(15) } } ``` When `sh_type` is specified, it is an error if an input section has a different type. Our syntax is compatible with GNU ld 2.39 (https://sourceware.org/bugzilla/show_bug.cgi?id=28841). Reviewed By: peter.smith Differential Revision: https://reviews.llvm.org/D118840	2022-02-17 12:10:58 -08:00
Fangrui Song	27bb799095	[ELF] Clean up headers. NFC	2022-02-07 21:53:34 -08:00
Mariusz Ceier	e8bff9ae54	Fix lld standalone build lld/ELF/OutputSections.cpp includes llvm/Config/config.h for LLVM_ENABLE_ZLIB definition, but llvm/Config/config.h doesn't exist in standalone build. To fix this, this patch moves LLVM_ENABLE_ZLIB from config.h to llvm-config.h and updates OutputSections.cpp to include llvm-config.h instead of config.h Reviewed By: MaskRay, mgorny Differential Revision: https://reviews.llvm.org/D119058	2022-02-07 09:20:03 -08:00
Fangrui Song	5a2020d069	[ELF] copyShtGroup: replace unordered_set<uint32_t> with DenseSet<uint32_t>. NFC We don't need to support the empty/tombstone key section index.	2022-01-30 01:18:41 -08:00
Fangrui Song	f318fd9bf8	[ELF] crtbegin/crtend test: replace std::regex with hand-written matcher. NFC My x86-64 lld executable is 18KiB smaller.	2022-01-30 01:11:19 -08:00
Fangrui Song	fcd8817da5	[ELF] Simplify maybeCompress with lld::split. NFC	2022-01-30 00:44:19 -08:00
Fangrui Song	913914f0f8	[ELF] Simplify writing the Elf_Chdr header. NFC And avoid changing `size` in `writeTo`.	2022-01-26 10:23:56 -08:00
Fangrui Song	2a80c3dbe1	[ELF] Clarify that Z_BEST_SPEED==1 in a comment. NFC	2022-01-25 22:40:53 -08:00
Fangrui Song	7438dbe078	[ELF] Cast size to size_t. NFC To fix ../../chromeclang/bin/../include/c++/v1/__algorithm/min.h:39:1: note: candidate template ignored: deduced conflicting types for parameter '_Tp' ('unsigned long' vs. 'unsigned long long') on macOS arm64.	2022-01-25 22:38:24 -08:00
Fangrui Song	223f9dea3d	[ELF] maybeCompress: replace vector<uint8_t> with unique_ptr<uint8_t[]>. NFC And mention that it is zero-initialized. I do not notice a speed-up if changed to be uninitialized by forcing the zero filler in writeTo.	2022-01-25 22:15:44 -08:00
Fangrui Song	4cdc441690	[ELF] Parallelize --compress-debug-sections=zlib When linking a Debug build of clang (265MiB SHF_ALLOC sections, 920MiB uncompressed debug info), in a --threads=1 link "Compress debug sections" takes 2/3 of the time and in a --threads=8 link "Compress debug sections" takes ~70% of the time. This patch splits a section into 1MiB shards and calls zlib `deflate` in parallel. DEFLATE blocks are a bit sequence. We need to ensure every shard starts at a byte boundary for concatenation. We use Z_SYNC_FLUSH for all shards but the last to flush the output to a byte boundary. (Z_FULL_FLUSH can be used as well, but Z_FULL_FLUSH clears the hash table which just wastes time.) The last block requires the BFINAL flag. We call deflate with Z_FINISH to set the flag as well as flush the output to a byte boundary. Under the hood, all of Z_SYNC_FLUSH, Z_FULL_FLUSH, and Z_FINISH emit a non-compressed block (called stored block in zlib). RFC1951 says "Any bits of input up to the next byte boundary are ignored." In a --threads=8 link, "Compress debug sections" is 5.7x as fast and the total speed is 2.54x. Because the hash table for one shard is not shared with the next shard, the output is slightly larger. Better compression ratio can be achieved by preloading the window size from the previous shard as dictionary (`deflateSetDictionary`), but that is overkill.
``` # 1MiB shards % bloaty clang.new -- clang.old FILE SIZE VM SIZE -------------- -------------- +0.3% +129Ki [ = ] 0 .debug_str +0.1% +105Ki [ = ] 0 .debug_info +0.3% +101Ki [ = ] 0 .debug_line +0.2% +2.66Ki [ = ] 0 .debug_abbrev +0.0% +1.19Ki [ = ] 0 .debug_ranges +0.1% +341Ki [ = ] 0 TOTAL # 2MiB shards % bloaty clang.new -- clang.old FILE SIZE VM SIZE -------------- -------------- +0.2% +74.2Ki [ = ] 0 .debug_line +0.1% +72.3Ki [ = ] 0 .debug_str +0.0% +69.9Ki [ = ] 0 .debug_info +0.1% +976 [ = ] 0 .debug_abbrev +0.0% +882 [ = ] 0 .debug_ranges +0.0% +218Ki [ = ] 0 TOTAL ``` Bonus in not using zlib::compress * we can compress a debug section larger than 4GiB * peak memory usage is lower because for most shards the output size is less than 50% input size (all less than 55% for a large binary I tested, but decreasing the initial output size does not decrease memory usage) Reviewed By: ikudrin Differential Revision: https://reviews.llvm.org/D117853	2022-01-25 10:29:04 -08:00
Fangrui Song	a1c2ee0147	[ELF] LinkerScript/OutputSection: change other std::vector members to SmallVector 11+KiB smaller .text with both libc++ and libstdc++ builds.	2021-12-26 13:53:47 -08:00
Fangrui Song	bf7f3dd74e	[ELF] Move outSecOff addition from InputSection::writeTo to the caller Simplify the code a bit and improve consistency with SyntheticSection::writeTo.	2021-12-26 12:11:41 -08:00
Fangrui Song	ba948c5a9c	[ELF] Use SmallVector for some global variables (Files and Sections). NFC My lld executable is 26+KiB smaller.	2021-12-22 22:30:08 -08:00
Fangrui Song	6683099a0d	[ELF] Optimize RelocationSection<ELFT>::writeTo When linking a 1.2G output (nearly no debug info, 2846621 dynamic relocations) using `--threads=8`, I measured ``` 9.131462 Total ExecuteLinker 1.449913 Total Write output file 1.445784 Total Write sections 0.657152 Write sections {"detail":".rela.dyn"} ``` This change decreases the .rela.dyn time to 0.25, leading to 4% speed up in the total time. * The parallelSort is slow because of expensive r_sym/r_offset computation. Cache the values. * The iteration is slow. Move r_sym/r_addend computation ahead of time and parallelize it. With the change, the new encodeDynamicReloc is cheap (0.05s). So no need to parallelize it. Reviewed By: ikudrin Differential Revision: https://reviews.llvm.org/D115993	2021-12-21 09:43:44 -08:00
Fangrui Song	8825ffdbde	[ELF] --time-trace: Trace "Write sections" writeSections is typically a bottleneck. This was used to track down the following bottlenecks: * Output section .rela.dyn (9115d75117b57115fe45153e5f38f2c444c0cd91) * Output section .debug_str (3aae04c744b03eb3eec7376f9d34fa3e42f8d108) * posix_fallocate is slow for Linux tmpfs: D115957 Reviewed By: ikudrin Differential Revision: https://reviews.llvm.org/D115984	2021-12-20 10:51:24 -08:00
Fangrui Song	93558e575e	[ELF] Internalize createMergeSynthetic. NFC Only called once. Moving to OutputSections.cpp can make it inlined. finalizeInputSections can be very hot, especially in -O1 links with much debug info.	2021-12-16 20:50:06 -08:00
Fangrui Song	d060cc1f98	[ELF] Fix out-of-bounds write in memset(&Out::first, ...) Fix r285764: there is no guarantee that Out::first is placed before other static data members of `struct Out`. After `bufferStart` was introduced, this out-of-bounds write is destined in many compilers. It is likely benign, though. And move `Out::elfHeader->size` assignment beside `Out::elfHeader->sectionIndex`	2021-11-28 14:47:57 -08:00
Fangrui Song	7051aeef7a	[ELF] Rename BaseCommand to SectionCommand. NFC BaseCommand was picked when PHDRS/INSERT/etc were not implemented. Rename it to SectionCommand to match `sectionCommands` and make it clear that the commands are used in SECTIONS (except a special case for SymbolAssignment). Also, improve naming of some BaseCommand variables (base -> cmd).	2021-11-25 20:24:23 -08:00
Fangrui Song	6188fd4957	[ELF] Rename OutputSection::sectionCommands to commands. NFC This partially reverts r315409: the description applies to LinkerScript, but not to OutputSection. The name "sectionCommands" is used in both LinkerScript::sectionCommands and OutputSection::sectionCommands, which may lead to confusion. "commands" in OutputSection has no ambiguity because there are no other types of commands.	2021-11-25 16:47:07 -08:00
Fangrui Song	bf6e259b21	[ELF] Update comments/diagnostics for some long options to use the canonical two-dash form Rewrite some comments as appropriate.	2021-10-25 12:52:06 -07:00
Alex Richardson	35c5e564e6	[ELF] Check the Elf_Rel addends for dynamic relocations There used to be many cases where addends for Elf_Rel were not emitted in the final object file (mostly when building for MIPS64 since the input .o files use RELA but the output uses REL). These cases have been fixed since, but this patch adds a check to ensure that the written values are correct. It is based on a previous patch that I added to the CHERI fork of LLD since we were using MIPS64 as a baseline. The work has now almost entirely shifted to RISC-V and Arm Morello (which use Elf_Rela), but I thought it would be useful to upstream our local changes anyway. This patch adds a (hidden) command line flag --check-dynamic-relocations that can be used to enable these checks. It is also on by default in assertions builds for targets that handle all dynamic relocation kinds that LLD can emit in Target::getImplicitAddend(). Currently this is enabled for ARM, MIPS, and I386. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D101450	2021-07-09 10:41:40 +01:00
Fangrui Song	16cb7910f5	[ELF] --emit-relocs: fix a crash if .rela.dyn is an empty output section Fix PR48357: If .rela.dyn appears as an output section description, its type may be SHT_RELA (due to the empty synthetic .rela.plt) while there is no input section. The empty .rela.dyn may be retained due to a reference in a linker script. Don't crash. Reviewed By: grimar Differential Revision: https://reviews.llvm.org/D93367	2020-12-16 08:59:38 -08:00
Fangrui Song	40a42f9f3f	[ELF] Make SORT_INIT_PRIORITY support .ctors.N Input sections `.ctors/.ctors.N` may go to either the output section `.init_array` or the output section `.ctors`: * output `.ctors`: currently we sort them by name. This patch changes to sort by priority from high to low. If N in `.ctors.N` is in the form of %05u, there is no semantic difference. Actually GCC and Clang do use %05u. (In the test `ctors_dtors_priority.s` and Gold's test `gold/testsuite/script_test_14.s`, we can see %03u, but they are not really produced by compilers.) * output `.init_array`: users can provide an input section description `SORT_BY_INIT_PRIORITY(.init_array.* .ctors.*)` to mix `.init_array.*` and `.ctors.*`. This can make .init_array.N and .ctors.(65535-N) interchangeable. With this change, users can mix `.ctors.N` and `.init_array.N` in `.init_array` (PR44698 and PR48096) with linker scripts. As an example: ``` SECTIONS { .init_array : { *(SORT_BY_INIT_PRIORITY(.init_array.* .ctors.*)) *(.init_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .ctors) } } INSERT AFTER .fini_array; SECTIONS { .fini_array : { *(SORT_BY_INIT_PRIORITY(.fini_array.* .dtors.*)) *(.fini_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .dtors) } } INSERT BEFORE .init_array; ``` Reviewed By: psmith Differential Revision: https://reviews.llvm.org/D91187	2020-11-12 08:56:12 -08:00
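
Commit 40a42f9f3f states that `.init_array.N` and `.ctors.(65535-N)` become interchangeable under SORT_BY_INIT_PRIORITY. The following Python sketch illustrates that mapping only; it is not lld's code. The function names `init_priority` and `sort_by_init_priority` are hypothetical, and placing sections without a numeric suffix last (key 65536) is an assumption made for the illustration.

```python
def init_priority(name: str) -> int:
    """Return a sort key for a section name; smaller keys are placed first."""
    prefix, dot, suffix = name.rpartition(".")
    if not dot or not (suffix.isascii() and suffix.isdigit()):
        # Assumption: no numeric suffix means "after every prioritized section".
        return 65536
    n = int(suffix)
    if prefix in (".ctors", ".dtors"):
        # .ctors.N corresponds to .init_array.(65535-N), per the commit message.
        return 65535 - n
    return n


def sort_by_init_priority(names):
    """Stable sort of section names by the key above."""
    return sorted(names, key=init_priority)


if __name__ == "__main__":
    sections = [".ctors", ".init_array.101", ".ctors.65434",
                ".init_array.00200", ".ctors.00005"]
    for name in sort_by_init_priority(sections):
        print(init_priority(name), name)
```

Because the sort is stable, `.init_array.101` and `.ctors.65434` (both key 101) keep their input order relative to each other.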
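
The sharding scheme described in commit 4cdc441690 (split the input into shards, end every shard but the last with Z_SYNC_FLUSH, end the last with Z_FINISH, then concatenate) can be tried out with Python's `zlib` module. This is a minimal sketch under stated assumptions, not lld's C++ implementation: the names `deflate_shard` and `compress_sharded` are hypothetical, the compression level and thread count are arbitrary, and the `0x78 0x01` header is just one valid zlib header chosen for the illustration.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

SHARD_SIZE = 1 << 20  # 1MiB shards, the size quoted in the commit message


def deflate_shard(shard: bytes, is_last: bool, level: int = 1) -> bytes:
    """Compress one shard as raw DEFLATE data that ends on a byte boundary."""
    # wbits=-15 selects a raw DEFLATE stream: no zlib header, no Adler-32 trailer.
    comp = zlib.compressobj(level, zlib.DEFLATED, -15)
    out = comp.compress(shard)
    # Z_SYNC_FLUSH pads to a byte boundary without ending the stream;
    # Z_FINISH also marks the final block (BFINAL).
    out += comp.flush(zlib.Z_FINISH if is_last else zlib.Z_SYNC_FLUSH)
    return out


def compress_sharded(data: bytes, shard_size: int = SHARD_SIZE,
                     threads: int = 8) -> bytes:
    """Compress shards independently and join them into one zlib stream."""
    shards = [data[i:i + shard_size]
              for i in range(0, len(data), shard_size)] or [b""]
    last = len(shards) - 1
    with ThreadPoolExecutor(max_workers=threads) as pool:
        parts = list(pool.map(
            lambda item: deflate_shard(item[1], item[0] == last),
            enumerate(shards)))
    header = b"\x78\x01"  # assumed CMF/FLG pair; any valid zlib header works
    trailer = zlib.adler32(data).to_bytes(4, "big")  # checksum of the whole input
    return header + b"".join(parts) + trailer


if __name__ == "__main__":
    payload = bytes(range(256)) * 20000
    packed = compress_sharded(payload)
    print(len(payload), len(packed), zlib.decompress(packed) == payload)
```

Each shard uses a fresh compressor, so no shard can reference data from an earlier one. That is why the commit notes the output is slightly larger than single-stream compression.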


782 Commits