llvm-project/lld/ELF/LinkerScript.h

Ignoring revisions in .git-blame-ignore-revs. Click here to bypass and see the normal blame view.

477 lines
16 KiB
C
Raw Normal View History

//===- LinkerScript.h -------------------------------------------*- C++ -*-===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
#ifndef LLD_ELF_LINKER_SCRIPT_H
#define LLD_ELF_LINKER_SCRIPT_H
#include "Config.h"
Reland: [LLD] Implement --enable-non-contiguous-regions (#90007) When enabled, input sections that would otherwise overflow a memory region are instead spilled to the next matching output section. This feature parallels the one in GNU LD, but there are some differences from its documented behavior: - /DISCARD/ only matches previously-unmatched sections (i.e., the flag does not affect it). - If a section fails to fit at any of its matches, the link fails instead of discarding the section. - The flag --enable-non-contiguous-regions-warnings is not implemented, as it exists to warn about such occurrences. The implementation places stubs at possible spill locations, and replaces them with the original input section when effecting spills. Spilling decisions occur after address assignment. Sections are spilled in reverse order of assignment, with each spill naively decreasing the size of the affected memory regions. This continues until the memory regions are brought back under size. Spilling anything causes another pass of address assignment, and this continues to fixed point. Spilling after rather than during assignment allows the algorithm to consider the size effects of unspillable input sections that appear later in the assignment. Otherwise, such sections (e.g. thunks) may force an overflow, even if spilling something earlier could have avoided it. A few notable feature interactions occur: - Stubs affect alignment, ONLY_IF_RO, etc, broadly as if a copy of the input section were actually placed there. - SHF_MERGE synthetic sections use the spill list of their first contained input section (the one that gives the section its name). - ICF occurs oblivious to spill sections; spill lists for merged-away sections become inert and are removed after assignment. - SHF_LINK_ORDER and .ARM.exidx are ordered according to the final section ordering, after all spilling has completed. - INSERT BEFORE/AFTER and OVERWRITE_SECTIONS are explicitly disallowed.
2024-05-13 12:30:50 -05:00
#include "InputSection.h"
#include "Writer.h"
#include "lld/Common/LLVM.h"
#include "lld/Common/Strings.h"
#include "llvm/ADT/ArrayRef.h"
#include "llvm/ADT/DenseMap.h"
#include "llvm/ADT/MapVector.h"
#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/StringRef.h"
#include "llvm/Support/Compiler.h"
#include <cstddef>
#include <cstdint>
#include <functional>
#include <memory>
namespace lld::elf {
class Defined;
class InputFile;
class InputSection;
class InputSectionBase;
class OutputSection;
class SectionBase;
class ThunkSection;
struct OutputDesc;
struct SectionClass;
struct SectionClassDesc;
2017-10-11 00:01:49 +00:00
// This represents an r-value in the linker script.
struct ExprValue {
ExprValue(SectionBase *sec, bool forceAbsolute, uint64_t val,
const Twine &loc)
2021-11-25 16:55:06 -08:00
: sec(sec), val(val), forceAbsolute(forceAbsolute), loc(loc.str()) {}
2017-10-11 00:01:49 +00:00
ExprValue(uint64_t val) : ExprValue(nullptr, false, val, "") {}
bool isAbsolute() const { return forceAbsolute || sec == nullptr; }
uint64_t getValue() const;
uint64_t getSecAddr() const;
uint64_t getSectionOffset() const;
2017-10-11 00:01:49 +00:00
// If a value is relative to a section, it has a non-null Sec.
SectionBase *sec;
uint64_t val;
uint64_t alignment = 1;
// The original st_type if the expression represents a symbol. Any operation
// resets type to STT_NOTYPE.
uint8_t type = llvm::ELF::STT_NOTYPE;
2021-11-25 16:55:06 -08:00
// True if this expression is enclosed in ABSOLUTE().
// This flag affects the return value of getValue().
bool forceAbsolute;
2017-10-11 00:01:49 +00:00
// Original source location. Used for error messages.
std::string loc;
};
2016-10-13 23:08:33 +00:00
// This represents an expression in the linker script.
// ScriptParser::readExpr reads an expression and returns an Expr.
// Later, we evaluate the expression by calling the function.
using Expr = std::function<ExprValue()>;
Make readExpr return an Expr object instead of a vector of tokens. Previously, we handled an expression as a vector of tokens. In other words, an expression was a vector of uncooked raw StringRefs. When we need a value of an expression, we used ExprParser to run the expression. The separation was needed essentially because parse time is too early to evaluate an expression. In order to evaluate an expression, we need to finalize section sizes. Because linker script parsing is done at very early stage of the linking process, we can't evaluate expressions while parsing. The above mechanism worked fairly well, but there were a few drawbacks. One thing is that we sometimes have to parse the same expression more than once in order to find the end of the expression. In some contexts, linker script expressions have no clear end marker. So, we needed to recognize balanced expressions and ternary operators. The other is poor error reporting. Since expressions are parsed basically twice, and some information that is available at the first stage is lost in the second stage, it was hard to print out apprpriate error messages. This patch fixes the issues with a new approach. Now the expression parsing is integrated into ScriptParser. ExprParser class is removed. Expressions are represented as lambdas instead of vectors of tokens. Lambdas captures information they need to run themselves when they are created. In this way, ends of expressions are naturally detected, and errors are handled in the usual way. This patch also reduces the amount of code. Differential Revision: https://reviews.llvm.org/D22728 llvm-svn: 276574
2016-07-24 18:19:40 +00:00
// This enum is used to implement linker script SECTIONS command.
// https://sourceware.org/binutils/docs/ld/SECTIONS.html#SECTIONS
enum SectionsCommandKind {
AssignmentKind, // . = expr or <sym> = expr
OutputSectionKind,
InputSectionKind,
ByteKind, // BYTE(expr), SHORT(expr), LONG(expr) or QUAD(expr)
ClassKind, // CLASS(class_name)
};
struct SectionCommand {
SectionCommand(int k) : kind(k) {}
int kind;
};
2016-10-13 23:08:33 +00:00
// This represents ". = <expr>" or "<symbol> = <expr>".
struct SymbolAssignment : SectionCommand {
SymbolAssignment(StringRef name, Expr e, unsigned symOrder, std::string loc)
: SectionCommand(AssignmentKind), name(name), expression(e),
symOrder(symOrder), location(loc) {}
static bool classof(const SectionCommand *c) {
return c->kind == AssignmentKind;
}
2016-07-29 05:52:33 +00:00
// The LHS of an expression. Name is either a symbol name or ".".
StringRef name;
Defined *sym = nullptr;
2016-07-29 05:52:33 +00:00
// The RHS of an expression.
Make readExpr return an Expr object instead of a vector of tokens. Previously, we handled an expression as a vector of tokens. In other words, an expression was a vector of uncooked raw StringRefs. When we need a value of an expression, we used ExprParser to run the expression. The separation was needed essentially because parse time is too early to evaluate an expression. In order to evaluate an expression, we need to finalize section sizes. Because linker script parsing is done at very early stage of the linking process, we can't evaluate expressions while parsing. The above mechanism worked fairly well, but there were a few drawbacks. One thing is that we sometimes have to parse the same expression more than once in order to find the end of the expression. In some contexts, linker script expressions have no clear end marker. So, we needed to recognize balanced expressions and ternary operators. The other is poor error reporting. Since expressions are parsed basically twice, and some information that is available at the first stage is lost in the second stage, it was hard to print out apprpriate error messages. This patch fixes the issues with a new approach. Now the expression parsing is integrated into ScriptParser. ExprParser class is removed. Expressions are represented as lambdas instead of vectors of tokens. Lambdas captures information they need to run themselves when they are created. In this way, ends of expressions are naturally detected, and errors are handled in the usual way. This patch also reduces the amount of code. Differential Revision: https://reviews.llvm.org/D22728 llvm-svn: 276574
2016-07-24 18:19:40 +00:00
Expr expression;
2016-07-29 05:52:33 +00:00
// Command attributes for PROVIDE, HIDDEN and PROVIDE_HIDDEN.
bool provide = false;
bool hidden = false;
[ELF] Align the end of PT_GNU_RELRO associated PT_LOAD to a common-page-size boundary (#66042) Close #57618: currently we align the end of PT_GNU_RELRO to a common-page-size boundary, but do not align the end of the associated PT_LOAD. This is benign when runtime_page_size >= common-page-size. However, when runtime_page_size < common-page-size, it is possible that `alignUp(end(PT_LOAD), page_size) < alignDown(end(PT_GNU_RELRO), page_size)`. In this case, rtld's mprotect call for PT_GNU_RELRO will apply to unmapped regions and lead to an error, e.g. ``` error while loading shared libraries: cannot apply additional memory protection after relocation: Cannot allocate memory ``` To fix the issue, add a padding section .relro_padding like mold, which is contained in the PT_GNU_RELRO segment and the associated PT_LOAD segment. The section also prevents strip from corrupting PT_LOAD program headers. .relro_padding has the largest `sortRank` among RELRO sections. Therefore, it is naturally placed at the end of `PT_GNU_RELRO` segment in the absence of `PHDRS`/`SECTIONS` commands. In the presence of `SECTIONS` commands, we place .relro_padding immediately before a symbol assignment using DATA_SEGMENT_RELRO_END (see also https://reviews.llvm.org/D124656), if present. DATA_SEGMENT_RELRO_END is changed to align to max-page-size instead of common-page-size. Some edge cases worth mentioning: * ppc64-toc-addis-nop.s: when PHDRS is present, do not append .relro_padding * avoid-empty-program-headers.s: when the only RELRO section is .tbss, it is not part of PT_LOAD segment, therefore we do not append .relro_padding. --- Close #65002: GNU ld from 2.39 onwards aligns the end of PT_GNU_RELRO to a max-page-size boundary (https://sourceware.org/PR28824) so that the last page is protected even if runtime_page_size > common-page-size. In my opinion, losing protection for the last page when the runtime page size is larger than common-page-size is not really an issue. Double mapping a page of up to max-common-page for the protection could cause undesired VM waste. Internally we had users complaining about 2MiB max-page-size applying to shared objects. Therefore, the end of .relro_padding is padded to a common-page-size boundary. Users who are really anxious can set common-page-size to match their runtime page size. --- 17 tests need updating as there are lots of change detectors.
2023-09-14 10:33:11 -07:00
// This assignment references DATA_SEGMENT_RELRO_END.
bool dataSegmentRelroEnd = false;
unsigned symOrder;
// Holds file name and line number for error reporting.
std::string location;
// A string representation of this command. We use this for -Map.
std::string commandString;
// Address of this assignment command.
uint64_t addr;
// Size of this assignment command. This is usually 0, but if
// you move '.' this may be greater than 0.
uint64_t size;
};
// Linker scripts allow additional constraints to be put on output sections.
2016-10-13 23:08:33 +00:00
// If an output section is marked as ONLY_IF_RO, the section is created
// only if its input sections are read-only. Likewise, an output section
// with ONLY_IF_RW is created if all input sections are RW.
enum class ConstraintKind { NoConstraint, ReadOnly, ReadWrite };
// This struct is used to represent the location and size of regions of
// target memory. Instances of the struct are created by parsing the
// MEMORY command.
struct MemoryRegion {
MemoryRegion(StringRef name, Expr origin, Expr length, uint32_t flags,
uint32_t invFlags, uint32_t negFlags, uint32_t negInvFlags)
: name(std::string(name)), origin(origin), length(length), flags(flags),
invFlags(invFlags), negFlags(negFlags), negInvFlags(negInvFlags) {}
[Coding style change] Rename variables so that they start with a lowercase letter This patch is mechanically generated by clang-llvm-rename tool that I wrote using Clang Refactoring Engine just for creating this patch. You can see the source code of the tool at https://reviews.llvm.org/D64123. There's no manual post-processing; you can generate the same patch by re-running the tool against lld's code base. Here is the main discussion thread to change the LLVM coding style: https://lists.llvm.org/pipermail/llvm-dev/2019-February/130083.html In the discussion thread, I proposed we use lld as a testbed for variable naming scheme change, and this patch does that. I chose to rename variables so that they are in camelCase, just because that is a minimal change to make variables to start with a lowercase letter. Note to downstream patch maintainers: if you are maintaining a downstream lld repo, just rebasing ahead of this commit would cause massive merge conflicts because this patch essentially changes every line in the lld subdirectory. But there's a remedy. clang-llvm-rename tool is a batch tool, so you can rename variables in your downstream repo with the tool. Given that, here is how to rebase your repo to a commit after the mass renaming: 1. rebase to the commit just before the mass variable renaming, 2. apply the tool to your downstream repo to mass-rename variables locally, and 3. rebase again to the head. Most changes made by the tool should be identical for a downstream repo and for the head, so at the step 3, almost all changes should be merged and disappear. I'd expect that there would be some lines that you need to merge by hand, but that shouldn't be too many. Differential Revision: https://reviews.llvm.org/D64121 llvm-svn: 365595
2019-07-10 05:00:37 +00:00
std::string name;
Expr origin;
Expr length;
// A section can be assigned to the region if any of these ELF section flags
// are set...
uint32_t flags;
// ... or any of these flags are not set.
// For example, the memory region attribute "r" maps to SHF_WRITE.
uint32_t invFlags;
// A section cannot be assigned to the region if any of these ELF section
// flags are set...
uint32_t negFlags;
// ... or any of these flags are not set.
// For example, the memory region attribute "!r" maps to SHF_WRITE.
uint32_t negInvFlags;
uint64_t curPos = 0;
uint64_t getOrigin() const { return origin().getValue(); }
uint64_t getLength() const { return length().getValue(); }
bool compatibleWith(uint32_t secFlags) const {
if ((secFlags & negFlags) || (~secFlags & negInvFlags))
return false;
return (secFlags & flags) || (~secFlags & invFlags);
}
};
// This struct represents one section match pattern in SECTIONS() command.
// It can optionally have negative match pattern for EXCLUDED_FILE command.
// Also it may be surrounded with SORT() command, so contains sorting rules.
class SectionPattern {
StringMatcher excludedFilePat;
// Cache of the most recent input argument and result of excludesFile().
mutable std::optional<std::pair<const InputFile *, bool>> excludesFileCache;
public:
SectionPattern(StringMatcher &&pat1, StringMatcher &&pat2)
: excludedFilePat(pat1), sectionPat(pat2),
sortOuter(SortSectionPolicy::Default),
sortInner(SortSectionPolicy::Default) {}
[Coding style change] Rename variables so that they start with a lowercase letter This patch is mechanically generated by clang-llvm-rename tool that I wrote using Clang Refactoring Engine just for creating this patch. You can see the source code of the tool at https://reviews.llvm.org/D64123. There's no manual post-processing; you can generate the same patch by re-running the tool against lld's code base. Here is the main discussion thread to change the LLVM coding style: https://lists.llvm.org/pipermail/llvm-dev/2019-February/130083.html In the discussion thread, I proposed we use lld as a testbed for variable naming scheme change, and this patch does that. I chose to rename variables so that they are in camelCase, just because that is a minimal change to make variables to start with a lowercase letter. Note to downstream patch maintainers: if you are maintaining a downstream lld repo, just rebasing ahead of this commit would cause massive merge conflicts because this patch essentially changes every line in the lld subdirectory. But there's a remedy. clang-llvm-rename tool is a batch tool, so you can rename variables in your downstream repo with the tool. Given that, here is how to rebase your repo to a commit after the mass renaming: 1. rebase to the commit just before the mass variable renaming, 2. apply the tool to your downstream repo to mass-rename variables locally, and 3. rebase again to the head. Most changes made by the tool should be identical for a downstream repo and for the head, so at the step 3, almost all changes should be merged and disappear. I'd expect that there would be some lines that you need to merge by hand, but that shouldn't be too many. Differential Revision: https://reviews.llvm.org/D64121 llvm-svn: 365595
2019-07-10 05:00:37 +00:00
2024-12-16 22:17:18 -08:00
bool excludesFile(const InputFile &file) const;
StringMatcher sectionPat;
SortSectionPolicy sortOuter;
SortSectionPolicy sortInner;
};
class InputSectionDescription : public SectionCommand {
enum class MatchType { Trivial, WholeArchive, ArchivesExcluded } matchType;
SingleStringMatcher filePat;
// Cache of the most recent input argument and result of matchesFile().
mutable std::optional<std::pair<const InputFile *, bool>> matchesFileCache;
public:
InputSectionDescription(StringRef filePattern, uint64_t withFlags = 0,
uint64_t withoutFlags = 0, StringRef classRef = {})
: SectionCommand(InputSectionKind), matchType(MatchType::Trivial),
filePat(filePattern), classRef(classRef), withFlags(withFlags),
withoutFlags(withoutFlags) {
assert((filePattern.empty() || classRef.empty()) &&
"file pattern and class reference are mutually exclusive");
// The matching syntax for whole archives and files outside of an archive
// can't be handled by SingleStringMatcher, and instead are handled
// manually within matchesFile()
if (!filePattern.empty()) {
if (filePattern.back() == ':') {
matchType = MatchType::WholeArchive;
filePat = filePattern.drop_back();
} else if (filePattern.front() == ':') {
matchType = MatchType::ArchivesExcluded;
filePat = filePattern.drop_front();
}
}
}
static bool classof(const SectionCommand *c) {
return c->kind == InputSectionKind;
}
2024-12-16 22:17:18 -08:00
bool matchesFile(const InputFile &file) const;
// Input sections that matches at least one of SectionPatterns
// will be associated with this InputSectionDescription.
SmallVector<SectionPattern, 0> sectionPatterns;
// If present, input section matching uses class membership instead of file
// and section patterns (mutually exclusive).
StringRef classRef;
// Includes InputSections and MergeInputSections. Used temporarily during
// assignment of input sections to output sections.
SmallVector<InputSectionBase *, 0> sectionBases;
// Used after the finalizeInputSections() pass. MergeInputSections have been
// merged into MergeSyntheticSections.
SmallVector<InputSection *, 0> sections;
// Temporary record of synthetic ThunkSection instances and the pass that
// they were created in. This is used to insert newly created ThunkSections
// into Sections at the end of a createThunks() pass.
SmallVector<std::pair<ThunkSection *, uint32_t>, 0> thunkSections;
// SectionPatterns can be filtered with the INPUT_SECTION_FLAGS command.
uint64_t withFlags;
uint64_t withoutFlags;
};
2016-10-13 23:08:33 +00:00
// Represents BYTE(), SHORT(), LONG(), or QUAD().
struct ByteCommand : SectionCommand {
ByteCommand(Expr e, unsigned size, std::string commandString)
: SectionCommand(ByteKind), commandString(commandString), expression(e),
size(size) {}
static bool classof(const SectionCommand *c) { return c->kind == ByteKind; }
// Keeps string representing the command. Used for -Map" is perhaps better.
std::string commandString;
Expr expression;
// This is just an offset of this assignment command in the output section.
unsigned offset;
// Size of this data command.
unsigned size;
};
struct InsertCommand {
SmallVector<StringRef, 0> names;
bool isAfter;
StringRef where;
};
// A NOCROSSREFS/NOCROSSREFS_TO command that prohibits references between
// certain output sections.
struct NoCrossRefCommand {
SmallVector<StringRef, 0> outputSections;
// When true, this describes a NOCROSSREFS_TO command that probits references
// to the first output section from any of the other sections.
bool toFirst = false;
};
struct PhdrsCommand {
StringRef name;
unsigned type = llvm::ELF::PT_NULL;
bool hasFilehdr = false;
bool hasPhdrs = false;
std::optional<unsigned> flags;
Expr lmaExpr = nullptr;
};
class LinkerScript final {
// Temporary state used in processSectionCommands() and assignAddresses()
// that must be reinitialized for each call to the above functions, and must
// not be used outside of the scope of a call to the above functions.
struct AddressState {
2024-09-21 10:22:11 -07:00
AddressState(const LinkerScript &);
OutputSection *outSec = nullptr;
MemoryRegion *memRegion = nullptr;
MemoryRegion *lmaRegion = nullptr;
2018-01-25 16:43:49 +00:00
uint64_t lmaOffset = 0;
uint64_t tbssAddr = 0;
uint64_t overlaySize;
};
2024-09-21 10:22:11 -07:00
Ctx &ctx;
SmallVector<std::unique_ptr<OutputDesc>, 0> descPool;
llvm::DenseMap<llvm::CachedHashStringRef, OutputDesc *> nameToOutputSection;
2024-08-21 20:27:44 -07:00
StringRef getOutputSectionName(const InputSectionBase *s) const;
void addSymbol(SymbolAssignment *cmd);
void declareSymbol(SymbolAssignment *cmd);
void assignSymbol(SymbolAssignment *cmd, bool inSec);
void setDot(Expr e, const Twine &loc, bool inSec);
void expandOutputSection(uint64_t size);
void expandMemoryRegions(uint64_t size);
SmallVector<InputSectionBase *, 0>
computeInputSections(const InputSectionDescription *,
ArrayRef<InputSectionBase *>, const SectionBase &outCmd);
SmallVector<InputSectionBase *, 0> createInputSectionList(OutputSection &cmd);
void discardSynthetic(OutputSection &);
SmallVector<size_t, 0> getPhdrIndices(OutputSection *sec);
std::pair<MemoryRegion *, MemoryRegion *>
findMemoryRegion(OutputSection *sec, MemoryRegion *hint);
bool assignOffsets(OutputSection *sec);
// This captures the local AddressState and makes it accessible
// deliberately. This is needed as there are some cases where we cannot just
// thread the current state through to a lambda function created by the
// script parser.
// This should remain a plain pointer as its lifetime is smaller than
// LinkerScript.
AddressState *state = nullptr;
std::unique_ptr<OutputSection> aether;
uint64_t dot = 0;
public:
// OutputSection may be incomplete. Avoid inline ctor/dtor.
LinkerScript(Ctx &ctx);
~LinkerScript();
OutputDesc *createOutputSection(StringRef name, StringRef location);
OutputDesc *getOrCreateOutputSection(StringRef name);
bool hasPhdrsCommands() { return !phdrsCommands.empty(); }
uint64_t getDot() { return dot; }
void discard(InputSectionBase &s);
ExprValue getSymbolValue(StringRef name, const Twine &loc);
void addOrphanSections();
void diagnoseOrphanHandling() const;
[LLD][ELF] Cortex-M Security Extensions (CMSE) Support This commit provides linker support for Cortex-M Security Extensions (CMSE). The specification for this feature can be found in ARM v8-M Security Extensions: Requirements on Development Tools. The linker synthesizes a security gateway veneer in a special section; `.gnu.sgstubs`, when it finds non-local symbols `__acle_se_<entry>` and `<entry>`, defined relative to the same text section and having the same address. The address of `<entry>` is retargeted to the starting address of the linker-synthesized security gateway veneer in section `.gnu.sgstubs`. In summary, the linker translates input: ``` .text entry: __acle_se_entry: [entry_code] ``` into: ``` .section .gnu.sgstubs entry: SG B.W __acle_se_entry .text __acle_se_entry: [entry_code] ``` If addresses of `__acle_se_<entry>` and `<entry>` are not equal, the linker considers that `<entry>` already defines a secure gateway veneer so does not synthesize one. If `--out-implib=<out.lib>` is specified, the linker writes the list of secure gateway veneers into a CMSE import library `<out.lib>`. The CMSE import library will have 3 sections: `.symtab`, `.strtab`, `.shstrtab`. For every secure gateway veneer <entry> at address `<addr>`, `.symtab` contains a `SHN_ABS` symbol `<entry>` with value `<addr>`. If `--in-implib=<in.lib>` is specified, the linker reads the existing CMSE import library `<in.lib>` and preserves the entry function addresses in the resulting executable and new import library. Reviewed By: MaskRay, peter.smith Differential Revision: https://reviews.llvm.org/D139092
2023-07-06 10:45:10 +01:00
void diagnoseMissingSGSectionAddress() const;
void adjustOutputSections();
void adjustSectionsAfterSorting();
SmallVector<std::unique_ptr<PhdrEntry>, 0> createPhdrs();
bool needsInterpSection();
bool shouldKeep(InputSectionBase *s);
std::pair<const OutputSection *, const Defined *> assignAddresses();
Reland: [LLD] Implement --enable-non-contiguous-regions (#90007) When enabled, input sections that would otherwise overflow a memory region are instead spilled to the next matching output section. This feature parallels the one in GNU LD, but there are some differences from its documented behavior: - /DISCARD/ only matches previously-unmatched sections (i.e., the flag does not affect it). - If a section fails to fit at any of its matches, the link fails instead of discarding the section. - The flag --enable-non-contiguous-regions-warnings is not implemented, as it exists to warn about such occurrences. The implementation places stubs at possible spill locations, and replaces them with the original input section when effecting spills. Spilling decisions occur after address assignment. Sections are spilled in reverse order of assignment, with each spill naively decreasing the size of the affected memory regions. This continues until the memory regions are brought back under size. Spilling anything causes another pass of address assignment, and this continues to fixed point. Spilling after rather than during assignment allows the algorithm to consider the size effects of unspillable input sections that appear later in the assignment. Otherwise, such sections (e.g. thunks) may force an overflow, even if spilling something earlier could have avoided it. A few notable feature interactions occur: - Stubs affect alignment, ONLY_IF_RO, etc, broadly as if a copy of the input section were actually placed there. - SHF_MERGE synthetic sections use the spill list of their first contained input section (the one that gives the section its name). - ICF occurs oblivious to spill sections; spill lists for merged-away sections become inert and are removed after assignment. - SHF_LINK_ORDER and .ARM.exidx are ordered according to the final section ordering, after all spilling has completed. - INSERT BEFORE/AFTER and OVERWRITE_SECTIONS are explicitly disallowed.
2024-05-13 12:30:50 -05:00
bool spillSections();
void erasePotentialSpillSections();
void allocateHeaders(SmallVector<std::unique_ptr<PhdrEntry>, 0> &phdrs);
void processSectionCommands();
void processSymbolAssignments();
void declareSymbols();
// Used to handle INSERT AFTER statements.
void processInsertCommands();
// Describe memory region usage.
void printMemoryUsage(raw_ostream &os);
// Record a pending error during an assignAddresses invocation.
// assignAddresses is executed more than once. Therefore, lld::error should be
// avoided to not report duplicate errors.
void recordError(const Twine &msg);
// Check backward location counter assignment and memory region/LMA overflows.
void checkFinalScriptConditions() const;
// Add symbols that are referenced in the linker script to the symbol table.
// Symbols referenced in a PROVIDE command are only added to the symbol table
// if the PROVIDE command actually provides the symbol.
// It also adds the symbols referenced by the used PROVIDE symbols to the
// linker script referenced symbols list.
void addScriptReferencedSymbolsToSymTable();
// Returns true if the PROVIDE symbol should be added to the link.
// A PROVIDE symbol is added to the link only if it satisfies an
// undefined reference.
bool shouldAddProvideSym(StringRef symName);
// SECTIONS command list.
SmallVector<SectionCommand *, 0> sectionCommands;
// PHDRS command list.
SmallVector<PhdrsCommand, 0> phdrsCommands;
bool hasSectionsCommand = false;
[ELF] Align the end of PT_GNU_RELRO associated PT_LOAD to a common-page-size boundary (#66042) Close #57618: currently we align the end of PT_GNU_RELRO to a common-page-size boundary, but do not align the end of the associated PT_LOAD. This is benign when runtime_page_size >= common-page-size. However, when runtime_page_size < common-page-size, it is possible that `alignUp(end(PT_LOAD), page_size) < alignDown(end(PT_GNU_RELRO), page_size)`. In this case, rtld's mprotect call for PT_GNU_RELRO will apply to unmapped regions and lead to an error, e.g. ``` error while loading shared libraries: cannot apply additional memory protection after relocation: Cannot allocate memory ``` To fix the issue, add a padding section .relro_padding like mold, which is contained in the PT_GNU_RELRO segment and the associated PT_LOAD segment. The section also prevents strip from corrupting PT_LOAD program headers. .relro_padding has the largest `sortRank` among RELRO sections. Therefore, it is naturally placed at the end of `PT_GNU_RELRO` segment in the absence of `PHDRS`/`SECTIONS` commands. In the presence of `SECTIONS` commands, we place .relro_padding immediately before a symbol assignment using DATA_SEGMENT_RELRO_END (see also https://reviews.llvm.org/D124656), if present. DATA_SEGMENT_RELRO_END is changed to align to max-page-size instead of common-page-size. Some edge cases worth mentioning: * ppc64-toc-addis-nop.s: when PHDRS is present, do not append .relro_padding * avoid-empty-program-headers.s: when the only RELRO section is .tbss, it is not part of PT_LOAD segment, therefore we do not append .relro_padding. --- Close #65002: GNU ld from 2.39 onwards aligns the end of PT_GNU_RELRO to a max-page-size boundary (https://sourceware.org/PR28824) so that the last page is protected even if runtime_page_size > common-page-size. In my opinion, losing protection for the last page when the runtime page size is larger than common-page-size is not really an issue. Double mapping a page of up to max-common-page for the protection could cause undesired VM waste. Internally we had users complaining about 2MiB max-page-size applying to shared objects. Therefore, the end of .relro_padding is padded to a common-page-size boundary. Users who are really anxious can set common-page-size to match their runtime page size. --- 17 tests need updating as there are lots of change detectors.
2023-09-14 10:33:11 -07:00
bool seenDataAlign = false;
bool seenRelroEnd = false;
bool errorOnMissingSection = false;
SmallVector<SmallString<0>, 0> recordedErrors;
// List of section patterns specified with KEEP commands. They will
// be kept even if they are unused and --gc-sections is specified.
SmallVector<InputSectionDescription *, 0> keptSections;
// A map from memory region name to a memory region descriptor.
llvm::MapVector<llvm::StringRef, MemoryRegion *> memoryRegions;
// A list of symbols referenced by the script.
SmallVector<llvm::StringRef, 0> referencedSymbols;
// Used to implement INSERT [AFTER|BEFORE]. Contains output sections that need
// to be reordered.
SmallVector<InsertCommand, 0> insertCommands;
[ELF] Add OVERWRITE_SECTIONS command This implements https://sourceware.org/bugzilla/show_bug.cgi?id=26404 An `OVERWRITE_SECTIONS` command is a `SECTIONS` variant which contains several output section descriptions. The output sections do not have specify an order. Similar to `INSERT [BEFORE|AFTER]`, `LinkerScript::hasSectionsCommand` is not set, so the built-in rules (see `docs/ELF/linker_script.rst`) still apply. `OVERWRITE_SECTIONS` can be more convenient than `INSERT` because it does not need an anchor section. The initial syntax is intentionally narrow to facilitate backward compatible extensions in the future. Symbol assignments cannot be used. This feature is versatile. To list a few usage: * Use `section : { KEEP(...) }` to retain input sections under GC * Define encapsulation symbols (start/end) for an output section * Use `section : ALIGN(...) : { ... }` to overalign an output section (similar to ld64 `-sectalign`) When an output section is specified by both `OVERWRITE_SECTIONS` and `INSERT`, `INSERT` is processed after overwrite sections. To make this work, this patch changes `InsertCommand` to use name based matching instead of pointer based matching. (This may cause a difference when `INSERT` moves one output section more than once. Such duplicate commands should not be used in practice (seems that in GNU ld the output sections may just disappear).) A linker script can be used without -T/--script. The traditional `SECTIONS` commands are concatenated, so a wrong rule can be more noticeable from the section order. This feature if misused can be less noticeable, just like `INSERT`. Differential Revision: https://reviews.llvm.org/D103303
2021-06-13 12:41:11 -07:00
// OutputSections specified by OVERWRITE_SECTIONS.
SmallVector<OutputDesc *, 0> overwriteSections;
[ELF] Add OVERWRITE_SECTIONS command This implements https://sourceware.org/bugzilla/show_bug.cgi?id=26404 An `OVERWRITE_SECTIONS` command is a `SECTIONS` variant which contains several output section descriptions. The output sections do not have specify an order. Similar to `INSERT [BEFORE|AFTER]`, `LinkerScript::hasSectionsCommand` is not set, so the built-in rules (see `docs/ELF/linker_script.rst`) still apply. `OVERWRITE_SECTIONS` can be more convenient than `INSERT` because it does not need an anchor section. The initial syntax is intentionally narrow to facilitate backward compatible extensions in the future. Symbol assignments cannot be used. This feature is versatile. To list a few usage: * Use `section : { KEEP(...) }` to retain input sections under GC * Define encapsulation symbols (start/end) for an output section * Use `section : ALIGN(...) : { ... }` to overalign an output section (similar to ld64 `-sectalign`) When an output section is specified by both `OVERWRITE_SECTIONS` and `INSERT`, `INSERT` is processed after overwrite sections. To make this work, this patch changes `InsertCommand` to use name based matching instead of pointer based matching. (This may cause a difference when `INSERT` moves one output section more than once. Such duplicate commands should not be used in practice (seems that in GNU ld the output sections may just disappear).) A linker script can be used without -T/--script. The traditional `SECTIONS` commands are concatenated, so a wrong rule can be more noticeable from the section order. This feature if misused can be less noticeable, just like `INSERT`. Differential Revision: https://reviews.llvm.org/D103303
2021-06-13 12:41:11 -07:00
// NOCROSSREFS(_TO) commands.
SmallVector<NoCrossRefCommand, 0> noCrossRefs;
// Sections that will be warned/errored by --orphan-handling.
SmallVector<const InputSectionBase *, 0> orphanSections;
// Stores the mapping: PROVIDE symbol -> symbols referred in the PROVIDE
// expression. For example, if the PROVIDE command is:
//
// PROVIDE(v = a + b + c);
//
// then provideMap should contain the mapping: 'v' -> ['a', 'b', 'c']
llvm::MapVector<StringRef, SmallVector<StringRef, 0>> provideMap;
// Store defined symbols that should ignore PROVIDE commands.
llvm::DenseSet<Symbol *> unusedProvideSyms;
Reland: [LLD] Implement --enable-non-contiguous-regions (#90007) When enabled, input sections that would otherwise overflow a memory region are instead spilled to the next matching output section. This feature parallels the one in GNU LD, but there are some differences from its documented behavior: - /DISCARD/ only matches previously-unmatched sections (i.e., the flag does not affect it). - If a section fails to fit at any of its matches, the link fails instead of discarding the section. - The flag --enable-non-contiguous-regions-warnings is not implemented, as it exists to warn about such occurrences. The implementation places stubs at possible spill locations, and replaces them with the original input section when effecting spills. Spilling decisions occur after address assignment. Sections are spilled in reverse order of assignment, with each spill naively decreasing the size of the affected memory regions. This continues until the memory regions are brought back under size. Spilling anything causes another pass of address assignment, and this continues to fixed point. Spilling after rather than during assignment allows the algorithm to consider the size effects of unspillable input sections that appear later in the assignment. Otherwise, such sections (e.g. thunks) may force an overflow, even if spilling something earlier could have avoided it. A few notable feature interactions occur: - Stubs affect alignment, ONLY_IF_RO, etc, broadly as if a copy of the input section were actually placed there. - SHF_MERGE synthetic sections use the spill list of their first contained input section (the one that gives the section its name). - ICF occurs oblivious to spill sections; spill lists for merged-away sections become inert and are removed after assignment. - SHF_LINK_ORDER and .ARM.exidx are ordered according to the final section ordering, after all spilling has completed. - INSERT BEFORE/AFTER and OVERWRITE_SECTIONS are explicitly disallowed.
2024-05-13 12:30:50 -05:00
// List of potential spill locations (PotentialSpillSection) for an input
// section.
struct PotentialSpillList {
// Never nullptr.
PotentialSpillSection *head;
PotentialSpillSection *tail;
};
llvm::DenseMap<InputSectionBase *, PotentialSpillList> potentialSpillLists;
// Named lists of input sections that can be collectively referenced in output
// section descriptions. Multiple references allow for sections to spill from
// one output section to another.
llvm::DenseMap<llvm::CachedHashStringRef, SectionClassDesc *> sectionClasses;
};
} // end namespace lld::elf
#endif // LLD_ELF_LINKER_SCRIPT_H