llvm-project/lld/ELF/Thunks.cpp


//===- Thunks.cpp --------------------------------------------------------===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===---------------------------------------------------------------------===//
//
// This file contains Thunk subclasses.
//
// A thunk is a small piece of code written after an input section
// which is used to jump between "incompatible" functions
// such as MIPS PIC and non-PIC or ARM non-Thumb and Thumb functions.
//
// If a jump target is too far away for its address to fit in a short jump
// instruction, we also need a thunk; the AArch64 and ARM long thunks below
// handle that case.
//
// i386 and x86-64 don't need thunks.
//
//===---------------------------------------------------------------------===//
#include "Thunks.h"
#include "Config.h"
#include "InputFiles.h"
#include "InputSection.h"
#include "OutputSections.h"
#include "Symbols.h"
#include "SyntheticSections.h"
#include "Target.h"
#include "lld/Common/CommonLinkerContext.h"
#include "llvm/BinaryFormat/ELF.h"
#include "llvm/Support/Casting.h"
#include "llvm/Support/ErrorHandling.h"
#include "llvm/Support/MathExtras.h"
#include <cstdint>
#include <cstring>
using namespace llvm;
using namespace llvm::object;
using namespace llvm::ELF;
using namespace lld;
using namespace lld::elf;
namespace {
// Base class for AArch64 thunks.
//
// An AArch64 thunk may be either short or long. A short thunk is simply a
// branch (B) instruction, and it may be used to call AArch64 functions when the
// distance from the thunk to the target is less than 128MB. Long thunks can
// branch to any virtual address and they are implemented in the derived
// classes. This class tries to create a short thunk if the target is in range,
// otherwise it creates a long thunk. When BTI is enabled indirect branches
// must land on a BTI instruction. If the destination does not have a BTI
// instruction mayNeedLandingPad is set to true and Thunk::landingPad points
// to an alternative entry point with a BTI.
class AArch64Thunk : public Thunk {
public:
AArch64Thunk(Ctx &ctx, Symbol &dest, int64_t addend, bool mayNeedLandingPad)
: Thunk(ctx, dest, addend), mayNeedLandingPad(mayNeedLandingPad) {}
bool getMayUseShortThunk();
void writeTo(uint8_t *buf) override;
bool needsSyntheticLandingPad() override;
protected:
bool mayNeedLandingPad;
private:
bool mayUseShortThunk = true;
virtual void writeLong(uint8_t *buf) = 0;
// A thunk may be written out as short or long, and we may not know which
// type at thunk creation time. In some thunk implementations the long thunk
// has additional mapping symbols. This function can be overridden to add
// these additional mapping symbols.
virtual void addLongMapSyms() {}
};
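// Illustrative sketch only, not part of lld: the range test behind the
// short/long choice described for AArch64Thunk above. It assumes a short
// thunk is a single B instruction, whose signed byte offset must be a
// multiple of 4 within [-128 MiB, 128 MiB). The namespace and function names
// are hypothetical.
namespace rangeSketch {
constexpr int64_t aarch64BranchRange = int64_t(1) << 27; // 128 MiB

// Returns true if a B placed at thunkVA can reach destVA directly.
inline bool fitsShortAArch64Thunk(uint64_t thunkVA, uint64_t destVA) {
  int64_t offset = static_cast<int64_t>(destVA - thunkVA);
  return offset % 4 == 0 && offset >= -aarch64BranchRange &&
         offset < aarch64BranchRange;
}
} // namespace rangeSketch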
// AArch64 long range Thunks.
class AArch64ABSLongThunk final : public AArch64Thunk {
public:
AArch64ABSLongThunk(Ctx &ctx, Symbol &dest, int64_t addend,
bool mayNeedLandingPad)
: AArch64Thunk(ctx, dest, addend, mayNeedLandingPad) {}
uint32_t size() override { return getMayUseShortThunk() ? 4 : 16; }
void addSymbols(ThunkSection &isec) override;
private:
void writeLong(uint8_t *buf) override;
void addLongMapSyms() override;
ThunkSection *tsec = nullptr;
};
class AArch64ADRPThunk final : public AArch64Thunk {
public:
AArch64ADRPThunk(Ctx &ctx, Symbol &dest, int64_t addend,
bool mayNeedLandingPad)
: AArch64Thunk(ctx, dest, addend, mayNeedLandingPad) {}
uint32_t size() override { return getMayUseShortThunk() ? 4 : 12; }
void addSymbols(ThunkSection &isec) override;
private:
void writeLong(uint8_t *buf) override;
};
// AArch64 BTI Landing Pad
// When BTI is enabled indirect branches must land on a BTI
// compatible instruction. When the destination does not have a
// BTI compatible instruction a Thunk doing an indirect branch
// targets a Landing Pad Thunk that direct branches to the target.
class AArch64BTILandingPadThunk final : public Thunk {
public:
AArch64BTILandingPadThunk(Ctx &ctx, Symbol &dest, int64_t addend)
: Thunk(ctx, dest, addend) {}
uint32_t size() override { return getMayUseShortThunk() ? 4 : 8; }
void addSymbols(ThunkSection &isec) override;
void writeTo(uint8_t *buf) override;
private:
bool getMayUseShortThunk();
void writeLong(uint8_t *buf);
bool mayUseShortThunk = true;
};
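// Illustrative sketch only, not part of lld: the two shapes of the landing pad
// described above, assuming the A64 encodings BTI c = 0xd503245f and
// B = 0x14000000 | imm26 (a word offset). The short form is the lone BTI c,
// used here when the destination immediately follows it so control falls
// through; the long form appends a direct B. All names are hypothetical.
namespace landingPadSketch {
constexpr uint32_t btiC = 0xd503245f;

// Encodes "B <pc + byteOffset>"; byteOffset is assumed to be a multiple of 4
// within [-128 MiB, 128 MiB).
inline uint32_t encodeB(int64_t byteOffset) {
  uint64_t words = static_cast<uint64_t>(byteOffset) >> 2;
  return 0x14000000 | (static_cast<uint32_t>(words) & 0x03ffffff);
}

// Fills out[0..1] and returns the landing pad size in bytes (4 or 8).
// padVA is the address of the BTI c; the B, if any, sits at padVA + 4.
inline uint32_t emitLandingPad(uint32_t out[2], uint64_t padVA,
                               uint64_t destVA) {
  out[0] = btiC;
  if (destVA == padVA + 4)
    return 4; // Destination follows immediately: fall through.
  out[1] = encodeB(static_cast<int64_t>(destVA - (padVA + 4)));
  return 8;
}
} // namespace landingPadSketch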
// Base class for ARM thunks.
//
// An ARM thunk may be either short or long. A short thunk is simply a branch
// (B) instruction, and it may be used to call ARM functions when the distance
// from the thunk to the target is less than 32MB. Long thunks can branch to any
// virtual address and can switch between ARM and Thumb, and they are
// implemented in the derived classes. This class tries to create a short thunk
// if the target is in range, otherwise it creates a long thunk.
class ARMThunk : public Thunk {
public:
ARMThunk(Ctx &ctx, Symbol &dest, int64_t addend) : Thunk(ctx, dest, addend) {}
bool getMayUseShortThunk();
uint32_t size() override { return getMayUseShortThunk() ? 4 : sizeLong(); }
void writeTo(uint8_t *buf) override;
bool isCompatibleWith(const InputSection &isec,
const Relocation &rel) const override;
// Returns the size of a long thunk.
virtual uint32_t sizeLong() = 0;
// Writes a long thunk to buf.
virtual void writeLong(uint8_t *buf) = 0;
private:
// This field tracks whether all previously considered layouts would allow
// this thunk to be short. If we have ever needed a long thunk, we always
// create a long thunk, even if the thunk may be short given the current
// distance to the target. We do this because transitioning from long to short
// can create layout oscillations in certain corner cases which would prevent
// the layout from converging.
bool mayUseShortThunk = true;
// See comment in AArch64Thunk.
virtual void addLongMapSyms() {}
};
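// Illustrative sketch only, not part of lld: the "once long, always long"
// rule described for mayUseShortThunk above. Each layout pass reports whether
// the target is currently in range; after any pass reports false the choice
// stays long, so the thunk size never shrinks and layout can converge. The
// struct and member names are hypothetical.
namespace stickySketch {
struct ShortThunkChoice {
  bool mayUseShort = true;

  // Returns whether the thunk may still be short after this pass.
  bool update(bool inRangeThisPass) {
    if (mayUseShort && !inRangeThisPass)
      mayUseShort = false; // Never reset to true.
    return mayUseShort;
  }
};
} // namespace stickySketch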
// Base class for Thumb-2 thunks.
//
// This class is similar to ARMThunk, but it uses the Thumb-2 B.W instruction
// which has a range of 16MB.
class ThumbThunk : public Thunk {
public:
ThumbThunk(Ctx &ctx, Symbol &dest, int64_t addend)
: Thunk(ctx, dest, addend) {
alignment = 2;
}
bool getMayUseShortThunk();
uint32_t size() override { return getMayUseShortThunk() ? 4 : sizeLong(); }
void writeTo(uint8_t *buf) override;
bool isCompatibleWith(const InputSection &isec,
const Relocation &rel) const override;
// Returns the size of a long thunk.
virtual uint32_t sizeLong() = 0;
// Writes a long thunk to buf.
virtual void writeLong(uint8_t *buf) = 0;
private:
// See comment in ARMThunk above.
bool mayUseShortThunk = true;
// See comment in AArch64Thunk.
virtual void addLongMapSyms() {}
};
// Specific ARM Thunk implementations. The naming convention is:
// Source State, Target State, Target Requirement, ABS or PI, Range
class ARMV7ABSLongThunk final : public ARMThunk {
public:
ARMV7ABSLongThunk(Ctx &ctx, Symbol &dest, int64_t addend)
: ARMThunk(ctx, dest, addend) {}
uint32_t sizeLong() override { return 12; }
void writeLong(uint8_t *buf) override;
void addSymbols(ThunkSection &isec) override;
};
class ARMV7PILongThunk final : public ARMThunk {
public:
ARMV7PILongThunk(Ctx &ctx, Symbol &dest, int64_t addend)
: ARMThunk(ctx, dest, addend) {}
uint32_t sizeLong() override { return 16; }
void writeLong(uint8_t *buf) override;
void addSymbols(ThunkSection &isec) override;
};
class ThumbV7ABSLongThunk final : public ThumbThunk {
public:
ThumbV7ABSLongThunk(Ctx &ctx, Symbol &dest, int64_t addend)
: ThumbThunk(ctx, dest, addend) {}
uint32_t sizeLong() override { return 10; }
void writeLong(uint8_t *buf) override;
void addSymbols(ThunkSection &isec) override;
};
class ThumbV7PILongThunk final : public ThumbThunk {
public:
ThumbV7PILongThunk(Ctx &ctx, Symbol &dest, int64_t addend)
: ThumbThunk(ctx, dest, addend) {}
uint32_t sizeLong() override { return 12; }
void writeLong(uint8_t *buf) override;
void addSymbols(ThunkSection &isec) override;
};
// Implementations of Thunks for Arm v6-M. Only Thumb instructions are permitted.
class ThumbV6MABSLongThunk final : public ThumbThunk {
public:
ThumbV6MABSLongThunk(Ctx &ctx, Symbol &dest, int64_t addend)
: ThumbThunk(ctx, dest, addend) {}
uint32_t sizeLong() override { return 12; }
void writeLong(uint8_t *buf) override;
void addSymbols(ThunkSection &isec) override;
private:
void addLongMapSyms() override;
ThunkSection *tsec = nullptr;
};
class ThumbV6MABSXOLongThunk final : public ThumbThunk {
public:
ThumbV6MABSXOLongThunk(Ctx &ctx, Symbol &dest, int64_t addend)
: ThumbThunk(ctx, dest, addend) {}
uint32_t sizeLong() override { return 20; }
void writeLong(uint8_t *buf) override;
void addSymbols(ThunkSection &isec) override;
};
class ThumbV6MPILongThunk final : public ThumbThunk {
public:
ThumbV6MPILongThunk(Ctx &ctx, Symbol &dest, int64_t addend)
: ThumbThunk(ctx, dest, addend) {}
uint32_t sizeLong() override { return 16; }
void writeLong(uint8_t *buf) override;
void addSymbols(ThunkSection &isec) override;
private:
void addLongMapSyms() override;
ThunkSection *tsec = nullptr;
};
// Architectures v4, v5 and v6 do not support the movt/movw instructions. v5 and
// v6 support BLX to which BL instructions can be rewritten inline. There are no
// Thumb entrypoints for v5 and v6 as there is no Thumb branch instruction on
// these architectures that can result in a thunk.
// LDR on v5 and v6 can switch processor state, so for v5 and v6,
// ARMV5LongLdrPcThunk can be used for both Arm->Arm and Arm->Thumb calls. v4
// can also use this thunk, but only for Arm->Arm calls.
class ARMV5LongLdrPcThunk final : public ARMThunk {
public:
ARMV5LongLdrPcThunk(Ctx &ctx, Symbol &dest, int64_t addend)
: ARMThunk(ctx, dest, addend) {}
uint32_t sizeLong() override { return 8; }
void writeLong(uint8_t *buf) override;
void addSymbols(ThunkSection &isec) override;
private:
void addLongMapSyms() override;
ThunkSection *tsec = nullptr;
};
// Implementations of Thunks for v4. BLX is not supported, and loads
// will not invoke Arm/Thumb state changes.
class ARMV4PILongBXThunk final : public ARMThunk {
public:
ARMV4PILongBXThunk(Ctx &ctx, Symbol &dest, int64_t addend)
: ARMThunk(ctx, dest, addend) {}
uint32_t sizeLong() override { return 16; }
void writeLong(uint8_t *buf) override;
void addSymbols(ThunkSection &isec) override;
private:
void addLongMapSyms() override;
ThunkSection *tsec = nullptr;
};
class ARMV4PILongThunk final : public ARMThunk {
public:
ARMV4PILongThunk(Ctx &ctx, Symbol &dest, int64_t addend)
: ARMThunk(ctx, dest, addend) {}
uint32_t sizeLong() override { return 12; }
void writeLong(uint8_t *buf) override;
void addSymbols(ThunkSection &isec) override;
private:
void addLongMapSyms() override;
ThunkSection *tsec = nullptr;
};
class ThumbV4PILongBXThunk final : public ThumbThunk {
public:
ThumbV4PILongBXThunk(Ctx &ctx, Symbol &dest, int64_t addend)
: ThumbThunk(ctx, dest, addend) {}
uint32_t sizeLong() override { return 16; }
void writeLong(uint8_t *buf) override;
void addSymbols(ThunkSection &isec) override;
private:
void addLongMapSyms() override;
ThunkSection *tsec = nullptr;
};
class ThumbV4PILongThunk final : public ThumbThunk {
public:
ThumbV4PILongThunk(Ctx &ctx, Symbol &dest, int64_t addend)
: ThumbThunk(ctx, dest, addend) {}
uint32_t sizeLong() override { return 20; }
void writeLong(uint8_t *buf) override;
void addSymbols(ThunkSection &isec) override;
private:
void addLongMapSyms() override;
ThunkSection *tsec = nullptr;
};
class ARMV4ABSLongBXThunk final : public ARMThunk {
public:
ARMV4ABSLongBXThunk(Ctx &ctx, Symbol &dest, int64_t addend)
: ARMThunk(ctx, dest, addend) {}
uint32_t sizeLong() override { return 12; }
void writeLong(uint8_t *buf) override;
void addSymbols(ThunkSection &isec) override;
private:
void addLongMapSyms() override;
ThunkSection *tsec = nullptr;
};
class ThumbV4ABSLongBXThunk final : public ThumbThunk {
public:
ThumbV4ABSLongBXThunk(Ctx &ctx, Symbol &dest, int64_t addend)
: ThumbThunk(ctx, dest, addend) {}
uint32_t sizeLong() override { return 12; }
void writeLong(uint8_t *buf) override;
void addSymbols(ThunkSection &isec) override;
private:
void addLongMapSyms() override;
ThunkSection *tsec = nullptr;
};
class ThumbV4ABSLongThunk final : public ThumbThunk {
public:
ThumbV4ABSLongThunk(Ctx &ctx, Symbol &dest, int64_t addend)
: ThumbThunk(ctx, dest, addend) {}
uint32_t sizeLong() override { return 16; }
void writeLong(uint8_t *buf) override;
void addSymbols(ThunkSection &isec) override;
private:
void addLongMapSyms() override;
ThunkSection *tsec = nullptr;
};
// The AVR devices need thunks for R_AVR_LO8_LDI_GS/R_AVR_HI8_LDI_GS
// when their destination is out of range [0, 0x1ffff].
class AVRThunk : public Thunk {
public:
AVRThunk(Ctx &ctx, Symbol &dest, int64_t addend) : Thunk(ctx, dest, addend) {}
uint32_t size() override { return 4; }
void writeTo(uint8_t *buf) override;
void addSymbols(ThunkSection &isec) override;
};
// MIPS LA25 thunk
class MipsThunk final : public Thunk {
public:
MipsThunk(Ctx &ctx, Symbol &dest) : Thunk(ctx, dest, 0) {}
uint32_t size() override { return 16; }
void writeTo(uint8_t *buf) override;
void addSymbols(ThunkSection &isec) override;
InputSection *getTargetInputSection() const override;
};
// microMIPS R2-R5 LA25 thunk
class MicroMipsThunk final : public Thunk {
public:
MicroMipsThunk(Ctx &ctx, Symbol &dest) : Thunk(ctx, dest, 0) {}
uint32_t size() override { return 14; }
void writeTo(uint8_t *buf) override;
void addSymbols(ThunkSection &isec) override;
InputSection *getTargetInputSection() const override;
};
// microMIPS R6 LA25 thunk
class MicroMipsR6Thunk final : public Thunk {
public:
MicroMipsR6Thunk(Ctx &ctx, Symbol &dest) : Thunk(ctx, dest, 0) {}
uint32_t size() override { return 12; }
void writeTo(uint8_t *buf) override;
void addSymbols(ThunkSection &isec) override;
InputSection *getTargetInputSection() const override;
};
class PPC32PltCallStub final : public Thunk {
public:
// For R_PPC_PLTREL24, Thunk::addend records the addend which will be used to
// decide the offsets in the call stub.
PPC32PltCallStub(Ctx &ctx, const InputSection &isec, const Relocation &rel,
Symbol &dest)
: Thunk(ctx, dest, rel.addend), file(isec.file) {}
uint32_t size() override { return 16; }
void writeTo(uint8_t *buf) override;
void addSymbols(ThunkSection &isec) override;
bool isCompatibleWith(const InputSection &isec, const Relocation &rel) const override;
private:
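// Records the file containing the call site of the call stub. A stub is not
// shared between object files because their .got2 sections differ.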
const InputFile *file;
};
class PPC32LongThunk final : public Thunk {
public:
PPC32LongThunk(Ctx &ctx, Symbol &dest, int64_t addend)
: Thunk(ctx, dest, addend) {}
uint32_t size() override { return ctx.arg.isPic ? 32 : 16; }
void writeTo(uint8_t *buf) override;
void addSymbols(ThunkSection &isec) override;
};
// PPC64 Plt call stubs.
// Any call site that needs to call through a plt entry needs a call stub in
// the .text section. The call stub is responsible for:
// 1) Saving the toc-pointer to the stack.
// 2) Loading the target function's address from the procedure linkage table
// into r12 for use by the target function's global entry point, and into the
// count register.
// 3) Transferring control to the target function through an indirect branch.
class PPC64PltCallStub final : public Thunk {
public:
PPC64PltCallStub(Ctx &ctx, Symbol &dest) : Thunk(ctx, dest, 0) {}
uint32_t size() override { return 20; }
void writeTo(uint8_t *buf) override;
void addSymbols(ThunkSection &isec) override;
bool isCompatibleWith(const InputSection &isec,
const Relocation &rel) const override;
};
// PPC64 R2 Save Stub
// When the caller requires a valid R2 TOC pointer but the callee does not
// require a TOC pointer and the callee cannot guarantee that it doesn't
// clobber R2 then we need to save R2. This stub:
// 1) Saves the TOC pointer to the stack.
// 2) Tail calls the callee.
class PPC64R2SaveStub final : public Thunk {
public:
PPC64R2SaveStub(Ctx &ctx, Symbol &dest, int64_t addend)
: Thunk(ctx, dest, addend) {
alignment = 16;
}
// To prevent oscillations in layout when moving from short to long thunks
// we make sure that once a thunk has been set to long it cannot go back.
bool getMayUseShortThunk() {
if (!mayUseShortThunk)
return false;
if (!isInt<26>(computeOffset())) {
mayUseShortThunk = false;
return false;
}
return true;
}
uint32_t size() override { return getMayUseShortThunk() ? 8 : 32; }
void writeTo(uint8_t *buf) override;
void addSymbols(ThunkSection &isec) override;
bool isCompatibleWith(const InputSection &isec,
const Relocation &rel) const override;
private:
// Transitioning from long to short can create layout oscillations in
// certain corner cases which would prevent the layout from converging.
// This is similar to the handling for ARMThunk.
bool mayUseShortThunk = true;
int64_t computeOffset() const {
return destination.getVA(ctx) - (getThunkTargetSym()->getVA(ctx) + 4);
}
};
// PPC64 R12 Setup Stub
// When a caller that does not maintain TOC calls a target which may possibly
// use TOC (either non-preemptible with localentry>1 or preemptible), we need to
// set r12 to satisfy the requirement of the global entry point.
class PPC64R12SetupStub final : public Thunk {
public:
PPC64R12SetupStub(Ctx &ctx, Symbol &dest, bool gotPlt)
: Thunk(ctx, dest, 0), gotPlt(gotPlt) {
alignment = 16;
}
uint32_t size() override { return 32; }
void writeTo(uint8_t *buf) override;
void addSymbols(ThunkSection &isec) override;
bool isCompatibleWith(const InputSection &isec,
const Relocation &rel) const override;
private:
bool gotPlt;
};
// A bl instruction uses a signed 24-bit offset, with an implicit 4-byte
// alignment. This gives a possible 26 bits of 'reach'. If the call offset is
// larger than that we need to emit a long-branch thunk. The target address
// of the callee is stored in a table to be accessed TOC-relative. Since the
// call must be local (a non-local call will have a PltCallStub instead) the
// table stores the address of the callee's local entry point. For
// position-independent code a corresponding relative dynamic relocation is
// used.
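//
// Worked example (illustrative, not from the upstream source): the 24-bit field
// is scaled by 4, so the reachable byte offsets are the multiples of 4 in
// [-0x2000000, 0x1fffffc], roughly +/-32MiB.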
class PPC64LongBranchThunk : public Thunk {
public:
uint32_t size() override { return 32; }
void writeTo(uint8_t *buf) override;
void addSymbols(ThunkSection &isec) override;
bool isCompatibleWith(const InputSection &isec,
const Relocation &rel) const override;
protected:
PPC64LongBranchThunk(Ctx &ctx, Symbol &dest, int64_t addend)
: Thunk(ctx, dest, addend) {}
};
class PPC64PILongBranchThunk final : public PPC64LongBranchThunk {
public:
PPC64PILongBranchThunk(Ctx &ctx, Symbol &dest, int64_t addend)
: PPC64LongBranchThunk(ctx, dest, addend) {
assert(!dest.isPreemptible);
if (std::optional<uint32_t> index =
ctx.in.ppc64LongBranchTarget->addEntry(&dest, addend)) {
ctx.mainPart->relaDyn->addRelativeReloc(
ctx.target->relativeRel, *ctx.in.ppc64LongBranchTarget,
*index * UINT64_C(8), dest,
addend + getPPC64GlobalEntryToLocalEntryOffset(ctx, dest.stOther),
ctx.target->symbolicRel, R_ABS);
}
}
};
class PPC64PDLongBranchThunk final : public PPC64LongBranchThunk {
public:
PPC64PDLongBranchThunk(Ctx &ctx, Symbol &dest, int64_t addend)
: PPC64LongBranchThunk(ctx, dest, addend) {
ctx.in.ppc64LongBranchTarget->addEntry(&dest, addend);
}
};
} // end anonymous namespace
Defined *Thunk::addSymbol(StringRef name, uint8_t type, uint64_t value,
InputSectionBase &section) {
Defined *d = addSyntheticLocal(ctx, name, type, value, /*size=*/0, section);
syms.push_back(d);
return d;
}
void Thunk::setOffset(uint64_t newOffset) {
for (Defined *d : syms)
d->value = d->value - offset + newOffset;
offset = newOffset;
}
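// Worked example (illustrative values, not taken from the upstream source):
// symbols are first added relative to the thunk start while offset == 0, so a
// "$d" symbol added with value 8 becomes 0x48 after setOffset(0x40). A later
// setOffset(0x60) yields 0x48 - 0x40 + 0x60 = 0x68, i.e. still 8 bytes into
// the thunk.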
// AArch64 Thunk base class.
static uint64_t getAArch64ThunkDestVA(Ctx &ctx, const Symbol &s, int64_t a) {
uint64_t v = s.isInPlt(ctx) ? s.getPltVA(ctx) : s.getVA(ctx, a);
return v;
}
bool AArch64Thunk::getMayUseShortThunk() {
if (!mayUseShortThunk)
return false;
uint64_t s = getAArch64ThunkDestVA(ctx, destination, addend);
uint64_t p = getThunkTargetSym()->getVA(ctx);
mayUseShortThunk = llvm::isInt<28>(s - p);
if (!mayUseShortThunk)
addLongMapSyms();
return mayUseShortThunk;
}
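// Worked example (illustrative addresses): B encodes a signed 26-bit word
// offset, so the byte offset must fit in 28 signed bits, i.e. lie in
// [-0x8000000, 0x7ffffff] (+/-128MiB). With s = 0x1000 and p = 0x2000 the
// unsigned difference s - p wraps to 0xfffffffffffff000, which isInt<28>
// interprets as -4096, so a backward branch of 4KiB is in range.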
void AArch64Thunk::writeTo(uint8_t *buf) {
if (!getMayUseShortThunk()) {
writeLong(buf);
return;
}
uint64_t s = getAArch64ThunkDestVA(ctx, destination, addend);
uint64_t p = getThunkTargetSym()->getVA(ctx);
write32(ctx, buf, 0x14000000); // b S
ctx.target->relocateNoSym(buf, R_AARCH64_CALL26, s - p);
}
bool AArch64Thunk::needsSyntheticLandingPad() {
// Short thunks use a direct branch, so no synthetic landing pad is
// required.
return mayNeedLandingPad && !getMayUseShortThunk();
}
// AArch64 long range Thunks.
void AArch64ABSLongThunk::writeLong(uint8_t *buf) {
const uint8_t data[] = {
0x50, 0x00, 0x00, 0x58, // ldr x16, L0
0x00, 0x02, 0x1f, 0xd6, // br x16
0x00, 0x00, 0x00, 0x00, // L0: .xword S
0x00, 0x00, 0x00, 0x00,
};
// If mayNeedLandingPad is true then destination is an
// AArch64BTILandingPadThunk that defines landingPad.
assert(!mayNeedLandingPad || landingPad != nullptr);
uint64_t s = mayNeedLandingPad
? landingPad->getVA(ctx, 0)
: getAArch64ThunkDestVA(ctx, destination, addend);
memcpy(buf, data, sizeof(data));
ctx.target->relocateNoSym(buf + 8, R_AARCH64_ABS64, s);
}
void AArch64ABSLongThunk::addSymbols(ThunkSection &isec) {
addSymbol(ctx.saver.save("__AArch64AbsLongThunk_" + destination.getName()),
STT_FUNC, 0, isec);
addSymbol("$x", STT_NOTYPE, 0, isec);
tsec = &isec;
(void)getMayUseShortThunk();
}
void AArch64ABSLongThunk::addLongMapSyms() {
addSymbol("$d", STT_NOTYPE, 8, *tsec);
}
// This Thunk has a maximum range of +/-4GiB, which is sufficient for all
// programs using the small code model, including pc-relative ones. At the time
// of writing, clang and gcc do not support the large code model for
// position-independent code, so it is safe to use this for position-independent
// thunks without worrying about the destination being more than 4GiB away.
void AArch64ADRPThunk::writeLong(uint8_t *buf) {
const uint8_t data[] = {
0x10, 0x00, 0x00, 0x90, // adrp x16, Dest R_AARCH64_ADR_PREL_PG_HI21(Dest)
0x10, 0x02, 0x00, 0x91, // add x16, x16, R_AARCH64_ADD_ABS_LO12_NC(Dest)
0x00, 0x02, 0x1f, 0xd6, // br x16
};
// If mayNeedLandingPad is true then destination is an
// AArch64BTILandingPadThunk that defines landingPad.
assert(!mayNeedLandingPad || landingPad != nullptr);
uint64_t s = mayNeedLandingPad
? landingPad->getVA(ctx, 0)
: getAArch64ThunkDestVA(ctx, destination, addend);
uint64_t p = getThunkTargetSym()->getVA(ctx);
memcpy(buf, data, sizeof(data));
ctx.target->relocateNoSym(buf, R_AARCH64_ADR_PREL_PG_HI21,
getAArch64Page(s) - getAArch64Page(p));
ctx.target->relocateNoSym(buf + 4, R_AARCH64_ADD_ABS_LO12_NC, s);
}
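// Worked example (illustrative addresses; assumes getAArch64Page() clears the
// low 12 bits of its argument): with s = 0x12345678 and p = 0x400abc, the page
// delta is 0x12345000 - 0x400000 = 0x11f45000. ADRP adds that delta to the
// thunk's own page, giving x16 = 0x12345000, and the ADD supplies the low 12
// bits (0x678), so x16 = 0x12345678 before the BR.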
void AArch64ADRPThunk::addSymbols(ThunkSection &isec) {
addSymbol(ctx.saver.save("__AArch64ADRPThunk_" + destination.getName()),
STT_FUNC, 0, isec);
addSymbol("$x", STT_NOTYPE, 0, isec);
}
void AArch64BTILandingPadThunk::addSymbols(ThunkSection &isec) {
addSymbol(ctx.saver.save("__AArch64BTIThunk_" + destination.getName()),
STT_FUNC, 0, isec);
addSymbol("$x", STT_NOTYPE, 0, isec);
}
void AArch64BTILandingPadThunk::writeTo(uint8_t *buf) {
if (!getMayUseShortThunk()) {
writeLong(buf);
return;
}
2024-10-13 10:37:47 -07:00
write32(ctx, buf, 0xd503245f); // BTI c
// Control falls through to target in following section.
}
bool AArch64BTILandingPadThunk::getMayUseShortThunk() {
if (!mayUseShortThunk)
return false;
// If the target is the following instruction then we can fall
// through without the branch.
uint64_t s = destination.getVA(ctx, addend);
uint64_t p = getThunkTargetSym()->getVA(ctx);
// This function is called before addresses are stable. We need to
// work out the range from the thunk to the next section but the
// address of the start of the next section depends on the size of
// the thunks in the previous pass. s - p + offset == 0 represents
// the first pass where the Thunk and following section are assigned
// the same offset. s - p <= 4 is the last Thunk in the Thunk
// Section.
mayUseShortThunk = (s - p + offset == 0 || s - p <= 4);
return mayUseShortThunk;
}
void AArch64BTILandingPadThunk::writeLong(uint8_t *buf) {
uint64_t s = destination.getVA(ctx, addend);
uint64_t p = getThunkTargetSym()->getVA(ctx) + 4;
write32(ctx, buf, 0xd503245f); // BTI c
write32(ctx, buf + 4, 0x14000000); // B S
ctx.target->relocateNoSym(buf + 4, R_AARCH64_CALL26, s - p);
}
// ARM Target Thunks
static uint64_t getARMThunkDestVA(Ctx &ctx, const Symbol &s) {
uint64_t v = s.isInPlt(ctx) ? s.getPltVA(ctx) : s.getVA(ctx);
return SignExtend64<32>(v);
}
// This function returns true if the target is not Thumb, the branch offset
// fits in a signed 26-bit value, and it has not previously returned false
// (see comment for mayUseShortThunk).
bool ARMThunk::getMayUseShortThunk() {
if (!mayUseShortThunk)
return false;
uint64_t s = getARMThunkDestVA(ctx, destination);
if (s & 1) {
mayUseShortThunk = false;
addLongMapSyms();
return false;
}
uint64_t p = getThunkTargetSym()->getVA(ctx);
int64_t offset = s - p - 8;
mayUseShortThunk = llvm::isInt<26>(offset);
if (!mayUseShortThunk)
addLongMapSyms();
return mayUseShortThunk;
}
void ARMThunk::writeTo(uint8_t *buf) {
if (!getMayUseShortThunk()) {
writeLong(buf);
return;
}
uint64_t s = getARMThunkDestVA(ctx, destination);
uint64_t p = getThunkTargetSym()->getVA(ctx);
int64_t offset = s - p - 8;
write32(ctx, buf, 0xea000000); // b S
ctx.target->relocateNoSym(buf, R_ARM_JUMP24, offset);
}
bool ARMThunk::isCompatibleWith(const InputSection &isec,
const Relocation &rel) const {
// v4T does not have BLX, so also deny R_ARM_THM_CALL
if (!ctx.arg.armHasBlx && rel.type == R_ARM_THM_CALL)
return false;
// Thumb branch relocations can't use BLX
return rel.type != R_ARM_THM_JUMP19 && rel.type != R_ARM_THM_JUMP24;
}
// This function returns true if:
// the target is Thumb
// && is within branch range
// && this function has not previously returned false
// (see comment for mayUseShortThunk)
// && the arch supports Thumb branch range extension.
bool ThumbThunk::getMayUseShortThunk() {
if (!mayUseShortThunk)
return false;
uint64_t s = getARMThunkDestVA(ctx, destination);
if ((s & 1) == 0 || !ctx.arg.armJ1J2BranchEncoding) {
mayUseShortThunk = false;
addLongMapSyms();
return false;
}
uint64_t p = getThunkTargetSym()->getVA(ctx) & ~1;
int64_t offset = s - p - 4;
mayUseShortThunk = llvm::isInt<25>(offset);
if (!mayUseShortThunk)
addLongMapSyms();
return mayUseShortThunk;
}
void ThumbThunk::writeTo(uint8_t *buf) {
if (!getMayUseShortThunk()) {
writeLong(buf);
return;
}
uint64_t s = getARMThunkDestVA(ctx, destination);
uint64_t p = getThunkTargetSym()->getVA(ctx);
int64_t offset = s - p - 4;
write16(ctx, buf + 0, 0xf000); // b.w S
write16(ctx, buf + 2, 0xb000);
ctx.target->relocateNoSym(buf, R_ARM_THM_JUMP24, offset);
}
bool ThumbThunk::isCompatibleWith(const InputSection &isec,
const Relocation &rel) const {
// v4T does not have BLX, so also deny R_ARM_CALL
if (!ctx.arg.armHasBlx && rel.type == R_ARM_CALL)
return false;
// ARM branch relocations can't use BLX
return rel.type != R_ARM_JUMP24 && rel.type != R_ARM_PC24 && rel.type != R_ARM_PLT32;
}
void ARMV7ABSLongThunk::writeLong(uint8_t *buf) {
write32(ctx, buf + 0, 0xe300c000); // movw ip,:lower16:S
write32(ctx, buf + 4, 0xe340c000); // movt ip,:upper16:S
write32(ctx, buf + 8, 0xe12fff1c); // bx ip
uint64_t s = getARMThunkDestVA(ctx, destination);
ctx.target->relocateNoSym(buf, R_ARM_MOVW_ABS_NC, s);
ctx.target->relocateNoSym(buf + 4, R_ARM_MOVT_ABS, s);
}
void ARMV7ABSLongThunk::addSymbols(ThunkSection &isec) {
addSymbol(ctx.saver.save("__ARMv7ABSLongThunk_" + destination.getName()),
STT_FUNC, 0, isec);
addSymbol("$a", STT_NOTYPE, 0, isec);
}
void ThumbV7ABSLongThunk::writeLong(uint8_t *buf) {
write16(ctx, buf + 0, 0xf240); // movw ip, :lower16:S
write16(ctx, buf + 2, 0x0c00);
write16(ctx, buf + 4, 0xf2c0); // movt ip, :upper16:S
write16(ctx, buf + 6, 0x0c00);
write16(ctx, buf + 8, 0x4760); // bx ip
uint64_t s = getARMThunkDestVA(ctx, destination);
ctx.target->relocateNoSym(buf, R_ARM_THM_MOVW_ABS_NC, s);
ctx.target->relocateNoSym(buf + 4, R_ARM_THM_MOVT_ABS, s);
}
void ThumbV7ABSLongThunk::addSymbols(ThunkSection &isec) {
addSymbol(ctx.saver.save("__Thumbv7ABSLongThunk_" + destination.getName()),
STT_FUNC, 1, isec);
addSymbol("$t", STT_NOTYPE, 0, isec);
}
void ARMV7PILongThunk::writeLong(uint8_t *buf) {
write32(ctx, buf + 0,
0xe30fcff0); // P: movw ip,:lower16:S - (P + (L1-P) + 8)
write32(ctx, buf + 4,
0xe340c000); // movt ip,:upper16:S - (P + (L1-P) + 8)
write32(ctx, buf + 8, 0xe08cc00f); // L1: add ip, ip, pc
write32(ctx, buf + 12, 0xe12fff1c); // bx ip
uint64_t s = getARMThunkDestVA(ctx, destination);
uint64_t p = getThunkTargetSym()->getVA(ctx);
int64_t offset = s - p - 16;
ctx.target->relocateNoSym(buf, R_ARM_MOVW_PREL_NC, offset);
ctx.target->relocateNoSym(buf + 4, R_ARM_MOVT_PREL, offset);
}
void ARMV7PILongThunk::addSymbols(ThunkSection &isec) {
addSymbol(ctx.saver.save("__ARMV7PILongThunk_" + destination.getName()),
STT_FUNC, 0, isec);
addSymbol("$a", STT_NOTYPE, 0, isec);
}
void ThumbV7PILongThunk::writeLong(uint8_t *buf) {
write16(ctx, buf + 0, 0xf64f); // P: movw ip,:lower16:S - (P + (L1-P) + 4)
write16(ctx, buf + 2, 0x7cf4);
write16(ctx, buf + 4, 0xf2c0); // movt ip,:upper16:S - (P + (L1-P) + 4)
write16(ctx, buf + 6, 0x0c00);
write16(ctx, buf + 8, 0x44fc); // L1: add ip, pc
write16(ctx, buf + 10, 0x4760); // bx ip
uint64_t s = getARMThunkDestVA(ctx, destination);
uint64_t p = getThunkTargetSym()->getVA(ctx) & ~0x1;
int64_t offset = s - p - 12;
ctx.target->relocateNoSym(buf, R_ARM_THM_MOVW_PREL_NC, offset);
ctx.target->relocateNoSym(buf + 4, R_ARM_THM_MOVT_PREL, offset);
}
void ThumbV7PILongThunk::addSymbols(ThunkSection &isec) {
addSymbol(ctx.saver.save("__ThumbV7PILongThunk_" + destination.getName()),
STT_FUNC, 1, isec);
addSymbol("$t", STT_NOTYPE, 0, isec);
}
void ThumbV6MABSLongThunk::writeLong(uint8_t *buf) {
// Most Thumb instructions cannot access the high registers r8 - r15. As the
// only register we can corrupt is r12 we must instead spill a low register
// to the stack to use as a scratch register. We push r1 even though we
// don't need it, to get some space to use for the return address.
write16(ctx, buf + 0, 0xb403); // push {r0, r1} ; Obtain scratch registers
write16(ctx, buf + 2, 0x4801); // ldr r0, [pc, #4] ; L1
write16(ctx, buf + 4, 0x9001); // str r0, [sp, #4] ; SP + 4 = S
write16(ctx, buf + 6, 0xbd01); // pop {r0, pc} ; restore r0 and branch to dest
write32(ctx, buf + 8, 0x00000000); // L1: .word S
uint64_t s = getARMThunkDestVA(ctx, destination);
ctx.target->relocateNoSym(buf + 8, R_ARM_ABS32, s);
}
void ThumbV6MABSLongThunk::addSymbols(ThunkSection &isec) {
addSymbol(ctx.saver.save("__Thumbv6MABSLongThunk_" + destination.getName()),
STT_FUNC, 1, isec);
addSymbol("$t", STT_NOTYPE, 0, isec);
tsec = &isec;
(void)getMayUseShortThunk();
}
void ThumbV6MABSLongThunk::addLongMapSyms() {
addSymbol("$d", STT_NOTYPE, 8, *tsec);
}
void ThumbV6MABSXOLongThunk::writeLong(uint8_t *buf) {
// Most Thumb instructions cannot access the high registers r8 - r15. As the
// only register we can corrupt is r12 we must instead spill a low register
// to the stack to use as a scratch register. We push r1 even though we
// don't need it, to get some space to use for the return address.
write16(ctx, buf + 0, 0xb403); // push {r0, r1} ; Obtain scratch registers
write16(ctx, buf + 2, 0x2000); // movs r0, :upper8_15:S
write16(ctx, buf + 4, 0x0200); // lsls r0, r0, #8
write16(ctx, buf + 6, 0x3000); // adds r0, :upper0_7:S
write16(ctx, buf + 8, 0x0200); // lsls r0, r0, #8
write16(ctx, buf + 10, 0x3000); // adds r0, :lower8_15:S
write16(ctx, buf + 12, 0x0200); // lsls r0, r0, #8
write16(ctx, buf + 14, 0x3000); // adds r0, :lower0_7:S
write16(ctx, buf + 16, 0x9001); // str r0, [sp, #4] ; SP + 4 = S
write16(ctx, buf + 18,
0xbd01); // pop {r0, pc} ; restore r0 and branch to dest
uint64_t s = getARMThunkDestVA(ctx, destination);
ctx.target->relocateNoSym(buf + 2, R_ARM_THM_ALU_ABS_G3, s);
ctx.target->relocateNoSym(buf + 6, R_ARM_THM_ALU_ABS_G2_NC, s);
ctx.target->relocateNoSym(buf + 10, R_ARM_THM_ALU_ABS_G1_NC, s);
ctx.target->relocateNoSym(buf + 14, R_ARM_THM_ALU_ABS_G0_NC, s);
}
void ThumbV6MABSXOLongThunk::addSymbols(ThunkSection &isec) {
addSymbol(ctx.saver.save("__Thumbv6MABSXOLongThunk_" + destination.getName()),
STT_FUNC, 1, isec);
addSymbol("$t", STT_NOTYPE, 0, isec);
}
void ThumbV6MPILongThunk::writeLong(uint8_t *buf) {
// Most Thumb instructions cannot access the high registers r8 - r15. As the
// only register we can corrupt is ip (r12) we must instead spill a low
// register to the stack to use as a scratch register.
write16(ctx, buf + 0,
0xb401); // P: push {r0} ; Obtain scratch register
write16(ctx, buf + 2, 0x4802); // ldr r0, [pc, #8] ; L2
write16(ctx, buf + 4, 0x4684); // mov ip, r0 ; low to high register
write16(ctx, buf + 6,
0xbc01); // pop {r0} ; restore scratch register
write16(ctx, buf + 8, 0x44e7); // L1: add pc, ip ; transfer control
write16(ctx, buf + 10,
0x46c0); // nop ; pad to 4-byte boundary
write32(ctx, buf + 12, 0x00000000); // L2: .word S - (P + (L1 - P) + 4)
uint64_t s = getARMThunkDestVA(ctx, destination);
uint64_t p = getThunkTargetSym()->getVA(ctx) & ~0x1;
ctx.target->relocateNoSym(buf + 12, R_ARM_REL32, s - p - 12);
}
void ThumbV6MPILongThunk::addSymbols(ThunkSection &isec) {
addSymbol(ctx.saver.save("__Thumbv6MPILongThunk_" + destination.getName()),
STT_FUNC, 1, isec);
addSymbol("$t", STT_NOTYPE, 0, isec);
tsec = &isec;
(void)getMayUseShortThunk();
}
void ThumbV6MPILongThunk::addLongMapSyms() {
addSymbol("$d", STT_NOTYPE, 12, *tsec);
}
void ARMV5LongLdrPcThunk::writeLong(uint8_t *buf) {
write32(ctx, buf + 0, 0xe51ff004); // ldr pc, [pc,#-4] ; L1
write32(ctx, buf + 4, 0x00000000); // L1: .word S
ctx.target->relocateNoSym(buf + 4, R_ARM_ABS32,
getARMThunkDestVA(ctx, destination));
}
void ARMV5LongLdrPcThunk::addSymbols(ThunkSection &isec) {
addSymbol(ctx.saver.save("__ARMv5LongLdrPcThunk_" + destination.getName()),
STT_FUNC, 0, isec);
addSymbol("$a", STT_NOTYPE, 0, isec);
tsec = &isec;
(void)getMayUseShortThunk();
}
void ARMV5LongLdrPcThunk::addLongMapSyms() {
addSymbol("$d", STT_NOTYPE, 4, *tsec);
}
void ARMV4ABSLongBXThunk::writeLong(uint8_t *buf) {
write32(ctx, buf + 0, 0xe59fc000); // ldr r12, [pc] ; L1
write32(ctx, buf + 4, 0xe12fff1c); // bx r12
write32(ctx, buf + 8, 0x00000000); // L1: .word S
ctx.target->relocateNoSym(buf + 8, R_ARM_ABS32,
getARMThunkDestVA(ctx, destination));
}
void ARMV4ABSLongBXThunk::addSymbols(ThunkSection &isec) {
addSymbol(ctx.saver.save("__ARMv4ABSLongBXThunk_" + destination.getName()),
STT_FUNC, 0, isec);
addSymbol("$a", STT_NOTYPE, 0, isec);
tsec = &isec;
(void)getMayUseShortThunk();
}
void ARMV4ABSLongBXThunk::addLongMapSyms() {
addSymbol("$d", STT_NOTYPE, 8, *tsec);
}
void ThumbV4ABSLongBXThunk::writeLong(uint8_t *buf) {
write16(ctx, buf + 0, 0x4778); // bx pc
write16(ctx, buf + 2,
0xe7fd); // b #-6 ; Arm recommended sequence to follow bx pc
write32(ctx, buf + 4, 0xe51ff004); // ldr pc, [pc, #-4] ; L1
write32(ctx, buf + 8, 0x00000000); // L1: .word S
ctx.target->relocateNoSym(buf + 8, R_ARM_ABS32,
getARMThunkDestVA(ctx, destination));
}
void ThumbV4ABSLongBXThunk::addSymbols(ThunkSection &isec) {
addSymbol(ctx.saver.save("__Thumbv4ABSLongBXThunk_" + destination.getName()),
STT_FUNC, 1, isec);
addSymbol("$t", STT_NOTYPE, 0, isec);
tsec = &isec;
(void)getMayUseShortThunk();
}
void ThumbV4ABSLongBXThunk::addLongMapSyms() {
addSymbol("$a", STT_NOTYPE, 4, *tsec);
addSymbol("$d", STT_NOTYPE, 8, *tsec);
}
void ThumbV4ABSLongThunk::writeLong(uint8_t *buf) {
write16(ctx, buf + 0, 0x4778); // bx pc
write16(ctx, buf + 2,
0xe7fd); // b #-6 ; Arm recommended sequence to follow bx pc
write32(ctx, buf + 4, 0xe59fc000); // ldr r12, [pc] ; L1
write32(ctx, buf + 8, 0xe12fff1c); // bx r12
write32(ctx, buf + 12, 0x00000000); // L1: .word S
ctx.target->relocateNoSym(buf + 12, R_ARM_ABS32,
getARMThunkDestVA(ctx, destination));
}
void ThumbV4ABSLongThunk::addSymbols(ThunkSection &isec) {
addSymbol(ctx.saver.save("__Thumbv4ABSLongThunk_" + destination.getName()),
STT_FUNC, 1, isec);
addSymbol("$t", STT_NOTYPE, 0, isec);
tsec = &isec;
(void)getMayUseShortThunk();
}
void ThumbV4ABSLongThunk::addLongMapSyms() {
addSymbol("$a", STT_NOTYPE, 4, *tsec);
addSymbol("$d", STT_NOTYPE, 12, *tsec);
}
void ARMV4PILongBXThunk::writeLong(uint8_t *buf) {
write32(ctx, buf + 0, 0xe59fc004); // P: ldr ip, [pc,#4] ; L2
write32(ctx, buf + 4, 0xe08fc00c); // L1: add ip, pc, ip
write32(ctx, buf + 8, 0xe12fff1c); // bx ip
write32(ctx, buf + 12, 0x00000000); // L2: .word S - (P + (L1 - P) + 8)
uint64_t s = getARMThunkDestVA(ctx, destination);
uint64_t p = getThunkTargetSym()->getVA(ctx) & ~0x1;
ctx.target->relocateNoSym(buf + 12, R_ARM_REL32, s - p - 12);
}
void ARMV4PILongBXThunk::addSymbols(ThunkSection &isec) {
addSymbol(ctx.saver.save("__ARMv4PILongBXThunk_" + destination.getName()),
STT_FUNC, 0, isec);
addSymbol("$a", STT_NOTYPE, 0, isec);
tsec = &isec;
(void)getMayUseShortThunk();
}
void ARMV4PILongBXThunk::addLongMapSyms() {
addSymbol("$d", STT_NOTYPE, 12, *tsec);
}
void ARMV4PILongThunk::writeLong(uint8_t *buf) {
write32(ctx, buf + 0, 0xe59fc000); // P: ldr ip, [pc] ; L2
write32(ctx, buf + 4, 0xe08ff00c); // L1: add pc, pc, r12
write32(ctx, buf + 8, 0x00000000); // L2: .word S - (P + (L1 - P) + 8)
uint64_t s = getARMThunkDestVA(ctx, destination);
uint64_t p = getThunkTargetSym()->getVA(ctx) & ~0x1;
ctx.target->relocateNoSym(buf + 8, R_ARM_REL32, s - p - 12);
}
void ARMV4PILongThunk::addSymbols(ThunkSection &isec) {
addSymbol(ctx.saver.save("__ARMv4PILongThunk_" + destination.getName()),
STT_FUNC, 0, isec);
addSymbol("$a", STT_NOTYPE, 0, isec);
tsec = &isec;
(void)getMayUseShortThunk();
}
void ARMV4PILongThunk::addLongMapSyms() {
addSymbol("$d", STT_NOTYPE, 8, *tsec);
}
void ThumbV4PILongBXThunk::writeLong(uint8_t *buf) {
write16(ctx, buf + 0, 0x4778); // P: bx pc
write16(ctx, buf + 2,
0xe7fd); // b #-6 ; Arm recommended sequence to follow bx pc
write32(ctx, buf + 4, 0xe59fc000); // ldr r12, [pc] ; L2
write32(ctx, buf + 8, 0xe08cf00f); // L1: add pc, r12, pc
write32(ctx, buf + 12, 0x00000000); // L2: .word S - (P + (L1 - P) + 8)
uint64_t s = getARMThunkDestVA(ctx, destination);
uint64_t p = getThunkTargetSym()->getVA(ctx) & ~0x1;
ctx.target->relocateNoSym(buf + 12, R_ARM_REL32, s - p - 16);
}
void ThumbV4PILongBXThunk::addSymbols(ThunkSection &isec) {
addSymbol(ctx.saver.save("__Thumbv4PILongBXThunk_" + destination.getName()),
STT_FUNC, 1, isec);
addSymbol("$t", STT_NOTYPE, 0, isec);
tsec = &isec;
(void)getMayUseShortThunk();
}
void ThumbV4PILongBXThunk::addLongMapSyms() {
addSymbol("$a", STT_NOTYPE, 4, *tsec);
addSymbol("$d", STT_NOTYPE, 12, *tsec);
}
void ThumbV4PILongThunk::writeLong(uint8_t *buf) {
write16(ctx, buf + 0, 0x4778); // P: bx pc
write16(ctx, buf + 2,
0xe7fd); // b #-6 ; Arm recommended sequence to follow bx pc
write32(ctx, buf + 4, 0xe59fc004); // ldr ip, [pc,#4] ; L2
write32(ctx, buf + 8, 0xe08fc00c); // L1: add ip, pc, ip
write32(ctx, buf + 12, 0xe12fff1c); // bx ip
write32(ctx, buf + 16, 0x00000000); // L2: .word S - (P + (L1 - P) + 8)
uint64_t s = getARMThunkDestVA(ctx, destination);
uint64_t p = getThunkTargetSym()->getVA(ctx) & ~0x1;
ctx.target->relocateNoSym(buf + 16, R_ARM_REL32, s - p - 16);
}
void ThumbV4PILongThunk::addSymbols(ThunkSection &isec) {
addSymbol(ctx.saver.save("__Thumbv4PILongThunk_" + destination.getName()),
STT_FUNC, 1, isec);
addSymbol("$t", STT_NOTYPE, 0, isec);
tsec = &isec;
(void)getMayUseShortThunk();
}
void ThumbV4PILongThunk::addLongMapSyms() {
addSymbol("$a", STT_NOTYPE, 4, *tsec);
addSymbol("$d", STT_NOTYPE, 16, *tsec);
}
// Use the long jump which covers a range up to 8MiB.
void AVRThunk::writeTo(uint8_t *buf) {
write32(ctx, buf, 0x940c); // jmp func
ctx.target->relocateNoSym(buf, R_AVR_CALL, destination.getVA(ctx));
}
void AVRThunk::addSymbols(ThunkSection &isec) {
addSymbol(ctx.saver.save("__AVRThunk_" + destination.getName()), STT_FUNC, 0,
isec);
}
// Write MIPS LA25 thunk code to call PIC function from the non-PIC one.
void MipsThunk::writeTo(uint8_t *buf) {
uint64_t s = destination.getVA(ctx);
write32(ctx, buf, 0x3c190000); // lui $25, %hi(func)
write32(ctx, buf + 4, 0x08000000 | (s >> 2)); // j func
write32(ctx, buf + 8, 0x27390000); // addiu $25, $25, %lo(func)
write32(ctx, buf + 12, 0x00000000); // nop
ctx.target->relocateNoSym(buf, R_MIPS_HI16, s);
ctx.target->relocateNoSym(buf + 8, R_MIPS_LO16, s);
}
void MipsThunk::addSymbols(ThunkSection &isec) {
addSymbol(ctx.saver.save("__LA25Thunk_" + destination.getName()), STT_FUNC, 0,
isec);
}
InputSection *MipsThunk::getTargetInputSection() const {
auto &dr = cast<Defined>(destination);
return dyn_cast<InputSection>(dr.section);
}
// Write microMIPS R2-R5 LA25 thunk code
// to call PIC function from the non-PIC one.
void MicroMipsThunk::writeTo(uint8_t *buf) {
uint64_t s = destination.getVA(ctx);
write16(ctx, buf, 0x41b9); // lui $25, %hi(func)
write16(ctx, buf + 4, 0xd400); // j func
write16(ctx, buf + 8, 0x3339); // addiu $25, $25, %lo(func)
write16(ctx, buf + 12, 0x0c00); // nop
ctx.target->relocateNoSym(buf, R_MICROMIPS_HI16, s);
ctx.target->relocateNoSym(buf + 4, R_MICROMIPS_26_S1, s);
ctx.target->relocateNoSym(buf + 8, R_MICROMIPS_LO16, s);
}
void MicroMipsThunk::addSymbols(ThunkSection &isec) {
Defined *d =
addSymbol(ctx.saver.save("__microLA25Thunk_" + destination.getName()),
STT_FUNC, 0, isec);
d->stOther |= STO_MIPS_MICROMIPS;
}
InputSection *MicroMipsThunk::getTargetInputSection() const {
auto &dr = cast<Defined>(destination);
return dyn_cast<InputSection>(dr.section);
}
// Write microMIPS R6 LA25 thunk code
// to call PIC function from the non-PIC one.
void MicroMipsR6Thunk::writeTo(uint8_t *buf) {
uint64_t s = destination.getVA(ctx);
uint64_t p = getThunkTargetSym()->getVA(ctx);
write16(ctx, buf, 0x1320); // lui $25, %hi(func)
write16(ctx, buf + 4, 0x3339); // addiu $25, $25, %lo(func)
write16(ctx, buf + 8, 0x9400); // bc func
ctx.target->relocateNoSym(buf, R_MICROMIPS_HI16, s);
ctx.target->relocateNoSym(buf + 4, R_MICROMIPS_LO16, s);
ctx.target->relocateNoSym(buf + 8, R_MICROMIPS_PC26_S1, s - p - 12);
}
void MicroMipsR6Thunk::addSymbols(ThunkSection &isec) {
Defined *d =
addSymbol(ctx.saver.save("__microLA25Thunk_" + destination.getName()),
STT_FUNC, 0, isec);
d->stOther |= STO_MIPS_MICROMIPS;
}
InputSection *MicroMipsR6Thunk::getTargetInputSection() const {
auto &dr = cast<Defined>(destination);
return dyn_cast<InputSection>(dr.section);
}
void elf::writePPC32PltCallStub(Ctx &ctx, uint8_t *buf, uint64_t gotPltVA,
const InputFile *file, int64_t addend) {
if (!ctx.arg.isPic) {
write32(ctx, buf + 0, 0x3d600000 | (gotPltVA + 0x8000) >> 16); // lis r11,ha
write32(ctx, buf + 4, 0x816b0000 | (uint16_t)gotPltVA); // lwz r11,l(r11)
write32(ctx, buf + 8, 0x7d6903a6); // mtctr r11
write32(ctx, buf + 12, 0x4e800420); // bctr
return;
}
uint32_t offset;
if (addend >= 0x8000) {
// The stub loads an address relative to r30 (.got2+Addend). Addend is
// almost always 0x8000. The address of .got2 is different in another object
// file, so a stub cannot be shared.
offset = gotPltVA -
(ctx.in.ppc32Got2->getParent()->getVA() +
(file->ppc32Got2 ? file->ppc32Got2->outSecOff : 0) + addend);
} else {
// The stub loads an address relative to _GLOBAL_OFFSET_TABLE_ (which is
// currently the address of .got).
offset = gotPltVA - ctx.in.got->getVA();
}
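// Split the offset into @ha/@l halves. lwz sign-extends its 16-bit
// displacement, so the high half is rounded up by 0x8000 to compensate.
// Illustrative values (not taken from a real link): for offset = 0x12348abc,
// ha = 0x1235 and l = 0x8abc; l sign-extends to -0x7544, and
// (0x1235 << 16) - 0x7544 == 0x12348abc.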
uint16_t ha = (offset + 0x8000) >> 16, l = (uint16_t)offset;
if (ha == 0) {
write32(ctx, buf + 0, 0x817e0000 | l); // lwz r11,l(r30)
write32(ctx, buf + 4, 0x7d6903a6); // mtctr r11
write32(ctx, buf + 8, 0x4e800420); // bctr
write32(ctx, buf + 12, 0x60000000); // nop
} else {
write32(ctx, buf + 0, 0x3d7e0000 | ha); // addis r11,r30,ha
write32(ctx, buf + 4, 0x816b0000 | l); // lwz r11,l(r11)
write32(ctx, buf + 8, 0x7d6903a6); // mtctr r11
write32(ctx, buf + 12, 0x4e800420); // bctr
}
}
void PPC32PltCallStub::writeTo(uint8_t *buf) {
writePPC32PltCallStub(ctx, buf, destination.getGotPltVA(ctx), file, addend);
}
void PPC32PltCallStub::addSymbols(ThunkSection &isec) {
std::string buf;
raw_string_ostream os(buf);
os << format_hex_no_prefix(addend, 8);
if (!ctx.arg.isPic)
os << ".plt_call32.";
else if (addend >= 0x8000)
os << ".got2.plt_pic32.";
else
os << ".plt_pic32.";
os << destination.getName();
addSymbol(ctx.saver.save(buf), STT_FUNC, 0, isec);
}
bool PPC32PltCallStub::isCompatibleWith(const InputSection &isec,
const Relocation &rel) const {
return !ctx.arg.isPic || (isec.file == file && rel.addend == addend);
}
void PPC32LongThunk::addSymbols(ThunkSection &isec) {
addSymbol(ctx.saver.save("__LongThunk_" + destination.getName()), STT_FUNC, 0,
isec);
}
void PPC32LongThunk::writeTo(uint8_t *buf) {
auto ha = [](uint32_t v) -> uint16_t { return (v + 0x8000) >> 16; };
auto lo = [](uint32_t v) -> uint16_t { return v; };
uint32_t d = destination.getVA(ctx, addend);
if (ctx.arg.isPic) {
uint32_t off = d - (getThunkTargetSym()->getVA(ctx) + 8);
write32(ctx, buf + 0, 0x7c0802a6); // mflr r0
write32(ctx, buf + 4, 0x429f0005); // bcl 20,31,.+4
write32(ctx, buf + 8, 0x7d8802a6); // mflr r12
write32(ctx, buf + 12, 0x3d8c0000 | ha(off)); // addis r12,r12,off@ha
write32(ctx, buf + 16, 0x398c0000 | lo(off)); // addi r12,r12,off@l
write32(ctx, buf + 20, 0x7c0803a6); // mtlr r0
buf += 24;
} else {
write32(ctx, buf + 0, 0x3d800000 | ha(d)); // lis r12,d@ha
write32(ctx, buf + 4, 0x398c0000 | lo(d)); // addi r12,r12,d@l
buf += 8;
}
write32(ctx, buf + 0, 0x7d8903a6); // mtctr r12
write32(ctx, buf + 4, 0x4e800420); // bctr
}
void elf::writePPC64LoadAndBranch(Ctx &ctx, uint8_t *buf, int64_t offset) {
uint16_t offHa = (offset + 0x8000) >> 16;
uint16_t offLo = offset & 0xffff;
write32(ctx, buf + 0, 0x3d820000 | offHa); // addis r12, r2, OffHa
write32(ctx, buf + 4, 0xe98c0000 | offLo); // ld r12, OffLo(r12)
write32(ctx, buf + 8, 0x7d8903a6); // mtctr r12
write32(ctx, buf + 12, 0x4e800420); // bctr
}
void PPC64PltCallStub::writeTo(uint8_t *buf) {
int64_t offset = destination.getGotPltVA(ctx) - getPPC64TocBase(ctx);
// Save the TOC pointer to the save-slot reserved in the call frame.
write32(ctx, buf + 0, 0xf8410018); // std r2,24(r1)
writePPC64LoadAndBranch(ctx, buf + 4, offset);
}
void PPC64PltCallStub::addSymbols(ThunkSection &isec) {
Defined *s = addSymbol(ctx.saver.save("__plt_" + destination.getName()),
STT_FUNC, 0, isec);
s->setNeedsTocRestore(true);
s->file = destination.file;
}
bool PPC64PltCallStub::isCompatibleWith(const InputSection &isec,
const Relocation &rel) const {
return rel.type == R_PPC64_REL24 || rel.type == R_PPC64_REL14;
}
void PPC64R2SaveStub::writeTo(uint8_t *buf) {
const int64_t offset = computeOffset();
write32(ctx, buf + 0, 0xf8410018); // std r2,24(r1)
// The branch offset needs to fit in 26 bits.
if (getMayUseShortThunk()) {
write32(ctx, buf + 4, 0x48000000 | (offset & 0x03fffffc)); // b <offset>
} else if (isInt<34>(offset)) {
int nextInstOffset;
uint64_t tocOffset = destination.getVA(ctx) - getPPC64TocBase(ctx);
if (tocOffset >> 16 > 0) {
const uint64_t addi = ADDI_R12_TO_R12_NO_DISP | (tocOffset & 0xffff);
const uint64_t addis =
ADDIS_R12_TO_R2_NO_DISP | ((tocOffset >> 16) & 0xffff);
write32(ctx, buf + 4, addis); // addis r12, r2, top of offset
write32(ctx, buf + 8, addi); // addi r12, r12, bottom of offset
nextInstOffset = 12;
} else {
const uint64_t addi = ADDI_R12_TO_R2_NO_DISP | (tocOffset & 0xffff);
write32(ctx, buf + 4, addi); // addi r12, r2, offset
nextInstOffset = 8;
}
write32(ctx, buf + nextInstOffset, MTCTR_R12); // mtctr r12
write32(ctx, buf + nextInstOffset + 4, BCTR); // bctr
} else {
ctx.in.ppc64LongBranchTarget->addEntry(&destination, addend);
const int64_t offsetFromTOC =
ctx.in.ppc64LongBranchTarget->getEntryVA(&destination, addend) -
getPPC64TocBase(ctx);
writePPC64LoadAndBranch(ctx, buf + 4, offsetFromTOC);
}
}
void PPC64R2SaveStub::addSymbols(ThunkSection &isec) {
Defined *s = addSymbol(ctx.saver.save("__toc_save_" + destination.getName()),
STT_FUNC, 0, isec);
s->setNeedsTocRestore(true);
}
bool PPC64R2SaveStub::isCompatibleWith(const InputSection &isec,
const Relocation &rel) const {
return rel.type == R_PPC64_REL24 || rel.type == R_PPC64_REL14;
}
void PPC64R12SetupStub::writeTo(uint8_t *buf) {
int64_t offset =
(gotPlt ? destination.getGotPltVA(ctx) : destination.getVA(ctx)) -
getThunkTargetSym()->getVA(ctx);
if (!isInt<34>(offset))
reportRangeError(ctx, buf, offset, 34, destination,
"R12 setup stub offset");
int nextInstOffset;
if (ctx.arg.power10Stubs) {
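// The 34-bit displacement is split in two: its upper 18 bits go to bits
// 32-49 of imm and its lower 16 bits to bits 0-15. This assumes the upper
// 32 bits of the 64-bit value hold the prefix word and the lower 32 bits the
// suffix word. Illustrative value: offset = 0x123456789 gives
// imm = 0x0001234500006789.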
const uint64_t imm = (((offset >> 16) & 0x3ffff) << 32) | (offset & 0xffff);
// pld 12, func@plt@pcrel or paddi r12, 0, func@pcrel
writePrefixedInst(ctx, buf,
(gotPlt ? PLD_R12_NO_DISP : PADDI_R12_NO_DISP) | imm);
nextInstOffset = 8;
} else {
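// bcl 20,31,.+4 at buf + 4 sets LR to the address of the next instruction
// (thunk start + 8), which mflr then copies into r11. offset is relative to
// the thunk start, so subtract 8 to make it relative to r11.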
uint32_t off = offset - 8;
write32(ctx, buf + 0, 0x7d8802a6); // mflr 12
write32(ctx, buf + 4, 0x429f0005); // bcl 20,31,.+4
write32(ctx, buf + 8, 0x7d6802a6); // mflr 11
write32(ctx, buf + 12, 0x7d8803a6); // mtlr 12
write32(ctx, buf + 16,
0x3d8b0000 | ((off + 0x8000) >> 16)); // addis 12,11,off@ha
if (gotPlt)
write32(ctx, buf + 20, 0xe98c0000 | (off & 0xffff)); // ld 12, off@l(12)
else
write32(ctx, buf + 20, 0x398c0000 | (off & 0xffff)); // addi 12,12,off@l
nextInstOffset = 24;
}
write32(ctx, buf + nextInstOffset, MTCTR_R12); // mtctr r12
write32(ctx, buf + nextInstOffset + 4, BCTR); // bctr
}
void PPC64R12SetupStub::addSymbols(ThunkSection &isec) {
addSymbol(ctx.saver.save((gotPlt ? "__plt_pcrel_" : "__gep_setup_") +
destination.getName()),
STT_FUNC, 0, isec);
}
bool PPC64R12SetupStub::isCompatibleWith(const InputSection &isec,
const Relocation &rel) const {
return rel.type == R_PPC64_REL24_NOTOC;
}
void PPC64LongBranchThunk::writeTo(uint8_t *buf) {
int64_t offset =
ctx.in.ppc64LongBranchTarget->getEntryVA(&destination, addend) -
getPPC64TocBase(ctx);
writePPC64LoadAndBranch(ctx, buf, offset);
}
void PPC64LongBranchThunk::addSymbols(ThunkSection &isec) {
addSymbol(ctx.saver.save("__long_branch_" + destination.getName()), STT_FUNC,
0, isec);
}
bool PPC64LongBranchThunk::isCompatibleWith(const InputSection &isec,
const Relocation &rel) const {
return rel.type == R_PPC64_REL24 || rel.type == R_PPC64_REL14;
}
Thunk::Thunk(Ctx &ctx, Symbol &d, int64_t a)
: ctx(ctx), destination(d), addend(a), offset(0) {
destination.thunkAccessed = true;
}
Thunk::~Thunk() = default;
static std::unique_ptr<Thunk> addThunkAArch64(Ctx &ctx, RelType type, Symbol &s,
int64_t a) {
assert(is_contained({R_AARCH64_CALL26, R_AARCH64_JUMP26, R_AARCH64_PLT32},
type));
bool mayNeedLandingPad =
(ctx.arg.andFeatures & GNU_PROPERTY_AARCH64_FEATURE_1_BTI) &&
!isAArch64BTILandingPad(ctx, s, a);
if (ctx.arg.picThunk)
return std::make_unique<AArch64ADRPThunk>(ctx, s, a, mayNeedLandingPad);
return std::make_unique<AArch64ABSLongThunk>(ctx, s, a, mayNeedLandingPad);
}
// Creates a thunk for long branches or Thumb-ARM interworking.
// Arm Architecture v4T does not support Thumb2 technology, and does not
// support BLX or LDR Arm/Thumb state switching. This means that
// - MOVT and MOVW instructions cannot be used.
// - We can't rewrite BL in place to BLX. We will need thunks.
//
// TODO: use B for short Thumb->Arm thunks instead of LDR (this doesn't work for
// Arm->Thumb, because in Arm state the BX PC trick doesn't switch state).
static std::unique_ptr<Thunk> addThunkArmv4(Ctx &ctx, RelType reloc, Symbol &s,
int64_t a) {
bool thumb_target = s.getVA(ctx, a) & 1;
switch (reloc) {
case R_ARM_PC24:
case R_ARM_PLT32:
case R_ARM_JUMP24:
case R_ARM_CALL:
if (ctx.arg.picThunk) {
if (thumb_target)
return std::make_unique<ARMV4PILongBXThunk>(ctx, s, a);
return std::make_unique<ARMV4PILongThunk>(ctx, s, a);
}
if (thumb_target)
return std::make_unique<ARMV4ABSLongBXThunk>(ctx, s, a);
return std::make_unique<ARMV5LongLdrPcThunk>(ctx, s, a);
case R_ARM_THM_CALL:
if (ctx.arg.picThunk) {
if (thumb_target)
return std::make_unique<ThumbV4PILongThunk>(ctx, s, a);
return std::make_unique<ThumbV4PILongBXThunk>(ctx, s, a);
}
if (thumb_target)
return std::make_unique<ThumbV4ABSLongThunk>(ctx, s, a);
return std::make_unique<ThumbV4ABSLongBXThunk>(ctx, s, a);
}
Fatal(ctx) << "relocation " << reloc << " to " << &s
<< " not supported for Armv4 or Armv4T target";
llvm_unreachable("");
}
// Creates a thunk for Thumb-ARM interworking compatible with Armv5 and Armv6.
// Arm Architectures v5 and v6 do not support Thumb2 technology. This means that
// - MOVT and MOVW instructions cannot be used.
// - The only Thumb relocation that can generate a Thunk is a BL, which can
// always be transformed into a BLX.
static std::unique_ptr<Thunk> addThunkArmv5v6(Ctx &ctx, RelType reloc,
Symbol &s, int64_t a) {
switch (reloc) {
case R_ARM_PC24:
case R_ARM_PLT32:
case R_ARM_JUMP24:
case R_ARM_CALL:
case R_ARM_THM_CALL:
if (ctx.arg.picThunk)
return std::make_unique<ARMV4PILongBXThunk>(ctx, s, a);
return std::make_unique<ARMV5LongLdrPcThunk>(ctx, s, a);
}
Fatal(ctx) << "relocation " << reloc << " to " << &s
<< " not supported for Armv5 or Armv6 targets";
llvm_unreachable("");
}
// Create a thunk for Thumb long branch on V6-M.
// Arm Architecture v6-M only supports Thumb instructions. This means
// - MOVT and MOVW instructions cannot be used.
// - Only a limited number of instructions can access registers r8 and above
// - No interworking support is needed (all Thumb).
static std::unique_ptr<Thunk> addThunkV6M(Ctx &ctx, const InputSection &isec,
RelType reloc, Symbol &s, int64_t a) {
const bool isPureCode = isec.getParent()->flags & SHF_ARM_PURECODE;
switch (reloc) {
case R_ARM_THM_JUMP19:
case R_ARM_THM_JUMP24:
case R_ARM_THM_CALL:
if (ctx.arg.isPic) {
if (!isPureCode)
return std::make_unique<ThumbV6MPILongThunk>(ctx, s, a);
Fatal(ctx)
<< "relocation " << reloc << " to " << &s
<< " not supported for Armv6-M targets for position independent"
" and execute only code";
llvm_unreachable("");
}
if (isPureCode)
return std::make_unique<ThumbV6MABSXOLongThunk>(ctx, s, a);
return std::make_unique<ThumbV6MABSLongThunk>(ctx, s, a);
}
Fatal(ctx) << "relocation " << reloc << " to " << &s
<< " not supported for Armv6-M targets";
llvm_unreachable("");
}
// Creates a thunk for Thumb-ARM interworking or branch range extension.
static std::unique_ptr<Thunk> addThunkArm(Ctx &ctx, const InputSection &isec,
RelType reloc, Symbol &s, int64_t a) {
// Decide which Thunk is needed based on:
// Available instruction set
// - An Arm Thunk can only be used if Arm state is available.
// - A Thumb Thunk can only be used if Thumb state is available.
// - Can only use a Thunk if it uses instructions that the Target supports.
// Relocation is branch or branch and link
// - Branch instructions cannot change state, can only select Thunk that
// starts in the same state as the caller.
// - Branch and link relocations can change state, can select Thunks from
// either Arm or Thumb.
// Position independent Thunks if we require position independent code.
// Execute Only Thunks if the output section is execute only code.
// Handle architectures that have restrictions on the instructions that they
// can use in Thunks. The flags below are set by reading the BuildAttributes
// of the input objects. InputFiles.cpp contains the mapping from ARM
// architecture to flag.
if (!ctx.arg.armHasMovtMovw) {
if (ctx.arg.armJ1J2BranchEncoding)
return addThunkV6M(ctx, isec, reloc, s, a);
if (ctx.arg.armHasBlx)
return addThunkArmv5v6(ctx, reloc, s, a);
return addThunkArmv4(ctx, reloc, s, a);
}
switch (reloc) {
case R_ARM_PC24:
case R_ARM_PLT32:
case R_ARM_JUMP24:
case R_ARM_CALL:
if (ctx.arg.picThunk)
return std::make_unique<ARMV7PILongThunk>(ctx, s, a);
return std::make_unique<ARMV7ABSLongThunk>(ctx, s, a);
case R_ARM_THM_JUMP19:
case R_ARM_THM_JUMP24:
case R_ARM_THM_CALL:
if (ctx.arg.picThunk)
return std::make_unique<ThumbV7PILongThunk>(ctx, s, a);
return std::make_unique<ThumbV7ABSLongThunk>(ctx, s, a);
}
llvm_unreachable("");
}
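// Creates a thunk for AVR. Only the R_AVR_LO8_LDI_GS and R_AVR_HI8_LDI_GS
// relocations are expected to need one.
static std::unique_ptr<Thunk> addThunkAVR(Ctx &ctx, RelType type, Symbol &s,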
int64_t a) {
switch (type) {
case R_AVR_LO8_LDI_GS:
case R_AVR_HI8_LDI_GS:
return std::make_unique<AVRThunk>(ctx, s, a);
default:
llvm_unreachable("");
}
}
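// Creates a thunk for MIPS, choosing the microMIPS R6, microMIPS, or regular
// MIPS variant from the target symbol's st_other flags and the ISA revision.
static std::unique_ptr<Thunk> addThunkMips(Ctx &ctx, RelType type, Symbol &s) {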
if ((s.stOther & STO_MIPS_MICROMIPS) && isMipsR6(ctx))
return std::make_unique<MicroMipsR6Thunk>(ctx, s);
if (s.stOther & STO_MIPS_MICROMIPS)
return std::make_unique<MicroMipsThunk>(ctx, s);
return std::make_unique<MipsThunk>(ctx, s);
}
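// Creates a thunk for 32-bit PowerPC: a PLT call stub if the symbol is in the
// PLT, otherwise a long branch thunk.
static std::unique_ptr<Thunk> addThunkPPC32(Ctx &ctx, const InputSection &isec,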
const Relocation &rel, Symbol &s) {
assert((rel.type == R_PPC_LOCAL24PC || rel.type == R_PPC_REL24 ||
rel.type == R_PPC_PLTREL24) &&
"unexpected relocation type for thunk");
if (s.isInPlt(ctx))
return std::make_unique<PPC32PltCallStub>(ctx, isec, rel, s);
return std::make_unique<PPC32LongThunk>(ctx, s, rel.addend);
}
static std::unique_ptr<Thunk> addThunkPPC64(Ctx &ctx, RelType type, Symbol &s,
int64_t a) {
assert((type == R_PPC64_REL14 || type == R_PPC64_REL24 ||
type == R_PPC64_REL24_NOTOC) &&
"unexpected relocation type for thunk");
// If we are emitting stubs for NOTOC relocations, we need to tell
// the PLT resolver that there can be multiple TOCs.
if (type == R_PPC64_REL24_NOTOC)
ctx.target->ppc64DynamicSectionOpt = 0x2;
if (s.isInPlt(ctx)) {
if (type == R_PPC64_REL24_NOTOC)
return std::make_unique<PPC64R12SetupStub>(ctx, s,
/*gotPlt=*/true);
return std::make_unique<PPC64PltCallStub>(ctx, s);
}
// This check looks at the st_other bits of the callee. If the value is 1
// then the callee clobbers the TOC and we need an R2 save stub when RelType
// is R_PPC64_REL14 or R_PPC64_REL24.
if ((type == R_PPC64_REL14 || type == R_PPC64_REL24) && (s.stOther >> 5) == 1)
return std::make_unique<PPC64R2SaveStub>(ctx, s, a);
if (type == R_PPC64_REL24_NOTOC)
return std::make_unique<PPC64R12SetupStub>(ctx, s, /*gotPlt=*/false);
if (ctx.arg.picThunk)
return std::make_unique<PPC64PILongBranchThunk>(ctx, s, a);
return std::make_unique<PPC64PDLongBranchThunk>(ctx, s, a);
}
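// Creates the thunk required by relocation rel in isec, dispatching on the
// output machine type to the target-specific helper above.
std::unique_ptr<Thunk> elf::addThunk(Ctx &ctx, const InputSection &isec,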
Relocation &rel) {
Symbol &s = *rel.sym;
int64_t a = rel.addend;
switch (ctx.arg.emachine) {
case EM_AARCH64:
return addThunkAArch64(ctx, rel.type, s, a);
case EM_ARM:
return addThunkArm(ctx, isec, rel.type, s, a);
case EM_AVR:
return addThunkAVR(ctx, rel.type, s, a);
case EM_MIPS:
return addThunkMips(ctx, rel.type, s);
case EM_PPC:
return addThunkPPC32(ctx, isec, rel, s);
case EM_PPC64:
return addThunkPPC64(ctx, rel.type, s, a);
default:
llvm_unreachable(
"add Thunk only supported for AArch64, ARM, AVR, Mips and PowerPC");
}
}
// Creates a landing pad thunk for a destination that lacks a BTI instruction.
// Only AArch64 is supported. The pad consists of a BTI c, followed by a B to
// the destination unless control flow can fall through, and linker-generated
// indirect branches target the pad instead of the destination.
std::unique_ptr<Thunk> elf::addLandingPadThunk(Ctx &ctx, Symbol &s, int64_t a) {
switch (ctx.arg.emachine) {
case EM_AARCH64:
return std::make_unique<AArch64BTILandingPadThunk>(ctx, s, a);
default:
llvm_unreachable("add landing pad only supported for AArch64");
}
}