Add the required IMAGE_REL_MIPS_PAIR relocation after
IMAGE_REL_MIPS_REFHI/IMAGE_REL_MIPS_SECRELHI
Microsoft PE/COFF specification says that the IMAGE_REL_MIPS_REFHI
relocation contains "the high 16 bits of the target's 32-bit virtual
address. [...] This relocation must be immediately followed by a PAIR
relocation whose SymbolTableIndex contains a 16-bit displacement which
is added to the upper 16 bits taken from the location being relocated."
Microsoft PE/COFF specification says that the IMAGE_REL_MIPS_SECRELHI
relocation contains "the high 16 bits of the 32-bit offset of the target
from the beginning of its section. A PAIR relocation must immediately
follow this on. The SymbolTableIndex of the PAIR relocation contains a
16-bit displacement, which is added to the upper 16 bits taken from the
location being relocated."
Behavior has been checked against Microsoft C compiler for MIPS.
The ASAN and MSAN tests have been failing after #122777 because some
fields are now set in `executePostLayoutBinding` which is skipped by the
assembler if it had errors but read in `writeObject`
Since the compilation has failed anyway, skip `writeObject` if the
assembler had errors.
This change implements import call optimization for AArch64 Windows
(equivalent to the undocumented MSVC `/d2ImportCallOptimization` flag).
Import call optimization adds additional data to the binary which can be
used by the Windows kernel loader to rewrite indirect calls to imported
functions as direct calls. It uses the same [Dynamic Value Relocation
Table mechanism that was leveraged on x64 to implement
`/d2GuardRetpoline`](https://techcommunity.microsoft.com/blog/windowsosplatform/mitigating-spectre-variant-2-with-retpoline-on-windows/295618).
The change to the obj file is to add a new `.impcall` section with the
following layout:
```cpp
// Per section that contains calls to imported functions:
// uint32_t SectionSize: Size in bytes for information in this section.
// uint32_t Section Number
// Per call to imported function in section:
// uint32_t Kind: the kind of imported function.
// uint32_t BranchOffset: the offset of the branch instruction in its
// parent section.
// uint32_t TargetSymbolId: the symbol id of the called function.
```
NOTE: If the import call optimization feature is enabled, then the
`.impcall` section must be emitted, even if there are no calls to
imported functions.
The implementation is split across a few parts of LLVM:
* During AArch64 instruction selection, the `GlobalValue` for each call
to a global is recorded into the Extra Information for that node.
* During lowering to machine instructions, the called global value for
each call is noted in its containing `MachineFunction`.
* During AArch64 asm printing, if the import call optimization feature
is enabled:
- A (new) `.impcall` directive is emitted for each call to an imported
function.
- The `.impcall` section is emitted with its magic header (but is not
filled in).
* During COFF object writing, the `.impcall` section is filled in based
on each `.impcall` directive that were encountered.
The `.impcall` section can only be filled in when we are writing the
COFF object as it requires the actual section numbers, which are only
assigned at that point (i.e., they don't exist during asm printing).
I had tried to avoid using the Extra Information during instruction
selection and instead implement this either purely during asm printing
or in a `MachineFunctionPass` (as suggested in [on the
forums](https://discourse.llvm.org/t/design-gathering-locations-of-instructions-to-emit-into-a-section/83729/3))
but this was not possible due to how loading and calling an imported
function works on AArch64. Specifically, they are emitted as `ADRP` +
`LDR` (to load the symbol) then a `BR` (to do the call), so at the point
when we have machine instructions, we would have to work backwards
through the instructions to discover what is being called. An initial
prototype did work by inspecting instructions; however, it didn't
correctly handle the case where the same function was called twice in a
row, which caused LLVM to elide the `ADRP` + `LDR` and reuse the
previously loaded address. Worse than that, sometimes for the
double-call case LLVM decided to spill the loaded address to the stack
and then reload it before making the second call. So, instead of trying
to implement logic to discover where the value in a register came from,
I instead recorded the symbol being called at the last place where it
was easy to do: instruction selection.
Similar to commit 28fcafb50274be2520117eacb0a886adafefe59d (2011) for
MachObjectWriter. MCWinCOFFStreamer can now access WinCOFFObjectWriter
directly without holding object file format specific inforamtion in
MCAssembler (e.g. IncrementalLinkerCompatible).
Follow-up to 10c894cffd0f4bef21b54a43b5780240532e44cf.
MCAsmLayout, introduced by ac8a95498a99eb16dff9d3d0186616645d200b6e
(2010), provides APIs to compute fragment/symbol/section offsets.
The separate class is cumbersome and passing it around has overhead.
Let's remove it as the underlying implementation is tightly coupled with
MCAsmLayout anyway.
Some forwarders are added to ease migration.
13a79bbfe583e1d8cc85d241b580907260065eb8 (2017) unified `BeginSymbol` and
section symbol for ELF. This patch does the same for COFF.
* In getCOFFSection, all sections now have a `BeginSymbol` (section
symbol). We do not need a dummy symbol name when `getBeginSymbol` is
needed (used by AsmParser::Run and DWARF generation).
* Section symbols are in the global symbol table. `call .text` will
reference the section symbol instead of an undefined symbol. This
matches GNU assembler. Unlike GNU, redefining the section symbol will
cause a "symbol 'foo0' is already defined" error (see
`section-sym-err.s`).
Pull Request: https://github.com/llvm/llvm-project/pull/96459
All sections should have been created before MCAssembler::layout so that
every section has an ordinal. Registering the section in
WinCOFFObjectWriter::executePostLayoutBinding is a hack.
There are only three actual uses of the section kind in MCSection:
isText(), XCOFF, and WebAssembly. Store isText() in the MCSection, and
store other info in the actual section variants where required.
ELF and COFF flags also encode all relevant information, so for these
two section variants, remove the SectionKind parameter entirely.
This allows to remove the string switch (which is unnecessary and
inaccurate) from createELFSectionImpl. This was introduced in
[D133456](https://reviews.llvm.org/D133456), but apparently, it was
never hit for non-writable sections anyway and the resulting kind was
never used.
`allocFragment` might be changed to a placement new when the allocation
strategy changes.
`allocInitialFragment` is to deduplicate the following pattern
```
auto *F = new MCDataFragment();
Result->addFragment(*F);
F->setParent(Result);
```
Pull Request: https://github.com/llvm/llvm-project/pull/95197
This patch replaces uses of StringRef::{starts,ends}with with
StringRef::{starts,ends}_with for consistency with
std::{string,string_view}::{starts,ends}_with in C++20.
I'm planning to deprecate and eventually remove
StringRef::{starts,ends}with.
Note that llvm::support::endianness has been renamed to
llvm::endianness while becoming an enum class as opposed to an
enum. This patch replaces support::{big,little,native} with
llvm::endianness::{big,little,native}.
For a b/bl instruction that branches a temporary symbol (private
assembly label), an IMAGE_REL_ARM64_BRANCH26 relocation to another
symbol + a non-zero offset could be emitted but the linkers don't
support this type of relocation and cause incorrect relocations and
crashes. Avoid emitting this type of relocations.
Differential Revision: https://reviews.llvm.org/D155732
D152340 has split WinCOFFObjectWriter to WinCOFFWriter. This patch adds
another WinCOFFWriter as DwoWriter to write Dwo sections to dwo file.
Driver options are also updated accordingly to support -gsplit-dwarf in
CL mode.
e.g. $ clang-cl -c -gdwarf -gsplit-dwarf foo.c
Like what -gsplit-dwarf did in ELF, using this option will create DWARF object
(.dwo) file. DWARF debug info is split between COFF object and DWARF object
file. It can reduce the executable file size especially for large project.
Reviewed By: skan, MaskRay
Differential Revision: https://reviews.llvm.org/D152785
We'd like to support -gsplit-dwarf for Windows COFF. It requires to
write Dwo and NonDwo sections to different output streams.The original
implementation is not designed to do that and there can be only one
MCObjectWriter. This patch split WinCOFFObjectWriter to WinCOFFWriter so
that:
1. WinCOFFObjectWriter can create multiple WinCOFFWriter.
2. Each WinCOFFWriter can separately collect sections it is interested.
3. Each WinCOFFWriter can write to it's own output stream.
Reviewed By: skan
Differential Revision: https://reviews.llvm.org/D152340
This is mostly useful for ARM64EC, which uses such symbols extensively.
One interesting quirk of ARM64EC is that we need to be able to emit weak
symbols that point at each other (so if either symbol is defined
elsewhere, both symbols point at the definition). This handling is
currently restricted to weak_anti_dep symbols, because we depend on the
current behavior of resolving weak symbols in some cases.
Differential Revision: https://reviews.llvm.org/D145208
Set public specifiers only for constructor and inherited methods from
MCObjectWriter and leave others as private. Also change the order of
MCObjectWriter methods' definition according to it's declaration order.
Reviewed By: skan
Differential Revision: https://reviews.llvm.org/D152229
Each COFFSection bind MCSection when created. No need to iterate
throught MCAssembler when writeSection.
Reviewed By: skan
Differential Revision: https://reviews.llvm.org/D151793
This reverts commit 10c17c97ebaf81ac26f6830e51a7a57ddcf63cd2. It causes undefined symbol error on chromium windows build. A small repro was uploaded to the code review.
This is mostly useful for ARM64EC, which uses such symbols extensively.
One interesting quirk of ARM64EC is that we need to be able to emit weak
symbols that point at each other (so if either symbol is defined
elsewhere, both symbols point at the definition). This required a few
changes to the way we handle weak symbols on Windows.
Differential Revision: https://reviews.llvm.org/D145208
This is mostly useful for ARM64EC, which uses such symbols extensively.
One interesting quirk of ARM64EC is that we need to be able to emit weak
symbols that point at each other (so if either symbol is defined
elsewhere, both symbols point at the definition). This required a few
changes to the way we handle weak symbols on Windows.
Differential Revision: https://reviews.llvm.org/D145208
Use deduction guides instead of helper functions.
The only non-automatic changes have been:
1. ArrayRef(some_uint8_pointer, 0) needs to be changed into ArrayRef(some_uint8_pointer, (size_t)0) to avoid an ambiguous call with ArrayRef((uint8_t*), (uint8_t*))
2. CVSymbol sym(makeArrayRef(symStorage)); needed to be rewritten as CVSymbol sym{ArrayRef(symStorage)}; otherwise the compiler is confused and thinks we have a (bad) function prototype. There was a few similar situation across the codebase.
3. ADL doesn't seem to work the same for deduction-guides and functions, so at some point the llvm namespace must be explicitly stated.
4. The "reference mode" of makeArrayRef(ArrayRef<T> &) that acts as no-op is not supported (a constructor cannot achieve that).
Per reviewers' comment, some useless makeArrayRef have been removed in the process.
This is a follow-up to https://reviews.llvm.org/D140896 that introduced
the deduction guides.
Differential Revision: https://reviews.llvm.org/D140955
.addrsig_sym forces registering the symbol regardless whether it is otherwise
registered. This creates an undefined symbol which is inconvenient/undesired:
* `extern int x; void f() { (void)x; }` has inconsistent behavior whether `x` is emitted as an undefined symbol.
`-O0 -faddrsig` makes `x` undefined while other -O levels and -fno-addrsig eliminate the symbol.
* In ThinLTO, after a non-prevailing linkonce_odr definition is converted to available_externally, and then a declaration,
the addrsig code emits a symbol while the symbol is otherwise unseen.
D135427 fixed a bug that a non-prevailing `__cxx_global_var_init` was
incorrectly retained. However, the IR declaration causes an undesired
`.addrsig_sym __cxx_global_var_init`. This can be addressed in a way similar
to D101512 (`isTransitiveUsedByMetadataOnly`) but the increased
`OutStreamer->emitAddrsigSym(getSymbol(&GV));` complexity makes me nervous.
Just ignoring unregistered symbols circumvents the problem.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D135642
When using weak symbols, the WinCOFFObjectWriter keeps a list (`WeakDefaults`)
that's used to make names unique. This list should be reset when the object
writer is reset, because otherwise reuse of the object writer can result in
freed symbols being accessed. With some added output, this becomes clear when
using `llc` in `--run-twice` mode:
```
$ ./llc --compile-twice -mtriple=x86_64-pc-win32 trivial.ll -filetype=obj
DefineSymbol::WeakDefaults
- .weak.foo.default
- .weak.bar.default
DefineSymbol::WeakDefaults
- .weak.foo.default
- áÑJij⌂ p§┼Ø┐☺
- .debug_macinfo.dw
- .weak.bar.default
```
This does not seem to leak into the output object file though, so I couldn't
come up with a test. I added one that just does `--run-twice` (and verified
that it does access freed memory), which should result in detecting the
invalid memory accesses when running under ASAN.
Observed in a Julia PR where we started using weak symbols:
https://github.com/JuliaLang/julia/pull/45649
Reviewed By: mstorsjo
Differential Revision: https://reviews.llvm.org/D129840
It's a fairly common issue that the generating code incorrectly marks
instructions as narrow or wide; check that the instruction lengths
add up to the expected value, and error out if it doesn't. This allows
catching code generation bugs.
Also check that prologs and epilogs are properly terminated, to
catch other code generation issues.
Differential Revision: https://reviews.llvm.org/D125647