Reland https://github.com/llvm/llvm-project/pull/106230
The original PR was reverted due to compilation time regression.
This PR fixed that by adding a condition OutStreamer->isVerboseAsm() to
the generation of extra inlined-at debug info, so that it does not
affect normal compilation time.
Currently MC print source location of instructions in comments in
assembly when debug info is available, however, it does not include
inlined-at locations when a function is inlined.
For example, function foo is defined in header file a.h and is called
multiple times in b.cpp. If foo is inlined, current assembly will only
show its instructions with their line numbers in a.h. With inlined-at
locations, the assembly will also show where foo is called in b.cpp.
This patch adds inlined-at locations to the comments by using
DebugLoc::print. It makes the printed source location info consistent
with those printed by machine passes.
Currently MC print source location of instructions in comments in
assembly when debug info is available, however, it does not include
inlined-at locations when a function is inlined.
For example, function foo is defined in header file a.h and is called
multiple times in b.cpp. If foo is inlined, current assembly will only
show its instructions with their line numbers in a.h. With inlined-at
locations, the assembly will also show where foo is called in b.cpp.
This patch adds inlined-at locations to the comments by using
DebugLoc::print. It makes the printed source location info consistent
with those printed by machine passes.
https://reviews.llvm.org/D23669 inappropriately added MIPS-specific
dtprel/tprel directives to MCStreamer. In addition,
llvm-mc -filetype=null parsing these directives will crash.
This patch moves these functions to MipsTargetStreamer and fixes
-filetype=null.
gprel32 and gprel64, called by AsmPrinter, are moved to
MCTargetStreamer.
Avoid needless copying of instructions and fixups and directly emit into
the fragment small vectors.
This (optionally, second commit) also removes the single use of the
MCCompactEncodedInstFragment to simplify code.
https://reviews.llvm.org/D70157 (for Intel Jump Conditional Code
Erratum) introduced two virtual function calls in the generic
MCObjectStreamer::emitInstruction, which added some overhead.
This patch removes the virtual function overhead:
* Define `llvm::X86_MC::emitInstruction` that calls `emitInstruction{Begin,End}`.
* Define {X86ELFStreamer,X86WinCOFFStreamer}::emitInstruction to call `llvm::X86_MC::emitInstruction`
Pull Request: https://github.com/llvm/llvm-project/pull/96835
Follow-up to 05ba5c0648ae5e80d5afce270495bf3b1eef9af4. uint32_t is
preferred over const MCExpr * in the section stack uses because it
should only be evaluated once. Change the paramter type to match.
This commit removes the complexity introduced by pending labels in
https://reviews.llvm.org/D5915 by using a simpler approach. D5915 aimed
to ensure padding placement before `.Ltmp0` for the following code, but
at the cost of expensive per-instruction `flushPendingLabels`.
```
// similar to llvm/test/MC/X86/AlignedBundling/labeloffset.s
.bundle_lock align_to_end
calll .L0$pb
.bundle_unlock
.L0$pb:
popl %eax
.Ltmp0: //// padding should be inserted before this label instead of after
addl $_GLOBAL_OFFSET_TABLE_+(.Ltmp0-.L0$pb), %eax
```
(D5915 was adjusted by https://reviews.llvm.org/D8072 and
https://reviews.llvm.org/D71368)
This patch achieves the same goal by setting the offset of the empty
MCDataFragment (`Prev`) in `layoutBundle`. This eliminates the need for
pending labels and simplifies the code.
llvm/test/MC/MachO/pending-labels.s (D71368): relocation symbols are
changed, but the result is still supported by linkers.
When both aligned bundling and RelaxAll are enabled, bundle padding is
directly written into fragments (https://reviews.llvm.org/D8072).
(The original motivation was memory usage, which has been achieved from
different angles with recent assembler improvement).
The code presents challenges with the work to replace fragment
representation (e.g. #94950#95077). This patch removes the special
handling. RelaxAll still works but the behavior seems slightly different
as revealed by 2 changed tests. However, most `-mc-relax-all` tests are
unchanged.
RelaxAll used to be the default for clang -O0. This mode has significant
code size drawbacks and newer Clang doesn't use it (#90013).
---
flushPendingLabels: The FOffset parameter can be removed: pending labels
will be assigned to the incoming fragment at offset 0.
Pull Request: https://github.com/llvm/llvm-project/pull/95188
`allocFragment` might be changed to a placement new when the allocation
strategy changes.
`allocInitialFragment` is to deduplicate the following pattern
```
auto *F = new MCDataFragment();
Result->addFragment(*F);
F->setParent(Result);
```
Pull Request: https://github.com/llvm/llvm-project/pull/95197
For bolt/test/runtime/X86/exceptions-pic.test, llvm-bolt seems to call
emitLabel twice and the assert will fail. Work around it after
2cc4bc132cbcc76c5552cbc128830943ea596b3e
After 9d0754ada5dbbc0c009bcc2f7824488419cc5530 ("[MC] Relax fragments
eagerly") removes the assert of Offset, it is no longer useful to
initialize the member to -1.
Now the symbol value estimate is more precise, which leads to slight
behavior change to layout-interdependency.s.
Fragments are allocated with `operator new` and stored in an ilist with
Prev/Next/Parent pointers. A more efficient representation would be an
array of fragments without the overhead of Prev/Next pointers.
As the first step, replace ilist with singly-linked lists.
* `getPrevNode` uses have been eliminated by previous changes.
* The last use of the `Prev` pointer remains: for each subsection, there is an insertion point and
the current insertion point is stored at `CurInsertionPoint`.
* `HexagonAsmBackend::finishLayout` needs a backward iterator. Save all
fragments within `Frags`. Hexagon programs are usually small, and the
performance does not matter that much.
To eliminate `Prev`, change the subsection representation to
singly-linked lists for subsections and a pointer to the active
singly-linked list. The fragments from all subsections will be chained
together at layout time.
Since fragment lists are disconnected before layout time, we can remove
`MCFragment::SubsectionNumber` (https://reviews.llvm.org/D69411). The
current implementation of `AttemptToFoldSymbolOffsetDifference` requires
future improvement for robustness.
Pull Request: https://github.com/llvm/llvm-project/pull/95077
Commit 6c0665e22174d474050e85ca367424f6e02476be
(https://reviews.llvm.org/D45164) enabled certain constant expression
evaluation for `MCObjectStreamer` at parse time (e.g. `.if` directives,
see llvm/test/MC/AsmParser/assembler-expressions.s).
`getUseAssemblerInfoForParsing` was added to make `clang -c` handling
inline assembly similar to `MCAsmStreamer` (e.g. `llvm-mc -filetype=asm`),
where such expression folding (related to
`AttemptToFoldSymbolOffsetDifference`) is unavailable.
I believe this is overly conservative. We can make some parse-time
expression folding work for `clang -c` even if `clang -S` would still
report an error, a MCAsmStreamer issue (we cannot print `.if`
directives) that should not restrict the functionality of
MCObjectStreamer.
```
% cat b.cc
asm(R"(
.pushsection .text,"ax"
.globl _start; _start: ret
.if . -_start == 1
ret
.endif
.popsection
)");
% gcc -S b.cc && gcc -c b.cc
% clang -S -fno-integrated-as b.cc # succeeded
% clang -c b.cc # succeeded with this patch
% clang -S b.cc # still failed
<inline asm>:4:5: error: expected absolute expression
4 | .if . -_start == 1
| ^
1 error generated.
```
However, removing `getUseAssemblerInfoForParsing` would make
MCDwarfFrameEmitter::Emit (for .eh_frame FDE) slow (~4% compile time
regression for sqlite3.c amalgamation) due to expensive
`AttemptToFoldSymbolOffsetDifference`. For now, make
`UseAssemblerInfoForParsing` false in MCDwarfFrameEmitter::Emit.
Close#62520
Link: https://discourse.llvm.org/t/rfc-clang-assembly-object-equivalence-for-files-with-inline-assembly/78841
Pull Request: https://github.com/llvm/llvm-project/pull/91082
This reverts commit 03c53c69a367008da689f0d2940e2197eb4a955c.
This causes very large compile-time regressions in some cases,
e.g. sqlite3 at O0 regresses by 5%.
Commit 6c0665e22174d474050e85ca367424f6e02476be
(https://reviews.llvm.org/D45164) enabled certain constant expression
evaluation for `MCObjectStreamer` at parse time (e.g. `.if` directives,
see llvm/test/MC/AsmParser/assembler-expressions.s).
`getUseAssemblerInfoForParsing` was added to make `clang -c` handling
inline assembly similar to `MCAsmStreamer` (e.g. `llvm-mc -filetype=asm`),
where such expression folding (related to
`AttemptToFoldSymbolOffsetDifference`) is unavailable.
I believe this is overly conservative. We can make some parse-time
expression folding work for `clang -c` even if `clang -S` would still
report an error, a MCAsmStreamer issue (we cannot print `.if`
directives) that should not restrict the functionality of
MCObjectStreamer.
```
% cat b.cc
asm(R"(
.pushsection .text,"ax"
.globl _start; _start: ret
.if . -_start == 1
ret
.endif
.popsection
)");
% gcc -S b.cc && gcc -c b.cc
% clang -S -fno-integrated-as b.cc # succeeded
% clang -c b.cc # succeeded with this patch
% clang -S b.cc # still failed
<inline asm>:4:5: error: expected absolute expression
4 | .if . -_start == 1
| ^
1 error generated.
```
Close#62520
Link: https://discourse.llvm.org/t/rfc-clang-assembly-object-equivalence-for-files-with-inline-assembly/78841
Pull Request: https://github.com/llvm/llvm-project/pull/91082
C_FILE symbols. To match the behavior of the assembler and the legacy
compiler, this includes using the generic ".file" name for the C_FILE
symbol and generating the actual file name in an auxiliary entry.
Fix the bug where merge-fdata unconditionally outputs boltedcollection
line, regardless of whether input files have it set.
Test Plan:
Added bolt/test/X86/merge-fdata-nobat-mode.test which fails without this
fix.
When `sym` in `.reloc ., BFD_RELOC_NONE, sym` is not referenced
elsewhere, `sym` is not in the symbol table and the relocation
references the null symbol. Visit the expression to fix the issue.
`MCExpr::evaluateAsAbsolute` has a longstanding bug. When the MCAssembler is
non-null and the MCAsmLayout is null, it may incorrectly fold A-B even if A and
B are separated by a linker-relaxable instruction. This behavior can suppress
some ADD/SUB relocations and lead to wrong results if the linker performs
relaxation.
To fix the bug, ensure that linker-relaxable instructions only appear at the end
of an MCDataFragment, thereby making them terminate the fragment. When computing
A-B, suppress folding if A and B are separated by a linker-relaxable
instruction.
* `.subsection` now correctly give errors for non-foldable expressions.
* gen-dwarf.s will pass even if we add back the .debug_line or .eh_frame/.debug_frame code from D150004
* This will fix suppressed relocation when we add R_RISCV_SET_ULEB128/R_RISCV_SUB_ULEB128.
In the future, we should investigate the desired behavior for
`MCExpr::evaluateAsAbsolute` when both MCAssembler and MCAsmLayout are non-null.
(Note: MCRelaxableFragment is only for assembler-relaxation. If we ever need
linker-relaxable MCRelaxableFragment, we would need to adjust RISCVMCExpr.cpp
(D58943/D73211).)
Depends on D153096
Differential Revision: https://reviews.llvm.org/D153097
to help debug and report better diagnostics for functions like
relaxDwarfCallFrameFragment (D153167).
In MCStreamer, some emitCFI* functions already take a SMLoc argument. Add a
SMLoc argument to the remaining functions that generate a MCCFIInstruction.
For the out-of-range error, MCConstantExpr doesn't have a location, so we
can only show "<unknown>:0:".
Also, allow subsection numbers up to 2147483647, which is the maximum value GNU
assembler supports. (GNU assembler also supports negative numbers.)
When assembling `.debug_line` for both explicitly specified and synthesized
`.loc` directives. the integrated assembler may incorrectly omit relocations for
-mrelax.
For an assembly file, we have a `MCAssembler` object and `evaluateAsAbsolute`
will incorrectly fold `AddrDelta` to a constant (which is not true in the
presence of linker relaxation).
`MCDwarfLineAddr::Emit` will emit a special opcode, which does not take into
account of linker relaxation. This is a sufficiently complex function that
I think should be called in any "fast paths" for linker relaxation aware assembling.
The following script demonstrates the bugs.
```
cat > x.c <<eof
void f();
void _start() {
f();
f();
f();
}
eof
# C to object file: correct DW_LNS_fixed_advance_pc
clang --target=riscv64 -g -c x.c
llvm-dwarfdump --debug-line -v x.o | grep \ DW_LNS_fixed_advance_pc -q
# Assembly to object file with synthesized line number information: incorrect special opcodes
clang --target=riscv64 -S x.c && clang --target=riscv64 -g -c x.s
llvm-dwarfdump --debug-line -v x.o | grep \ DW_LNS_fixed_advance_pc -q; test $? -eq 1
# Assembly with .loc to object file: incorrect special opcodes
clang --target=riscv64 -S -g x.c && clang --target=riscv64 -c x.s
llvm-dwarfdump --debug-line -v x.o | grep \ DW_LNS_fixed_advance_pc -q; test $? -eq 1
```
The `MCDwarfLineAddr::Emit` code path is an old optimization in commit
57ab708bdd3231b23a8ef4978b11ff07616034a2 (2010) that seems no longer relevant.
It don't trigger for direct machine code emission (label differences are not
foldable without a `MCAssembler`). MCDwarfLineAddr::Emit does complex operations
that are repeated in MCAssembler::relaxDwarfLineAddr, which an intricate RISCV
override.
Let's remove the "fast path". Assembling the assembly output of
X86ISelLowering.cpp with `-g` may be 2% slower, but I think the cost is fine.
There are opportunities to make the "slow path" faster, e.g.
* Optimizing the current new MC*Fragment pattern that allocates new fragments on
the heap.
* Reducing the number of relaxation times for .debug_line and .debug_frame, as
well as possibly other sections using LEB128. For instance, LEB128 can have a
one-byte estimate to avoid the second relaxation iteration.
For assembly input with -mno-relax, in theory we can prefer special opcodes to
DW_LNS_fixed_advance_pc to decrease the size of .debug_line, but such a change
may be overkill and unnecessarily diverge from -mrelax behaviors and GCC.
---
For .debug_frame/.eh_frame, MCDwarf currently emits DW_CFA_advance_loc without
relocations. Remove the special case to enable relocations. Similar to
.debug_line, even without the bug fix, the MCDwarfFrameEmitter::encodeAdvanceLoc
special case is a sufficiently complex code path that should be avoided.
---
When there are more than one section, we generate .debug_rnglists for
DWARF v5. We currently emit DW_RLE_start_length using ULEB128, which
is incorrect. The new test gen-dwarf.s adds a TODO.
---
About other `evaluateAsAbsolute` uses. `MCObjectStreamer::emit[SU]LEB128Value`
have similar code to MCDwarfLineAddr. They are fine to keep as we don't have
LEB128 relocations to correctly represent link-time non-constants anyway.
---
In the future, we should investigate ending a MCFragment for a relaxable
instruction, to further clean up the assembler support for linker relaxation
and fix `evaluateAsAbsolute`.
See bbea64250f65480d787e1c5ff45c4de3ec2dcda8 for some of the related code.
Reviewed By: enh, barannikov88
Differential Revision: https://reviews.llvm.org/D150004