In 39b2979a4, Pavel has kindly refined the implementation of a test so that it doesn't trip up over this patch -- the test wishes to exercise LLDB's presentation of line-zero locations, rather than to always step on line zero on entry to artificial_location.c. As that's what was tripping up this change, reapply.
Original commit message follows.
[DWARF] Emit a worst-case prologue_end flag for pathological inputs (#107849)
prologue_end usually indicates where the end of function initialization lies, and is where debuggers typically choose to put the initial breakpoint for a function. Our current algorithm piggy-backs it on the first available source location, which doesn't necessarily have anything to do with the start of the function.
To avoid this in heavily-optimised code that lacks many useful source locations, pick a worst-case "if all else fails" prologue_end location: the first instruction that appears to do meaningful computation. It'll be given the function-scope line number, which should run on from the start of the function anyway. This means that if your code is completely inverted by the optimiser, you can at least put a breakpoint at the _start_ like you expect, even if it's difficult to then step through.
This patch also attempts to preserve some good behaviour we have without optimisations -- at O0, if the prologue immediately falls into a loop body without any computation happening, then prologue_end lands at the start of that loop. This is desirable, but it does mean we need to do more work to detect and support those situations.
This patch introduces an experimental intrinsic for matching the
elements of one vector against the elements of another.
For AArch64 targets that support SVE2, the intrinsic lowers to a MATCH
instruction for supported fixed and scalable vector types.
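As a sketch, assuming the intrinsic is named `@llvm.experimental.vector.match` and takes the two vectors plus a governing predicate (the exact name and signature are assumptions here, not confirmed by this message):
```ll
declare <vscale x 16 x i1> @llvm.experimental.vector.match.nxv16i8.v16i8(<vscale x 16 x i8>, <16 x i8>, <vscale x 16 x i1>)

define <vscale x 16 x i1> @match_bytes(<vscale x 16 x i8> %op1, <16 x i8> %op2, <vscale x 16 x i1> %mask) {
  ; Each active lane of %op1 is tested for equality with any element of %op2;
  ; on SVE2-capable AArch64 targets this can lower to a MATCH instruction.
  %r = call <vscale x 16 x i1> @llvm.experimental.vector.match.nxv16i8.v16i8(<vscale x 16 x i8> %op1, <16 x i8> %op2, <vscale x 16 x i1> %mask)
  ret <vscale x 16 x i1> %r
}
```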
This is a follow-up PR to refactor the initial global function merging
pass implemented in #112671.
It first collects the stable functions relevant to the current module and
iterates over only those, instead of iterating through all stable
functions in the stable function map.
This is a patch for
https://discourse.llvm.org/t/rfc-global-function-merging/82608.
**Summary**
This patch introduces a new compiler option `-mllvm
-emit-func-debug-line-table-offsets` that enables the emission of
per-function line table offsets and end sequences in DWARF debug
information. This enhancement allows tools and debuggers to accurately
attribute line number information to their corresponding functions, even
in scenarios where functions are merged or share the same address space
due to optimizations like Identical Code Folding (ICF) in the linker.
**Background**
RFC: [New DWARF Attribute for Symbolication of Merged
Functions](https://discourse.llvm.org/t/rfc-new-dwarf-attribute-for-symbolication-of-merged-functions/79434)
Previous similar PR:
[#93137](https://github.com/llvm/llvm-project/pull/93137) – that PR was
very similar to this one, but at the time the assembler had no support
for emitting labels within the line table. That support was added in PR
[#99710](https://github.com/llvm/llvm-project/pull/99710), and this PR
uses some of the support added there.
In the current implementation, Clang generates line information in the
`debug_line` section without directly associating line entries with
their originating `DW_TAG_subprogram` DIEs. This can lead to issues when
post-compilation optimizations merge functions, resulting in overlapping
address ranges and ambiguous line information.
For example, when functions are merged by ICF in LLD, multiple functions
may end up sharing the same address range. Without explicit linkage
between functions and their line entries, tools cannot accurately
attribute line information to the correct function, adversely affecting
debugging and call stack resolution.
**Implementation Details**
To address the above issue, the patch makes the following key changes:
**`DW_AT_LLVM_stmt_sequence` Attribute**: Introduces a new LLVM-specific
attribute `DW_AT_LLVM_stmt_sequence` to each `DW_TAG_subprogram` DIE.
This attribute holds a label pointing to the offset in the line table
where the function's line entries begin.
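For illustration, a subprogram DIE might then look roughly like this (the offset value and its form are illustrative, not the exact encoding):
```
DW_TAG_subprogram
  DW_AT_name ("foo")
  // label resolving to the start of foo's entries in the debug_line section
  DW_AT_LLVM_stmt_sequence (0x00000028)
  // other attributes here
```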
**End-of-Sequence Markers**: Emits an explicit `DW_LNE_end_sequence` after
each function's line entries in the line table. This marks the end of
the line information for that function, ensuring that line entries are
correctly delimited.
**Assembler and Streamer Modifications**: Modifies the `MCStreamer` and
related classes to support emitting the necessary labels and tracking
the current function's line entries. A new flag
`GenerateFuncLineTableOffsets` is added to control this behavior.
**Compiler Option**: Introduces the `-mllvm
-emit-func-debug-line-table-offsets` option to enable this
functionality, allowing users to opt in as needed.
This implements a global function merging pass. Unlike traditional
function merging passes that use IR comparators, this pass employs a
structurally stable hash to identify similar functions while ignoring
certain constant operands. These ignored constants are tracked and
encoded into a stable function summary. When merging, instead of
explicitly folding similar functions and their call sites, we form a
merging instance by supplying different parameters via thunks. The
actual size reduction occurs when identically created merging instances
are folded by the linker.
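As a rough sketch of the idea (function names and constants hypothetical), two functions that differ only in one constant operand become a single merging instance plus thunks:
```ll
; Two stable-hash-identical functions differing only in a constant operand.
define i32 @f(i32 %x) {
  %r = add i32 %x, 100
  ret i32 %r
}
define i32 @g(i32 %x) {
  %r = add i32 %x, 200
  ret i32 %r
}

; Conceptual merging instance: the differing constant becomes a parameter.
; @f and @g are rewritten as thunks passing 100 and 200; identically created
; instances across modules can then be folded by the linker.
define internal i32 @f.g.merged(i32 %x, i32 %c) {
  %r = add i32 %x, %c
  ret i32 %r
}
```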
Currently, this pass is wired in as a pre-codegen pass, enabled by the
`-enable-global-merge-func` flag.
In a local merging mode, the analysis and merging steps occur
sequentially within a module:
- `analyze`: Collects stable function hashes and tracks locations of
ignored constant operands.
- `finalize`: Identifies merge candidates with matching hashes and
computes the set of parameters that point to different constants.
- `merge`: Uses the stable function map to optimistically create a
merged function.
We can enable a global merging mode similar to the global function
outliner
(https://discourse.llvm.org/t/rfc-enhanced-machine-outliner-part-2-thinlto-nolto/78753/),
which will perform the above steps separately.
- `-codegen-data-generate`: During the first round of code generation,
we analyze local merging instances and publish their summaries.
- Offline using `llvm-cgdata`, or at link time, we can finalize all these
merging summaries, which are combined to determine the parameters.
- `-codegen-data-use`: During the second round of code generation, we
optimistically create merging instances within each module, and finally,
the linker folds identically created merging instances.
Depends on #112664
This is a patch for
https://discourse.llvm.org/t/rfc-global-function-merging/82608.
Add a specification attribute to LLVM DebugInfo, which is analogous
to DWARF's DW_AT_specification. According to the DWARF spec:
"A debugging information entry that represents a declaration that
completes another (earlier) non-defining declaration may have a
DW_AT_specification attribute whose value is a reference to the
debugging information entry representing the non-defining declaration."
This patch allows types to be specifications of other types. This is
used by Swift to represent generic types. For example, given this Swift
program:
```
struct MyStruct<T> {
  let t: T
}

let variable = MyStruct<Int>(t: 43)
```
The Swift compiler emits (roughly) an unsubstituted type for MyStruct<T>:
```
DW_TAG_structure_type
  DW_AT_name ("MyStruct")
  // "$s1w8MyStructVyxGD" is a Swift mangled name roughly equivalent to
  // MyStruct<T>
  DW_AT_linkage_name ("$s1w8MyStructVyxGD")
  // other attributes here
```
And a specification for MyStruct<Int>:
```
DW_TAG_structure_type
  DW_AT_specification (<link to "MyStruct">)
  // "$s1w8MyStructVySiGD" is a Swift mangled name equivalent to
  // MyStruct<Int>
  DW_AT_linkage_name ("$s1w8MyStructVySiGD")
  DW_AT_byte_size (0x08)
  // other attributes here
```
Check that every path from the entry block to the use block passes
through at least one def block. Previously we only checked that at least
one path passed through a def block.
Combine a G_MERGE_VALUES of x and undef into an anyext of x.
; CHECK-NEXT: [[MV1:%[0-9]+]]:_(s64) = G_MERGE_VALUES [[TRUNC]](s32), [[DEF]](s32)
Please continue padding merge values.
// %bits_8_15:_(s8) = G_IMPLICIT_DEF
// %0:_(s16) = G_MERGE_VALUES %bits_0_7:(s8), %bits_8_15:(s8)
%bits_8_15 is defined by undef. Its value is undefined and we can pick
an arbitrary value. For optimization, we pick anyext, which plays well
with the undefinedness.
// %0:_(s16) = G_ANYEXT %bits_0_7:(s8)
The upper bits of %0 are undefined and the lower bits come from
%bits_0_7.
When the chain is not the entry node, there is a risk that the stores are
within a (CALLSEQ_START, CALLSEQ_END) pair, which, when the node is
expanded, will lead to nested call sequences.
It should be possible to check for this and allow more cases, but for
now, let's limit this to cases where it's definitely safe.
Fixes #115323
Do not consider profile data when choosing a successor block to sink
into for optsize functions. This should result in more consistent
instruction sequences, which will improve outlining and ICF. We've
observed a slight codesize improvement in a large binary. This is
similar reasoning to https://github.com/llvm/llvm-project/pull/114607.
Using profile data to select a block to sink into was originally added in
d04f7596e7.
* Do not use profile data when flipping a branch condition when
optimizing for size. This should improve outlining and ICF due to more
uniform instruction sequences.
* Refactor `optimizeBranches()` to use early `continue`s
* Use the correct debug location for `insertBranch()`
This reverts commit bf483ddb42065405e345393e022dc72357ec5a3a.
See the PR: there's a test checking for this behaviour (possibly adaptable), and
a duplicate line entry too.
Add `ICmpInst::compare()` overload accepting `KnownBits`, similar to the
existing one accepting `APInt`. This is not directly part of KnownBits
(or APInt) for layering reasons.
This patch follows on from the changes made in #105524, by adding an
additional heuristic that prevents us from applying the start-of-MBB
is_stmt flag when we can see that, for all direct branches to the MBB,
the last line stepped on before the branch is the same as the first line
of the MBB. This is mainly to prevent certain pathological cases, such
as macros that expand to multiple basic blocks that all have the same
source location, from giving us repeated steps on the same line. This
approach is not comprehensive, since it relies on analyzeBranch to read
edges, but the default fallback of applying is_stmt may lead only to
useless steps in some cases, rather than skipping useful steps
altogether.
prologue_end usually indicates where the end of function initialization lies, and is where debuggers typically choose to put the initial breakpoint for a function. Our current algorithm piggy-backs it on the first available source location, which doesn't necessarily have anything to do with the start of the function.
To avoid this in heavily-optimised code that lacks many useful source locations, pick a worst-case "if all else fails" prologue_end location: the first instruction that appears to do meaningful computation. It'll be given the function-scope line number, which should run on from the start of the function anyway. This means that if your code is completely inverted by the optimiser, you can at least put a breakpoint at the _start_ like you expect, even if it's difficult to then step through.
This patch also attempts to preserve some good behaviour we have without optimisations -- at O0, if the prologue immediately falls into a loop body without any computation happening, then prologue_end lands at the start of that loop. This is desirable, but it does mean we need to do more work to detect and support those situations.
PostRA scheduling now supports different directions, but the direction
can only be specified via command-line options.
This patch adds a new hook, `overridePostRASchedPolicy`, for targets
to override the PostRA scheduling policy.
Note that some options, like register pressure tracking, won't take
effect in PostRA scheduling.
The MIRPrinter emits ` :: ` at the start of an MMO. The MIRLexer eats all
the whitespace after the operand and before the `::` when there is no
comment. We need to eat the whitespace after the comment as well, so that
the MIRLexer can parse comments on an MMO.
These specify that the value of the given register in the previous frame
is the CFA plus some offset. This isn't very common but can be necessary
if the original value is normally reconstructed from the stack/frame
pointer instead of being saved on the stack and reloaded from there.
We already support computing known bits for extending loads, but not for
masked loads. For now I've only added support for zero-extends because
that's the only thing currently tested. Even when the passthru value is
poison we still know the top X bits are zero.
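A minimal sketch of the pattern in question, a masked load whose result is zero-extended:
```ll
declare <4 x i8> @llvm.masked.load.v4i8.p0(ptr, i32, <4 x i1>, <4 x i8>)

define <4 x i16> @zext_masked_load(ptr %p, <4 x i1> %mask) {
  ; Each lane is either a loaded byte or the zero passthru; after the zext
  ; the top 8 bits of every i16 lane are known zero either way.
  %l = call <4 x i8> @llvm.masked.load.v4i8.p0(ptr %p, i32 1, <4 x i1> %mask, <4 x i8> zeroinitializer)
  %e = zext <4 x i8> %l to <4 x i16>
  ret <4 x i16> %e
}
```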
Note that PointerUnion::{is,get,dyn_cast} have been soft deprecated in
PointerUnion.h:
// FIXME: Replace the uses of is(), get() and dyn_cast() with
// isa<T>, cast<T> and the llvm::dyn_cast<T>
This fixes the infinite loop discovered in #114195.
Since we skip debug instructions at the start of the loop, we do not need
to skip them again at the end of the loop.
The increase in fallbacks that was previously reported was not caused
by this change.
Original description:
This matches InstCombine and DAGCombine.
RISC-V only has an ADDI instruction, so without this we need additional
patterns to do the conversion.
Some of the AMDGPU tests look like possible regressions. Maybe some
patterns from isel aren't imported.
For some reason there was a hasOneUse check on the splat for the
second operand, and it's not obvious to me why. The check blocks
optimisations when lowering nodes like AVGFLOORU and AVGCEILU.
In a follow-on patch I also plan to improve the generated code
for AVGCEILU further by teaching computeKnownBits about
zero-extending masked loads.
Previously we set the dump direction according to command-line
options, but we may override the scheduling direction in `initPolicy`,
and this results in a mismatch between the dump and the actual policy.
Here we simply set the dump direction after initializing the policy.
The CopyHints set has been collecting duplicates of a register with
increasing weight, which were then deduplicated against the HintedRegs
set. Let's stop collecting duplicates in the first place.
This reverts commit c31014322c0b5ae596da129cbb844fb2198b4ef4.
Based on the discussions in #112772, this pass is not needed after the
introduction of the `llvm.threadlocal.address` intrinsic.
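For reference, a minimal sketch of TLS access going through that intrinsic (names hypothetical):
```ll
@counter = thread_local global i32 0

declare ptr @llvm.threadlocal.address.p0(ptr)

define i32 @bump() {
  ; The thread-local address is materialized explicitly via the intrinsic,
  ; giving passes a single point at which to reason about it.
  %p = call ptr @llvm.threadlocal.address.p0(ptr @counter)
  %v = load i32, ptr %p
  %v1 = add i32 %v, 1
  store i32 %v1, ptr %p
  ret i32 %v1
}
```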
Fixes https://github.com/llvm/llvm-project/issues/112771.
An extra inhabitant is a bit pattern that does not represent a valid
value for instances of a given type. The number of extra inhabitants is
the number of those bit configurations.
This is used by Swift to save space when composing types. For example,
because Bool only needs 2 bit patterns to represent all of its values
(true and false), an Optional<Bool> only occupies 1 byte in memory by
using a bit configuration that is unused by Bool. Which bit patterns are
unused is part of the ABI of the language.
Since Swift generics are not monomorphized, by using dynamic libraries
you can have generic types whose size, alignment, etc., are known only
at runtime (which is why this feature is needed).
This patch adds num_extra_inhabitants to LLVM-IR debug info and to DWARF
as an Apple extension.
This merges the logic for expanding both FFREXP and FSINCOS into one
method `DAG.expandMultipleResultFPLibCall()`. This reduces duplication
and also allows FFREXP to benefit from the stack slot elimination
implemented for FSINCOS. This method will also be used in future to
implement more multiple-result intrinsics (such as modf and sincospi).
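For reference, a sketch showing the two multiple-result intrinsics involved (the overload names follow LangRef conventions; treat them as illustrative):
```ll
declare { float, i32 } @llvm.frexp.f32.i32(float)
declare { float, float } @llvm.sincos.f32(float)

define float @sin_plus_cos(float %x) {
  ; One call yields both results; the shared expansion can emit a single
  ; libcall with out-parameter stack slots, reused rather than duplicated.
  %sc = call { float, float } @llvm.sincos.f32(float %x)
  %s = extractvalue { float, float } %sc, 0
  %c = extractvalue { float, float } %sc, 1
  %r = fadd float %s, %c
  ret float %r
}
```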
So far we have assumed that we only rethrow the exception caught in the
innermost EH pad. This is true in code we directly generate, but after
inlining this may not be the case. For example, consider this code:
```ll
ehcleanup:
  %0 = cleanuppad ...
  call @destructor
  cleanupret from %0 unwind label %catch.dispatch
```
If `destructor` gets inlined into this function, the code can look like:
```ll
ehcleanup:
  %0 = cleanuppad ...
  invoke @throwing_func
    to label %unreachable unwind label %catch.dispatch.i
catch.dispatch.i:
  catchswitch ... [ label %catch.start.i ]
catch.start.i:
  %1 = catchpad ...
  invoke @some_function
    to label %invoke.cont.i unwind label %terminate.i
invoke.cont.i:
  catchret from %1 to label %destructor.exit
destructor.exit:
  cleanupret from %0 unwind label %catch.dispatch
```
We lower a `cleanupret` into `rethrow`, which assumes it rethrows the
exception caught by the nearest dominating EH pad. But after the
inlining, the nearest dominating EH pad is not `ehcleanup` but
`catch.start.i`.
The problem exists in the same manner in the new (exnref) EH, because it
assumes the exception comes from the nearest EH pad and saves an exnref
from that EH pad and rethrows it (using `throw_ref`).
This problem can be fixed easily if `cleanupret` carries the basic block
where its matching `cleanuppad` is. The bitcode instruction `cleanupret`
kind of has that info (it has a token from the `cleanuppad`), but that
info is lost when we enter ISel, because `TargetSelectionDAG.td`'s
`cleanupret` node does not have any arguments:
5091a359d9/llvm/include/llvm/Target/TargetSelectionDAG.td (L700)
Note that `catchret` already has two basic block arguments, even though
neither of them means the `catchpad`'s BB.
This PR adds the `cleanuppad`'s BB as an argument to the `cleanupret` node
in ISel and uses it in the Wasm backend. Because this node is also used
in the X86 backend, we need to note the new argument there too, but
nothing more needs to change there as long as X86 doesn't need it.
---
- Details about changes in the Wasm backend:
After this PR, our pseudo `RETHROW` instruction takes a BB, which denotes
the EH pad whose exception it needs to rethrow. There are currently two
ways to generate a `RETHROW`: one is from the `llvm.wasm.rethrow` intrinsic
and the other is from the `CLEANUPRET` we discussed above. In the case of
`llvm.wasm.rethrow`, we add a '0' as a placeholder argument when it is
lowered to a `RETHROW`, and change it to a BB in LateEHPrepare. As
written in the comments, this PR doesn't change how this BB is computed.
The BB argument will be converted to an immediate argument, as with other
control flow instructions, in CFGStackify.
In the case of `CLEANUPRET`, it already has a BB argument pointing to an EH
pad, so it is simply converted to a `RETHROW` with the same BB argument in
LateEHPrepare. This will likewise be lowered to an immediate in CFGStackify,
along with other control flow instructions.
---
Fixes #114600.
... matching what we have in the disassembler. This isn't turned on by
default since several of the scheduling models are not completely
accurate, and we don't want to be misleading.