Add the `dead_on_unwind` attribute, which states that the caller will
not read from this argument if the call unwinds. This allows eliding
stores that could otherwise be visible on the unwind path, for example:
```
declare void @may_unwind()
define void @src(ptr noalias dead_on_unwind %out) {
store i32 0, ptr %out
call void @may_unwind()
store i32 1, ptr %out
ret void
}
define void @tgt(ptr noalias dead_on_unwind %out) {
call void @may_unwind()
store i32 1, ptr %out
ret void
}
```
The optimization is not valid without `dead_on_unwind`, because the `i32
0` value might be read if `@may_unwind` unwinds.
This attribute is primarily intended to be used on sret arguments. In
fact, I previously wanted to change the semantics of sret to include
this "no read after unwind" property (see D116998), but based on the
feedback there it is better to keep these attributes orthogonal (sret is
an ABI attribute, dead_on_unwind is an optimization attribute). This is
a reboot of that change with a separate attribute.
As part of the RemoveDIs project, transitioning to non-instruction debug
info, all debug intrinsic handling code needs to be duplicated to handle
DPValues.
--try-experimental-debuginfo-iterators enables the new debug mode in
tests if the CMake option has been enabled.
`getInsertPtAfterFramePtr` now returns an iterator so we don't lose
debug-info-communicating bits.
---
Depends on #73500, #74090, #74091.
Note that all the patches that implement support for declare-style
DPValues have tests that are "rotten green" test without this
patch (i.e., they pass at the moment without testing what we
want them to test). See the Pull Request for more detail on this.
These flags are usable on floating point arithmetic, as well as call,
select, and phi instructions whose resulting type is floating point, or
a vector of, or an array of, a valid type. Whether or not the flags are
valid for a given instruction can be checked with the new
LLVMCanValueUseFastMathFlags function.
These are exposed using a new LLVMFastMathFlags type, which is an alias
for unsigned. An anonymous enum defines the bit values for it.
Tests are added in echo.ll for select/phil/call, and the floating point
types in the new float_ops.ll bindings test.
Select and the floating point arithmetic instructions were not
implemented in llvm-c-test/echo.cpp, so they were added as well.
This simplifies an upcoming patch to support the RemoveDIs project (tracking
variable locations without using intrinsics).
Next in this series is #73500.
This patch replaces uses of StringRef::{starts,ends}with with
StringRef::{starts,ends}_with for consistency with
std::{string,string_view}::{starts,ends}_with in C++20.
I'm planning to deprecate and eventually remove
StringRef::{starts,ends}with.
Added the following functions for manipulating operand bundles, as well as
building ``call`` and ``invoke`` instructions that use operand bundles:
* LLVMBuildCallWithOperandBundles
* LLVMBuildInvokeWithOperandBundles
* LLVMCreateOperandBundle
* LLVMDisposeOperandBundle
* LLVMGetNumOperandBundles
* LLVMGetOperandBundleAtIndex
* LLVMGetNumOperandBundleArgs
* LLVMGetOperandBundleArgAtIndex
* LLVMGetOperandBundleTag
Fixes#71873.
With opaque pointers, the address spaces will only be the same if
the types are the same, in which case this would have been handled
at the start of the method already.
On platforms where char is signed, the ">> 4" shift will produce
incorrect results. We were already working on unsigned char for
most characters, but not for the first one.
Fixes https://github.com/llvm/llvm-project/issues/74732.
This adds support for a HasTailCall flag on function call edges in the
ThinLTO summary. It is intended for use in aiding discovery of missing
frames from tail calls in profiled call stacks for MemProf of profiled
binaries that did not disable tail call elimination. A follow on change
will add the use of this new flag during MemProf context disambiguation.
The new flag is encoded in the bitcode along with either the hotness
flag from the profile, or the relative block frequency under the
-write-relbf-to-summary flag when there is no profile data.
Because we now will always have some additional call edge information, I
have removed the non-profile function summary record format, and we
simply encode the tail call flag along with a hotness type of none when
there is no profile information or relative block frequency. The change
of record format and name caused most of the test case changes.
I have added explicit testing of generation of the new tail call flag
into the bitcode and IR assembly format as part of the changes to
llvm/test/Bitcode/thinlto-function-summary-refgraph.ll. I have also
added round trip testing through assembly and bitcode to
llvm/test/Assembler/thinlto-summary.ll.
Some final errors have turned up when doing stage2clang builds:
* We can insert before end(), which won't have an attached DPMarker,
thus we can have a nullptr DPMarker in Instruction::insertBefore. Add a
null check there.
* We need to use the iterator-returning form of getFirstNonPHI to ensure
we don't insert any PHIs behind debug-info at the start of a block.
In the debug-info-splice implementation, we need to be careful to delete
trailing DPMarkers from blocks when we splice their contents out. This
is equivalent to removing the terminator from a block, then splicing the
rest of it's contents to another block: any DPValues trailing at the end
of the block get moved and we need to clean up afterwards.
We occasionally move instructions to their own positions, which is not
an error, and has no effect on the program. However, if dbg.value
intrinsics are present then they can effectively be moved without any
other effect on the program.
This is a problem if we're using non-instruction debug-info: a moveAfter
that appears to be a no-op in RemoveDIs mode can jump in front of
intrinsics in dbg.value mode. Thus: if an instruction is moved to itself
and the head bit is set, force attached debug-info to shift down one
instruction, which replicates the dbg.value behaviour.
The order of dbg.value intrinsics appearing in the output can affect the
order of tables in DWARF sections. This means that DPValues, our
dbg.value replacement, needs to obey the same ordering rules. For
dbg.values returned by findDbgUsers it's reverse order of creation (due
to how they're put on use-lists). Produce that order from findDbgUsers
for DPValues.
I've got a few IR files where the order of dbg.values flips, but it's a
fragile test -- ultimately it needs the number of times a DPValue is
handled by findDbgValues to be odd.
There is support for intrinsics in Instruction::isCommunative, but there
is no equivalent implementation for isAssociative. This patch builds
support for associative intrinsics with TRE as an application. TRE can
now have associative intrinsics as an accumulator. For example:
```
struct Node {
Node *next;
unsigned val;
}
unsigned maxval(struct Node *n) {
if (!n) return 0;
return std::max(n->val, maxval(n->next));
}
```
Can be transformed into:
```
unsigned maxval(struct Node *n) {
struct Node *head = n;
unsigned max = 0; // Identity of unsigned std::max
while (true) {
if (!head) return max;
max = std::max(max, head->val);
head = head->next;
}
return max;
}
```
This example results in about 5x speedup in local runs.
We conservatively only consider min/max and as associative for this
patch to limit testing scope. There are probably other intrinsics that
could be considered associative. There are a few consumers of
isAssociative() that could be impacted. Testing has only required to
Reassociate pass be updated.
We can use Intrinsic::getDeclaration() here, we just have to pass
the correct arguments. This function accepts only the mangled types,
not all argument types.
A large amount of complexity when it comes to shuffling DPValue objects
around is pushed into BasicBlock::spliceDebugInfo, and it gets
comprehensive testing there via the unit tests. It turns out that there's a
corner case though: splicing instructions and debug-info to the end()
iterator requires blocks of DPValues to be concatenated, but the DPValues
don't behave normally as they're dangling at the end of a block. While this
splicing-to-an-empty-block case is rare, and it's even rarer for it to
contain debug-info, it does happen occasionally.
Fix this by wrapping spliceDebugInfo with an outer layer that removes any
dangling DPValues in the destination block -- that way the main splicing
function (renamed to spliceDebugInfoImpl) doesn't need to worry about that
scenario. See the diagram in the added function for more info.
DenseMapAPIntKeyInfo looks like a redundant definition because it
mirrors the default used by DenseMap when not specified.
Replacing DenseMapAPFloatKeyInfo with a specialisation of
DenseMapInfo allows DenseMap<T> to be more easily used when T is
an aggregate type containing an APFloat.
With RTTI, a C++ class type info will get two entries in the summary
index: a gv and a typeidCompatibleVTable, both sharing the same GUID.
Ensure we use different namespaces to generate the entry slot numbers
for these two different summary entries.
CodeGenPrepare needs to support the maintenence of DPValues, the
non-instruction replacement for dbg.value intrinsics. This means there are
a few functions we need to duplicate or replicate the functionality of:
* fixupDbgValue for setting users of sunk addr GEPs,
* The remains of placeDbgValues needs a DPValue implementation for sinking
* Rollback of RAUWs needs to update DPValues
* Rollback of instruction removal needs supporting (see github #73350)
* A few places where we have to use iterators rather than instructions.
There are three places where we have to use the setHeadBit call on
iterators to indicate which portion of debug-info records we're about to
splice around. This is because CodeGenPrepare, unlike other optimisation
passes, is very much concerned with which block an operation occurs in and
where in the block instructions are because it's preparing things to be in
a format that's good for SelectionDAG.
There isn't a large amount of test coverage for debuginfo behaviours in
this pass, hence I've added some more.
This change has no meaningful effect on the compiler, although it has a
functional effect of dbg.value intrinsics being printed differently. The
tail-call flag is meaningless for debug-intrinsics and doesn't serve a
purpose, it's just extra baggage that dbg.values are built on top of.
Some facilities create debug-intrinsics with the flag, others don't.
However, the RemoveDIs project to represent debug-info without
intrinsics doesn't have a corresponding flag, which can cause spurious
test differences.
Specifically: we can convert a dbg.value to a DPValue, run an
optimisation pass, then convert the DPValue back to dbg.value form.
Right now, we always set the "tail" flag when converting it back. This
causes the auto-update-tests script to fail sometimes because in one
mode (dbg.value) intrinsics might not have a tail flag, but in the other
they do have a tail flag. Consistently picking one or the other in the
conversion routine doesn't help, because the rest of LLVM is
inconsistent about it anyway.
Thus: whenever we make a dbg.value intrinsic, create it as a tail call,
so that we get consistent output behaviours no matter which debug-info
mode we're in, DPValue or dbg.value. No tests fail as a result of this
patch because the extra 'tail' generated in numerous tests is
automatically ignored by FileCheck as being leading-rubbish before the
CHECK match.
Because we're storing some extra debug-info information in the iterator
class, we need to insert new LICM-created stores using such iterators.
Switch LICM to storing iterators instead of pointers when it promotes
variables in loops, add a test for the desired behaviour, and enable
RemoveDIs instrumentation on a variety of other LICM tests for good
measure.
(This would appear to be the only pass in LLVM that needs to store
iterators on the heap).
Here's a problem for the RemoveDIs project to make debug-info not be
stored in instructions -- in the following sequence:
dbg.value(foo
%bar = add i32 ...
dbg.value(baz
It's possible for rare passes (only CodeGenPrepare) to remove the add
instruction, and then re-insert it back in the same place. When
debug-info is stored in instructions and there's a total order on "when"
things happen this is easy, but by moving that information out of the
instruction stream we start having to do manual maintenance.
This patch adds some utilities for re-inserting an instruction into a
sequence of DPValue objects. Someday we hope to design this away, but
for now it's necessary to support all the things you can do with
dbg.values. The two unit tests show how DPValues get shuffled around
using the relevant function calls. A follow-up patch adds
instrumentation to CodeGenPrepare.
Part of the "RemoveDIs" project to remove debug intrinsics requires
passing block-positions around in iterators rather than as instruction
pointers, allowing some debug-info to reside in BasicBlock::iterator.
This means getInsertionPointAfterDef has to return an iterator, and as
it can return no-instruction that means returning an optional iterator.
This patch changes the signature for getInsertionPtAfterDef and then
patches up the various places that use it to handle the different type.
This would overall be an NFC patch, however in
InstCombinerImpl::freezeOtherUses I've started skipping any debug
intrinsics at the returned insert-position. This should not have any
_meaningful_ effect on the compiler output: at worst it means variable
assignments that are skipped will now cover the freeze instruction and
anything inserted before it, which should be inconsequential.
Sadly: this makes the function signature ugly. This is probably the
ugliest piece of fallout for the "RemoveDIs" work, but it serves the
overall purpose of improving compile times and not allowing `-g` to
affect compiler output, so should be worthwhile in the end.
With these three intrinsics it's probable faster to check the number of
arguments first and then check the names. We can also handle ctlz and
cttz in the same block.
Loop-rotate manually maintains dbg.value intrinsics -- it also needs to
manually maintain the replacement for dbg.value intrinsics, DPValue
objects. For the most part this patch adds parallel implementations
using the new type Some extra juggling is needed when loop-rotate hoists
loop-invariant instructions out of the loop: the DPValues attached to
such an instruction need to get rotated but not hoisted. Exercised by
the new test function invariant_hoist in dbgvalue.ll.
There's also a "don't insert duplicate debug intrinsics" facility in
LoopRotate. The value and correctness of this isn't clear, but to
continue preserving behaviour that's now tested in the "tak_dup"
function in dbgvalue.ll.
Other things in this patch include a helper DebugVariable constructor
for DPValues, a insertDebugValuesForPHIs handler for RemoveDIs
(exercised by the new tests), and beefing up the dbg.value checking in
dbgvalue.ll to ensure that each record is tested (and that there's an
implicit check-not).
With intrinsics representing debug-info, we just clone all the
intrinsics when inlining a function and don't think about it any
further. With non-instruction debug-info however we need to be a bit
more careful and manually move the debug-info from one place to another.
For the most part, this means keeping a "cursor" during block cloning of
where we last copied debug-info from, and performing debug-info copying
whenever we successfully clone another instruction.
There are several utilities in LLVM for doing this, all of which now
need to manually call cloneDebugInfo. The testing story for this is not
well covered as we could rely on normal instruction-cloning mechanisms
to do all the hard stuff. Thus, I've added a few tests to explicitly
test dbg.value behaviours, ahead of them becoming not-instructions.
This flag indicates that every bit is known to be zero in at least one
of the inputs. This allows the Or to be treated as an Add since there is
no possibility of a carry from any bit.
If the flag is present and this property does not hold, the result is
poison.
This makes it easier to reverse the InstCombine transform that turns Add
into Or.
This is inspired by a comment here
https://github.com/llvm/llvm-project/pull/71955#discussion_r1391614578
Discourse thread
https://discourse.llvm.org/t/rfc-add-or-disjoint-flag/75036
There's good justification for having a function specifying "I need there to be
a marker here, so return the marker there or create a new one". This was going
to come later in the series, but it's starting to become necessary much eariler
alas.
Make use of it in spliceDebugInfo, where we can occasionally splice DPValues
onto the end() iterator of a block while it's been edited.
Cache the result of the TLI libfunc lookup in the Function object. This
only caches the actual lookup of the LibFunc in the TLI map, but not the
prototype validation, as that might differ between different TLI
instances.
This uses the existing mechanism for invalidating the intrinsic ID when
the function name changes. The libfunc will be invalidated in that case
as well.
I don't believe this increases the size of Function on 64bit (which
currently has a trailing `bool` member), and I don't think we would
particularly care if it did, as Functions are uncommon as Values go.
It seems TypeSize is currently broken in the sense that:
TypeSize::Fixed(4) + TypeSize::Scalable(4) => TypeSize::Fixed(8)
without failing its assert that explicitly tests for this case:
assert(LHS.Scalable == RHS.Scalable && ...);
The reason this fails is that `Scalable` is a static method of class
TypeSize,
and LHS and RHS are both objects of class TypeSize. So this is
evaluating
if the pointer to the function Scalable == the pointer to the function
Scalable,
which is always true because LHS and RHS have the same class.
This patch fixes the issue by renaming `TypeSize::Scalable` ->
`TypeSize::getScalable`, as well as `TypeSize::Fixed` to
`TypeSize::getFixed`,
so that it no longer clashes with the variable in
FixedOrScalableQuantity.
The new methods now also better match the coding standard, which
specifies that:
* Variable names should be nouns (as they represent state)
* Function names should be verb phrases (as they represent actions)