MachineIR alias analysis assumes that only bytes after the pointer will
be accessed. This is incorrect if the stride is negative.
This is causing miscompiles in our downstream after SLP started making
strided loads.
Fixes #82657
Patch 2 of 3 to add llvm.dbg.label support to the RemoveDIs project. The
patch stack adds the DPLabel class, which is the RemoveDIs llvm.dbg.label
equivalent.
1. Add DbgRecord base class for DPValue and the not-yet-added
DPLabel class.
2. Add the DPLabel class.
-> 3. Add support to passes.
The next patch, #82639, will enable conversion between dbg.labels and DPLabels.
AssignmentTrackingAnalysis support could have gone two ways:
1. Have the analysis store a DPLabel representation in its results -
SelectionDAGBuilder reads the analysis results and ignores all DbgRecord
kinds.
2. Ignore DPLabels in the analysis - SelectionDAGBuilder reads the analysis
results but still needs to iterate over DPLabels from the IR.
I went with option 2 because it's less work and is no less correct than option 1.
It's worth noting that this causes labels to sink to the bottom of packs of debug
records, e.g., [value, label, value] becomes [value, value, label]. This shouldn't
be a problem because labels and variable locations don't have an ordering
requirement. The ordering between variable locations is maintained and the label
movement is deterministic.
This is another step in the direction of fixing the `Fixed(0) !=
Scalable(0)` bugbear; although it is weird, I don't believe it's causing
us any real issues.
LLVM function calls carry convergence control tokens as operand bundles, where
the tokens themselves are produced by convergence control intrinsics. This patch
implements convergence control tokens in MIR as follows:
1. Introduce target-independent ISD opcodes and MIR opcodes for convergence
control intrinsics.
2. Model token values as untyped virtual registers in MIR.
The change also introduces an additional ISD opcode CONVERGENCECTRL_GLUE and a
corresponding machine opcode with the same spelling. This glues the convergence
control token to SDNodes that represent calls to intrinsics. The glued token is
later translated to an implicit argument in the MIR.
The lowering of calls to user-defined functions is target-specific. On AMDGPU,
the convergence control operand bundle at a non-intrinsic call is translated to
an explicit argument to the SI_CALL_ISEL instruction. Post-selection adjustment
converts this explicit argument to an implicit argument on the SI_CALL
instruction.
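For reference, a hedged IRBuilder sketch (illustrative only, not code from this patch) of how a token is produced and attached to a call as an operand bundle in IR, which is the input the ISel changes above start from:
```
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/InstrTypes.h"
#include "llvm/IR/Intrinsics.h"
using namespace llvm;

// Illustrative sketch only: produce a convergence control token and attach it
// to a call through the "convergencectrl" operand bundle. Placement rules for
// the intrinsic (e.g. entry intrinsics belong at the start of the entry block)
// are ignored here for brevity.
static CallInst *emitConvergentCall(IRBuilder<> &B, FunctionCallee Callee) {
  // Token produced by a convergence control intrinsic.
  Value *Token =
      B.CreateIntrinsic(Intrinsic::experimental_convergence_entry, {}, {});
  // The call carries the token as an operand bundle; during ISel it is glued
  // to the call node (CONVERGENCECTRL_GLUE) and eventually becomes an implicit
  // argument in MIR, as described above.
  return B.CreateCall(Callee, /*Args=*/{},
                      {OperandBundleDef("convergencectrl", Token)});
}
```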
Patch 1 of 3 to add llvm.dbg.label support to the RemoveDIs project. The
patch stack adds a new base class
-> 1. Add DbgRecord base class for DPValue and the not-yet-added
DPLabel class.
2. Add the DPLabel class.
3. Enable dbg.label conversion and add support to passes.
Patches 1 and 2 are NFC.
In the near future we will also rename DPValue to DbgVariableRecord and
DPLabel to DbgLabelRecord, at which point we'll overhaul the function
names too. The name DPLabel keeps things consistent for now.
Summary:
This patch adds a new intrinsic and builtin function mirroring the
existing `__builtin_readcyclecounter`. The difference is that this
implementation targets a separate counter that some targets have which
returns a fixed-frequency clock that can be used to determine elapsed
time. This is different from the cycle counter, which often has a
variable frequency.
This patch only adds support for the NVPTX and AMDGPU targets.
This is done as a new and separate builtin rather than an argument to
`readcyclecounter` to avoid needing to change existing code and to make
the separation more explicit.
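A usage sketch follows. The builtin's name isn't given above; `__builtin_readsteadycounter` is assumed here purely for illustration, mirroring how `__builtin_readcyclecounter` is used:
```
// Hypothetical usage; the builtin name __builtin_readsteadycounter is an
// assumption, and the builtin is only available on targets that implement it
// (per this patch, NVPTX and AMDGPU). Because the counter runs at a fixed
// frequency, the delta measures elapsed time rather than elapsed cycles.
unsigned long long timed(void (*work)(void)) {
  unsigned long long Start = __builtin_readsteadycounter();
  work();
  return __builtin_readsteadycounter() - Start; // fixed-frequency ticks
}
```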
This patch adds support for DPVAssigns across all of
AssignmentTrackingAnalysis except for AssignmentTrackingLowering, which
is implemented in a separate patch. This patch includes handling
DPValues in MemLocFragFill, the removal of redundant DPValues as part of
AssignmentTrackingAnalysis (which is different to the version in
`BasicBlockUtils.cpp`), and preventing the DPVAssigns from being
directly emitted in SelectionDAG (just as we don't emit llvm.dbg.assigns
directly, but receive a set of locations from
AssignmentTrackingAnalysis' output).
This patch abstracts visitEntryValueDbgValue to deal with the substance
of variable locations (Value, Var, Expr, DebugLoc) rather than how
they're stored. That allows us to call it from handleDebugValue, which
is similarly abstracted. This allows the entry-value behaviour (see the
test) to be supported with non-instruction debug-info too.
This follows on from #76708, allowing
`cast<ConstantSDNode>(N)->getZExtValue()` to be replaced with just
`N->getAsZExtVal()`.
Introduced via `git grep -l "cast<ConstantSDNode>\(.*\).*getZExtValue" |
xargs sed -E -i
's/cast<ConstantSDNode>\((.*)\)->getZExtValue/\1->getAsZExtVal/'` and
then using `git clang-format` on the result.
Vectors are always bit-packed and don't respect the elements' alignment
requirements. This is different from arrays. This means offsets of
vector GEPs need to be computed differently than offsets of array GEPs.
This PR fixes many places that relied on the incorrect pattern of
always using `DL.getTypeAllocSize(GTI.getIndexedType())`.
We replace these uses with `GTI.getSequentialElementStride(DL)`,
which is a new helper function added in this PR.
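A hedged sketch of the replacement at a typical call site (the exact call sites vary; GTI is assumed to be a `gep_type_iterator` positioned at a sequential index):
```
#include "llvm/IR/DataLayout.h"
#include "llvm/IR/GetElementPtrTypeIterator.h"
using namespace llvm;

// Sketch only: compute the per-element stride for a sequential (array or
// vector) GEP index.
TypeSize elementStride(gep_type_iterator GTI, const DataLayout &DL) {
  // Old pattern, incorrect for vectors: the alloc size includes alignment
  // padding, but vector elements are bit-packed.
  //   TypeSize Stride = DL.getTypeAllocSize(GTI.getIndexedType());

  // New helper introduced by this PR: handles arrays and vectors correctly.
  return GTI.getSequentialElementStride(DL);
}
```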
This changes behavior for GEPs into vectors with element types for which
the (bit) size and alloc size are different. This includes two cases:
* Types with a bit size that is not a multiple of a byte, e.g. i1.
GEPs into such vectors are questionable to begin with, as some elements
are not even addressable.
* Overaligned types, e.g. i16 with 32-bit alignment.
Existing tests are unaffected, but a miscompilation of a new test is fixed.
---------
Co-authored-by: Nikita Popov <github@npopov.com>
This copies the flag from IR to the SDNode in SelectionDAGBuilder, clears
the flag in SimplifyDemandedBits, and adds it to canCreateUndefOrPoison.
Uses of the flag will come in later patches.
This is a boring mechanical update to support DPValues that look like
dbg.declares in SelectionDAG.
The tests will become "live" once #74090 lands (see that PR for more info).
This option enables an experimental lowering for unordered atomics I worked
on a few years back. It never reached production quality, and hasn't been
worked on in years. So let's rip it out.
This wasn't a crazy idea, but I hit some stumbling block which prevented me
from pushing it across the finish line. From the look of 027aa27, that
change description is probably a good summary. I don't remember the
details any longer.
It seems TypeSize is currently broken in the sense that:
TypeSize::Fixed(4) + TypeSize::Scalable(4) => TypeSize::Fixed(8)
without failing its assert that explicitly tests for this case:
assert(LHS.Scalable == RHS.Scalable && ...);
The reason this fails is that `Scalable` is a static method of class TypeSize,
and LHS and RHS are both objects of class TypeSize. So this is evaluating
if the pointer to the function Scalable == the pointer to the function
Scalable, which is always true because LHS and RHS have the same class.
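A self-contained sketch of the failure mode (a simplified stand-in, not the real TypeSize class):
```
#include <cassert>
#include <cstdint>

// Simplified stand-in for TypeSize, illustrating the bug described above.
struct TypeSize {
  uint64_t Quantity;
  bool IsScalable;

  static TypeSize Fixed(uint64_t C) { return {C, false}; }
  static TypeSize Scalable(uint64_t C) { return {C, true}; }

  friend TypeSize operator+(TypeSize LHS, TypeSize RHS) {
    // Intended: compare the per-object scalability flags. Actual: `Scalable`
    // names the static factory function, so both sides decay to the same
    // function pointer and the assert can never fire.
    assert(LHS.Scalable == RHS.Scalable && "can't add fixed and scalable");
    return {LHS.Quantity + RHS.Quantity, LHS.IsScalable};
  }
};

int main() {
  // Silently yields Fixed(8) instead of asserting.
  TypeSize Sum = TypeSize::Fixed(4) + TypeSize::Scalable(4);
  (void)Sum;
}
```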
This patch fixes the issue by renaming `TypeSize::Scalable` ->
`TypeSize::getScalable`, as well as `TypeSize::Fixed` -> `TypeSize::getFixed`,
so that it no longer clashes with the variable in FixedOrScalableQuantity.
The new methods now also better match the coding standard, which specifies that:
* Variable names should be nouns (as they represent state)
* Function names should be verb phrases (as they represent actions)
DPValues are the non-intrinsic replacements for dbg.values, and when an
IR function is converted by SelectionDAG we need to convert the variable
location information in the same way. Happily, all the information is in
the same format; it's just stored in a slightly different object. This
patch therefore refactors a few things to store the set of
{Variable, Expr, DILocation, Location} instead of just a pointer to a
DbgValueInst.
This also adds a hook in llc, much like the one I've added to opt
in PR #71937, allowing tests to optionally ask to use RemoveDIs
mode if support for it is built into the compiler.
I've added that flag to a variety of SelectionDAG debug-info tests to
ensure that we get some coverage on the RemoveDIs / debug-info-iterator
buildbot.
visitJumpTable is called on FinishBasicBlock. At that time, getCurSDLoc
will always return SDLoc without DebugLoc since CurInst was set to
nullptr after visiting each instruction.
This patch passes SDLoc to buildJumpTable when visiting SwitchInst so
that visitJumpTable can use it later.
Situations may arise leading to negative `NumElements` argument of an
`alloca` instruction. In this case the `NumElements` is treated as a
large unsigned value. Such large arrays may cause the size constant to
overflow during code generation in 32-bit mode, leading to a crash.
This PR limits the constant's bit width to the width of the pointer on
the target. With this fix,
```
alloca i32, i32 -1
```
and
```
alloca [4294967295 x i32], i32 1
```
generate exactly the same PowerPC assembly code in 32-bit mode.
The @llvm.amdgcn.cs.chain intrinsic is essentially a call. The call
parameters are bundled up into 2 intrinsic arguments, one for those that
should go in the SGPRs (the 3rd intrinsic argument), and one for those
that should go in the VGPRs (the 4th intrinsic argument). Both will
often be some kind of aggregate.
Both instruction selection frameworks have some internal representation
for intrinsics (G_INTRINSIC[_WITH_SIDE_EFFECTS] for GlobalISel,
ISD::INTRINSIC_[VOID|WITH_CHAIN] for DAGISel), but we can't use those
because aggregates are dissolved very early on during ISel and we'd lose
the inreg information. Therefore, this patch short-circuits both the
IRTranslator and SelectionDAGBuilder to lower this intrinsic as a call
from the very start. It tries to use the existing infrastructure as much
as possible, by calling into the code for lowering tail calls.
This has already gone through a few rounds of review in Phab:
Differential Revision: https://reviews.llvm.org/D153761
This adds the nneg flag to SDNodeFlags and the node printing code.
SelectionDAGBuilder will add this flag to the node if the target doesn't
prefer sign extend.
A future RISC-V patch can remove the sign extend preference from
SelectionDAGBuilder.
I've also added the flag to the DAG combine that converts
ISD::SIGN_EXTEND to ISD::ZERO_EXTEND.
This will select i32 operations directly to W instructions without
custom nodes. Hopefully this can allow us to be less dependent on
hasAllNBitUsers to recover i32 operations in RISCVISelDAGToDAG.cpp.
This support is enabled with a command line option that is off by
default.
Generated code is still not optimal.
I've duplicated many test cases for this, but it's not complete. Enabling this runs all existing lit tests without crashing.
This patch introduces an experimental intrinsic for counting the
trailing zero elements in a vector. The intrinsic has generic expansion
in SelectionDAGBuilder, and for AArch64 there is a pattern which matches
to brkb & cntp instructions where SVE is enabled.
The intrinsic has a second operand, is_zero_poison, similar to the
existing cttz intrinsic.
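A hedged scalar model of the intended result (the actual lowering works on vector predicates, e.g. a brkb + cntp sequence under SVE):
```
#include <cstddef>
#include <vector>

// Count the zero elements that precede the first set element, starting from
// element 0 (the "trailing" end, by analogy with cttz on integers). When no
// element is set, this returns the element count; with the is_zero_poison
// operand set, that case would instead be poison in IR.
static std::size_t cttzElts(const std::vector<bool> &Mask) {
  std::size_t Count = 0;
  while (Count < Mask.size() && !Mask[Count])
    ++Count;
  return Count;
}
```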
These changes have been split out from D158291.
Builds on #67982 which recently introduced the nneg flag on a zext
instruction. Note that this change is the first point where the flag is
being used for an optimization, and thus may expose latent miscompiles.
We've recently taught both CVP and InstCombine to infer the flag when
forming zext, but nothing else is using the flag just yet.
Part 1 of 3. This includes the LLVM back-end processing and profile
reading/writing components. compiler-rt changes are included.
Differential Revision: https://reviews.llvm.org/D138846
Currently, we specify that the ptrmask intrinsic allows the mask to have
any size, which will be zero-extended or truncated to the pointer size.
However, what the semantics of the specified GEP expansion actually imply is
that the mask is only meaningful up to the pointer type *index* size --
any higher bits of the pointer will always be preserved. In other words,
the mask gets 1-extended from the index size to the pointer size. This
is also the behavior we want for CHERI architectures.
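A worked example of the described 1-extension, for a hypothetical target with 64-bit pointers and a 32-bit index type:
```
#include <cstdint>

// Model of the documented semantics on a hypothetical target with 64-bit
// pointers and a 32-bit index type: the mask only affects the low
// (index-size) bits; the high bits of the pointer are always preserved,
// i.e. the mask is 1-extended from index size to pointer size.
uint64_t ptrmaskModel(uint64_t PtrBits, uint32_t Mask) {
  uint64_t Extended = 0xFFFFFFFF00000000ULL | Mask; // 1-extend the 32-bit mask
  return PtrBits & Extended;
}
```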
This PR makes two changes:
* It spells out the interaction with the pointer type index size more
explicitly.
* It requires that the mask matches the pointer type index size. The
intention here is to make handling of this intrinsic more robust, to
avoid accidental mix-ups of pointer size and index size in code
generating this intrinsic. If a zero-extend or truncate of the mask is
desired, it should just be done explicitly in IR. This also cuts down on
the amount of testing we have to do, and the things transforms need to
check for.
As far as I can tell, we don't actually support pointers with different
index type size at the SDAG level, so I'm just asserting the sizes match
there for now. Out-of-tree targets using different index sizes may need
to adjust that code.
The current lowering of statepoints does not take into account return
attributes present on the `gc.result`, leading to different code being
generated than if one were to not use statepoints. These return
attributes can affect the ABI, which is why it is important that they are
applied in the lowering.
https://reviews.llvm.org/D152789 added an `exit` op before each
`unreachable`. This means we never get to the `trap` instruction.
This change limits the insertion of `exit` instructions to the cases
where `unreachable` is not lowered to `trap`. Trap itself is changed to
be emitted as `trap; exit;` to convey to `ptxas` that it exits the CFG.
This seems to cause Clang to crash, see comments on the code review. Reverting
until the problem can be investigated.
> Part 1 of 3. This includes the LLVM back-end processing and profile
> reading/writing components. compiler-rt changes are included.
>
> Differential Revision: https://reviews.llvm.org/D138846
This reverts commit a50486fd736ab2fe03fcacaf8b98876db77217a7.
Part 1 of 3. This includes the LLVM back-end processing and profile
reading/writing components. compiler-rt changes are included.
Differential Revision: https://reviews.llvm.org/D138846
The intrinsic uses ImmArg so TargetConstant would be consistent
with how other intrinsics are handled.
This hides the constants from type legalization so we can remove
the promotion support.
isel patterns are updated accordingly.
This will make it easy for callers to see issues with and fix up calls
to createTargetMachine after a future change to the params of
TargetMachine.
This matches other nearby enums.
For downstream users, this should be a fairly straightforward replacement,
e.g. s/CodeGenOpt::Aggressive/CodeGenOptLevel::Aggressive
or s/CGFT_/CodeGenFileType::
reland [InlineAsm] wrap ConstraintCode in enum class NFC (#66003)
This reverts commit ee643b706be2b6bef9980b25cc9cc988dab94bb5.
Fix up build failures in targets I missed in #66003
Kept as 3 commits for reviewers to see better what's changed. Will squash
when merging.
- reland [InlineAsm] wrap ConstraintCode in enum class NFC (#66003)
- fix all the targets I missed in #66003
- fix off by one found by llvm/test/CodeGen/SystemZ/inline-asm-addr.ll
This reverts commit 2ca4d136124d151216aac77a0403dcb5c5835bcd.
Also revert the followup, "[InlineAsm] fix botched merge conflict resolution"
This reverts commit 8b9bf3a9f715ee5dce96eb1194441850c3663da1.
There were SystemZ and Mips build errors, too many to fix forward.
Similar to
commit 2fad6e69851e ("[InlineAsm] wrap Kind in enum class NFC")
Fix the TODOs added in
commit 93bd428742f9 ("[InlineAsm] refactor InlineAsm class NFC
(#65649)")
I would like to steal one of these bits to denote whether a kind may be
spilled by the register allocator or not, but I'm afraid to touch any of
this code using bitwise operations.
Make flags a first-class type using bitfields, rather than laundering
data around via `unsigned`.
We currently have log, log2, log10, exp and exp2 intrinsics. Add exp10
to fix this asymmetry. AMDGPU already has most of the code for f32
exp10 expansion implemented alongside exp, so the current
implementation is duplicating nearly identical effort between the
compiler and library, which is inconvenient.
https://reviews.llvm.org/D157871
Irritatingly, atomic_store had operands in the opposite order from
regular store. This made it difficult to share patterns between
regular and atomic stores.
There was a previous incomplete attempt to move atomic_store into the
regular StoreSDNode which would be better.
I think it was a mistake for all atomicrmw to swap the operand order,
so maybe it's better to take this one step further.
https://reviews.llvm.org/D123143
Should add some minor type safety to the use of this information, since
there's quite a bit of metadata being laundered through an `unsigned`.
I'm looking to potentially add more bitfields to that `unsigned`, but I
find InlineAsm's big ol' bag of enum values and usage of `unsigned`
confusing, type-unsafe, and un-ergonomic. These can probably be better
abstracted.
I think the lack of static_cast outside of InlineAsm indicates the prior
code smell fixed here.
Reviewed By: qcolombet
Differential Revision: https://reviews.llvm.org/D159242
There is no vp.fpclass after FCLASS_VL (D151176); this patch adds support for vp.fpclass.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D152993
When SelectionDAG converts dbg.value intrinsics, it first ensures we have
already generated code for the value operand of the intrinsic. The rationale
is that if we haven't had the need to generate code for this value, it won't
be a debug value that causes the generation.
For example, if the first use of the physical register of an argument is a
dbg.value, we are going to hit this code path. However, this is irrelevant for
entry value expressions: by definition we are not interested in the _current_
value of the physical register, but rather in its value at the start of the
function. To deal with this, this patch changes lowering to handle this case as
early as possible.
Differential Revision: https://reviews.llvm.org/D158649
The change introduces intrinsics 'get_fpmode', 'set_fpmode' and
'reset_fpmode'. They manage all target dynamic floating-point control
modes, which include, for instance, rounding direction, precision,
treatment of denormals and so on. The intrinsics do the same
operations as the C library functions 'fegetmode' and 'fesetmode'. By
default they are lowered to calls to these functions.
Two main use cases are supported by this implementation.
1. Local modification of the control modes. In this case the code
usually has a pattern (in pseudocode):
saved_modes = get_fpmode()
set_fpmode(<new_modes>)
...
<do operations under the new modes>
...
set_fpmode(saved_modes)
In the case when it is known that the current FP environment is the default,
the code may be shorter:
set_fpmode(<new_modes>)
...
<do operations under the new modes>
...
reset_fpmode()
Such patterns appear not only in user code but also in implementations
of various FP controlling pragmas. In particular, the implementation of
`#pragma STDC FENV_ROUND` requires similar code if the target does not
support static rounding mode.
2. Portable control of FP modes. Usually FP control modes are set by
writing to some control register. Different targets have different
layouts of this register, and the way the register is accessed may also
differ. Using a set of target-specific definitions for the control
register bits together with these intrinsic functions provides a
reasonably portable way to handle control modes across a wide range of
hardware.
This change defines only the LLVM intrinsic functions, which implement the
access required for the aforementioned use cases.
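For reference, the first use case written directly against the C library functions the intrinsics lower to by default could look like this (a hedged sketch; `femode_t`, `fegetmode` and `fesetmode` are the glibc / TS 18661-1 interfaces):
```
#include <fenv.h>

// Local modification of the FP control modes, mirroring the
// saved_modes = get_fpmode() / set_fpmode(...) / set_fpmode(saved_modes)
// pattern from use case 1 above.
void with_fp_modes(const femode_t *NewModes, void (*body)(void)) {
  femode_t Saved;
  fegetmode(&Saved);   // saved_modes = get_fpmode()
  fesetmode(NewModes); // set_fpmode(<new_modes>)
  body();              // operations under the new modes
  fesetmode(&Saved);   // set_fpmode(saved_modes)
}
```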
Differential Revision: https://reviews.llvm.org/D82525