The reason for this change is to clarify an existing technical
restriction of LLVM: there needs to be a way to implicitly define a type
if there is any way to legally define that type by another means.
The DAG has the same instructions: the signed and unsigned absolute
difference of their inputs. For AArch64, they map to uabd and sabd for
Neon and SVE. The Neon and SVE instructions will require custom
patterns.
They are pseudo opcodes and are not imported by the IRTranslator. We
need combines to create them.
PowerPC, ARM, and AArch64 have native instructions.
/// i.e trunc(abs(sext(Op0) - sext(Op1))) becomes abds(Op0, Op1)
/// or trunc(abs(zext(Op0) - zext(Op1))) becomes abdu(Op0, Op1)
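For illustration, IR of roughly this shape is what the signed combine targets (a minimal sketch; the function and value names are made up):
```
declare i32 @llvm.abs.i32(i32, i1 immarg)

define i8 @abds_example(i8 %a, i8 %b) {
  %aext = sext i8 %a to i32
  %bext = sext i8 %b to i32
  %sub  = sub i32 %aext, %bext
  ; abs of the widened difference, then truncate back to the narrow type
  %abs  = call i32 @llvm.abs.i32(i32 %sub, i1 false)
  %res  = trunc i32 %abs to i8
  ret i8 %res
}
```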
For GlobalISel, we are going to write the combines in MIR patterns.
See:
llvm/test/CodeGen/AArch64/abd-combine.ll
- [ ] combine into abd
- [ ] legalize and add td patterns
This adds the `llvm.sincos` intrinsic, legalization, and lowering.
The `llvm.sincos` intrinsic takes a floating-point value and returns
both the sine and cosine (as a struct).
```
declare { float, float } @llvm.sincos.f32(float %Val)
declare { double, double } @llvm.sincos.f64(double %Val)
declare { x86_fp80, x86_fp80 } @llvm.sincos.f80(x86_fp80 %Val)
declare { fp128, fp128 } @llvm.sincos.f128(fp128 %Val)
declare { ppc_fp128, ppc_fp128 } @llvm.sincos.ppcf128(ppc_fp128 %Val)
declare { <4 x float>, <4 x float> } @llvm.sincos.v4f32(<4 x float> %Val)
```
The lowering is built on top of the existing FSINCOS ISD node, with
additional type legalization to allow for f16, f128, and vector values.
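A quick usage sketch (value names are illustrative):
```
%sc  = call { float, float } @llvm.sincos.f32(float %x)
%sin = extractvalue { float, float } %sc, 0
%cos = extractvalue { float, float } %sc, 1
```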
The implementation was missing the fact that the `G_EXTRACT_SUBVECTOR`
destination and source vectors can be different types.
Also fix a bug in the MIR builder for `G_EXTRACT_SUBVECTOR` to generate
the correct opcode.
Clarify the G_EXTRACT_SUBVECTOR specification.
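For context, `G_EXTRACT_SUBVECTOR` is commonly produced from the `llvm.vector.extract` intrinsic (an assumption about the usual source, not stated in this change), which can, for example, pull a fixed-width subvector out of a scalable source, so the two vector types naturally differ:
```
; extract a fixed <4 x i32> subvector starting at element 0
; of a scalable <vscale x 4 x i32> source vector
%sub = call <4 x i32> @llvm.vector.extract.v4i32.nxv4i32(<vscale x 4 x i32> %vec, i64 0)
```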
This is an implementation of the saturating fp-to-int conversions for
GlobalISel. On AArch64 the conversion instructions work this way,
producing saturating results. LegalizerHelper::lowerFPTOINT_SAT is
ported from SDAG.
AArch64 has a lot of existing tests for fptosi_sat, covering a wide
range of types. I have tried to make most of them work all at once, but
a few fall back due to other missing features such as f128 handling for
min/max.
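For reference, the IR-level saturating conversion intrinsics look like this (a sketch; value names are illustrative):
```
declare i32 @llvm.fptosi.sat.i32.f32(float)
declare <4 x i16> @llvm.fptosi.sat.v4i16.v4f32(<4 x float>)

%a = call i32 @llvm.fptosi.sat.i32.f32(float %x)
%b = call <4 x i16> @llvm.fptosi.sat.v4i16.v4f32(<4 x float> %v)
```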
This PR adds a new vector intrinsic `@llvm.experimental.vector.compress`
to "compress" data within a vector based on a selection mask, i.e., it
moves all selected values (i.e., where `mask[i] == 1`) to consecutive
lanes in the result vector. A `passthru` vector can be provided, from
which remaining lanes are filled.
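A sketch of what a call looks like at the IR level (types and value names are illustrative):
```
declare <8 x i32> @llvm.experimental.vector.compress.v8i32(<8 x i32>, <8 x i1>, <8 x i32>)

; selected lanes of %vec are packed into the low lanes of the result;
; the remaining lanes are taken from %passthru
%r = call <8 x i32> @llvm.experimental.vector.compress.v8i32(<8 x i32> %vec, <8 x i1> %mask, <8 x i32> %passthru)
```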
The main reason for this is that the existing
`@llvm.masked.compressstore` has very strong constraints in that it can
only write values that were selected, resulting in guard branches for
all targets except AVX-512 (and even there the AMD implementation is
_very_ slow). More instruction sets support "compress" logic, but only
within registers, so an additional store is needed to write out the
values. This combination is still likely significantly faster on many
targets as it avoids branches.
In follow-up PRs, my plan is to add target-specific lowerings for x86,
SVE, and possibly RISCV. I also want to combine this with a store
instruction, as this is probably a common case and we can avoid some
memory writes in that case.
See [discussion in
forum](https://discourse.llvm.org/t/new-intrinsic-for-masked-vector-compress-without-store/78663)
for initial discussion on the design.
This re-applies #94241 after fixing a buildbot failure, see
https://lab.llvm.org/buildbot/#/builders/51/builds/570
According to the standard, `constexpr` variables and `const` variables
initialized with constant expressions can be used in lambdas w/o
capturing - see https://en.cppreference.com/w/cpp/language/lambda.
However, MSVC used on buildkite seems to ignore that rule and does not
allow using such uncaptured variables in lambdas: we have "error C3493:
'Mask16' cannot be implicitly captured because no default capture mode
has been specified" - see
https://buildkite.com/llvm-project/github-pull-requests/builds/73238
Explicitly capturing such a variable, however, makes buildbot fail with
"error: lambda capture 'Mask16' is not required to be captured for this
use [-Werror,-Wunused-lambda-capture]" - see
https://lab.llvm.org/buildbot/#/builders/51/builds/570.
Fix both cases by using the `0xffff` value directly instead of giving
it a name.
Original PR description below.
Depends on #94240.
Define the following pseudos for lowering ptrauth constants in code:
- non-`extern_weak`:
  - no GOT load needed: `MOVaddrPAC` - similar to `MOVaddr`, with added PAC;
  - GOT load needed: `LOADgotPAC` - similar to `LOADgot`, with added PAC;
- `extern_weak`: `LOADauthptrstatic` - similar to `LOADgot`, but uses a
  special stub slot named `sym$auth_ptr$key$disc` filled by the dynamic
  linker during relocation resolution instead of a GOT slot.
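For context, an IR-level ptrauth constant being lowered looks roughly like this (a sketch; the symbol, key, and discriminator values are made up):
```
@callee.ref = constant ptr ptrauth (ptr @callee, i32 0, i64 1234)
```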
---------
Co-authored-by: Ahmed Bougacha <ahmed@bougacha.org>
Combiners that use C++ code in their "apply" pattern only use that. They
never mix it with MIR patterns as that has little added value.
This patch restricts C++ apply code so that if C++ is used, we cannot
use MIR patterns or builtins with it. Adding this restriction allows us
to merge calls to match and apply C++ code together, which in turn
makes it possible to keep MatchData variables on the stack.
So before, we would have
```
GIM_CheckCxxInsnPredicate // match
GIM_CheckCxxInsnPredicate // apply
GIR_Done
```
This came alongside a massive C++ struct holding the MatchData of every
possible rule (which was a big space/perf issue).
Now we just have
```
GIR_DoneWithCustomAction
```
And the function being run just does
```
unsigned SomeMatchData;
if (match(SomeMatchData))
  apply(SomeMatchData);
```
This approach solves multiple issues in one:
- MatchData handling is greatly simplified and more efficient, "don't
pay for what you don't use"
- We reduce the size of the match table
- Calling C++ code has a certain overhead (we need a switch), and this
overhead is only paid once now.
Handling of C++ code inside PatFrags is unchanged, though; it still
emits a `GIM_CheckCxxInsnPredicate`. This is completely fine as PatFrags
can't use MatchData.
This change is an implementation of #87367's investigation on supporting
IEEE math operations as intrinsics, which was discussed in this RFC:
https://discourse.llvm.org/t/rfc-all-the-math-intrinsics/78294
If you want an overarching view of how this will all connect, see:
https://github.com/llvm/llvm-project/pull/90088
Changes:
- `llvm/docs/GlobalISel/GenericOpcode.rst` - Document the `G_FTAN`
opcode
- `llvm/include/llvm/IR/Intrinsics.td` - Create the tan intrinsic
- `llvm/include/llvm/Support/TargetOpcodes.def` - Create a `G_FTAN`
Opcode handler
- `llvm/include/llvm/Target/GenericOpcodes.td` - Define the `G_FTAN`
Opcode
- `llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp` - Map the tan intrinsic
to the `G_FTAN` opcode
- `llvm/lib/Target/SPIRV/SPIRVInstructionSelector.cpp` - Map the
`G_FTAN` opcode to the GLSL 4.5 and OpenCL tan instructions.
- `llvm/lib/Target/SPIRV/SPIRVLegalizerInfo.cpp` - Define `G_FTAN` as a
legal SPIR-V target opcode.
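For reference, the new intrinsic at the IR level is simply (a sketch; value names are illustrative):
```
declare float @llvm.tan.f32(float)

%t = call float @llvm.tan.f32(float %x)
```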
Previously these were declared as having the 2008 behavior, with
underspecified signed zero handling. Currently, AMDGPU, PPC and
LoongArch mark these as legal. The AMDGPU and PPC instructions respect
the signed zero behavior. The LoongArch documentation doesn't state
this explicitly, but I'm assuming it also does.
Here we introduce three new GMIR instructions to cover a set of trap
intrinsics. The idea behind this is that generic intrinsics shouldn't be
used with the G_INTRINSIC opcode.
These new instructions match cleanly onto the existing trap ISD nodes.
This allows X86, AArch64, RISCV and Mips to reuse SelectionDAG patterns
for selection and avoid manual selection. However, AMDGPU is an
exception: it selects traps during legalization regardless of whether
SelectionDAG or GlobalISel is used.
Since there are not many places where traps are used, this change
attempts to clean up all usages of G_INTRINSIC with trap intrinsics, so
there is no stage at which both G_TRAP and
G_INTRINSIC_W_SIDE_EFFECTS(@llvm.trap) are allowed.
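For reference, the trap intrinsics in question are declared as follows; G_TRAP is named above, and the other two presumably get analogous G_DEBUGTRAP/G_UBSANTRAP opcodes (treat those two names as an assumption here):
```
declare void @llvm.trap()
declare void @llvm.debugtrap()
declare void @llvm.ubsantrap(i8 immarg)
```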
G_INSERT and G_EXTRACT are not sufficient to represent both
INSERT/EXTRACT on a subregister and INSERT/EXTRACT on a vector.
We would like to be able to INSERT/EXTRACT on vectors in cases that
INSERT/EXTRACT on vector subregisters are not sufficient, so we add
these opcodes.
I tried a patch where we treated G_EXTRACT as both G_EXTRACT_SUBVECTOR
and G_EXTRACT_SUBREG, but ran into an infinite loop at this
[point](8b5b294ec2/llvm/lib/Target/RISCV/RISCVISelLowering.cpp (L9932))
in the equivalent SDAG code.
Recommits llvm/llvm-project#80378 which was reverted in
llvm/llvm-project#84330. The problem was that the change in
llvm/test/CodeGen/AArch64/GlobalISel/legalizer-info-validation.mir used
217 as an opcode instead of a regex.
This patch is stacked on
https://github.com/llvm/llvm-project/pull/80372,
https://github.com/llvm/llvm-project/pull/80307, and
https://github.com/llvm/llvm-project/pull/80306.
ShuffleVector on scalable vector types gets IRTranslated to
G_SPLAT_VECTOR, since a ShuffleVector that operates on scalable vectors
is a splat: the value of the splat is the 0th element of the first
operand, because the index mask operand is zeroinitializer (undef and
poison are treated as zeroinitializer here). This is analogous to what
happens in SelectionDAG for ShuffleVector.
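A sketch of the canonical scalable splat pattern in IR (value names are illustrative):
```
; lane 0 is inserted, then broadcast by a shufflevector whose
; index mask is zeroinitializer
%ins   = insertelement <vscale x 4 x i32> poison, i32 %x, i64 0
%splat = shufflevector <vscale x 4 x i32> %ins, <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer
```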
`buildSplatVector` is renamed to `buildBuildVectorSplatVector`. I did not
make this a separate patch because it would cause problems to revert
that change without reverting this change too.
Also disables generation of MutateOpcode. It's almost never used in
combiners anyway.
If we really want to use it, it needs to be investigated & properly
fixed (see TODO).
Fixes #70780
Use `MachineIRBuilder::buildConstant` to emit typed immediates in
'apply' MIR patterns.
This adds flexibility, e.g. it allows us to seamlessly handle vector
cases, where a `G_BUILD_VECTOR` is needed to create a splat.
Adds a new feature to MIR patterns: builtin instructions.
They offer some additional capabilities that currently cannot be expressed without falling back to C++ code.
There are two builtins added with this patch, but more can be added later as new needs arise:
- GIReplaceReg
- GIEraseRoot
Depends on D158714, D158713
Reviewed By: arsenm, aemerson
Differential Revision: https://reviews.llvm.org/D158975
I had to tighten the restrictions on PatFrags a bit to make it consistent: instructions that
define the root of a PF can only have one def.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D157700
This is a lot of copy-pasting of the existing handling of
G_VECREDUCE_FMAX/G_VECREDUCE_FMIN to add handling for
G_VECREDUCE_FMAXIMUM/G_VECREDUCE_FMINIMUM in the same way.
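These reductions typically originate from the `llvm.vector.reduce.fmaximum`/`fminimum` intrinsics (an assumption for illustration; value names are made up):
```
declare float @llvm.vector.reduce.fmaximum.v4f32(<4 x float>)
declare float @llvm.vector.reduce.fminimum.v4f32(<4 x float>)

%max = call float @llvm.vector.reduce.fmaximum.v4f32(<4 x float> %v)
```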
Differential Revision: https://reviews.llvm.org/D156615
Introduced the convergent equivalents of the existing G_INTRINSIC opcodes:
- G_INTRINSIC_CONVERGENT
- G_INTRINSIC_CONVERGENT_W_SIDE_EFFECTS
Out of the targets that currently have some support for GlobalISel, the
patch assumes that the convergent intrinsics are only relevant to SPIRV
and AMDGPU.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D154766
This change is about preventing the folding of hoisted constants.
The constant hoisting pass tries to hoist large constants into predecessors and also
generates remat instructions in terms of the hoisted constants. These aim to prevent
codegen from rematerializing expensive constants multiple times. So
that we can re-use this optimization, we preserve the no-op bitcasts
that are used to anchor constants to the predecessor blocks.
SelectionDAG achieves this by having the OpaqueConstant node, which is just a
normal constant with an opaque flag set. I've opted to avoid introducing a new
constant generic instruction here. Instead, we have a new G_CONSTANT_FOLD_BARRIER
operation that constitutes a folding barrier.
These are somewhat like the optimization hints (e.g. G_ASSERT_ZEXT) in
that they're eliminated by the generic instruction selection code.
This change by itself shows only very minor improvements in -Os CTMark
overall. What it does allow is better optimization when future combines
are added that rely on expensive constants remaining unfolded.
Differential Revision: https://reviews.llvm.org/D144336
We currently have a bug where the legalizer, when dealing with phi operands,
may create instructions in the phi's incoming blocks at points which are effectively
dead due to a possible exception throw.
Say we have:
throwbb:
  EH_LABEL
  x0 = %callarg1
  BL @may_throw_call
  EH_LABEL
  B returnbb

bb:
  %v = phi i1 %true, throwbb, %false....
When legalizing we may need to widen the i1 %true value, and to do that we need
to create new extension instructions in the incoming block. Our insertion point
currently is the MBB::getFirstTerminator() which puts the IP before the unconditional
branch terminator in throwbb. These extensions may never be executed if the call
throws, and therefore we need to emit them before the call (but not too early, since
our new instruction may need values defined within throwbb as well).
throwbb:
  EH_LABEL
  x0 = %callarg1
  BL @may_throw_call
  EH_LABEL
  %true = G_CONSTANT i32 1 ; <<<-- ruh'roh, this never executes if may_throw_call() throws!
  B returnbb

bb:
  %v = phi i32 %true, throwbb, %false....
To fix this, I've added two new instructions. The main idea is that G_INVOKE_REGION_START
is a terminator, which tries to model the fact that in the IR, the original invoke inst
is actually a terminator as well. By using that as the new insertion point, we
make sure to place new instructions on always executing paths.
Unfortunately we still need to make the legalizer use a new insertion point API
that I've added, since the existing `getFirstTerminator()` method does a reverse
walk up the block, and any non-terminator instructions cause it to bail out. To
avoid impacting compile time for all `getFirstTerminator()` uses, I've added a new
method that does a forward walk instead.
Differential Revision: https://reviews.llvm.org/D137905
The instruction G_IS_FPCLASS had an operand that represented the
floating-point semantics of its first operand. It allowed types that
have the same length, like `bfloat16` and `half`, to be distinguished.
Unfortunately, it is not sufficient, as other operations still cannot
distinguish such types. The solution to this problem must be more
general, so this operand is now removed.
Differential Revision: https://reviews.llvm.org/D138004
SelectionDAG has a target hook, getExtendForAtomicOps, which it uses
in the computeKnownBits implementation for ATOMIC_LOAD. This is pretty
ugly (as is having a separate load opcode for atomics), so instead
allow making use of atomic zextload. Enable this for AArch64 since the
DAG path defaults to the zext behavior.
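As a rough illustration of the kind of IR this covers (an assumption for illustration, not code from this patch): an atomic load whose only use is a zero-extension is what an atomic zextload can fold.
```
%v = load atomic i8, ptr %p monotonic, align 1
%e = zext i8 %v to i32
```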
The tablegen changes are pretty ugly, but partially helps migrate
SelectionDAG from using ISD::ATOMIC_LOAD to regular ISD::LOAD with
atomic memory operands. For now the DAG emitter will emit matchers for
patterns which the DAG will not produce.
I'm still a bit confused by the intent of the isLoad/isStore/isAtomic
bits. The DAG implementation rejects trying to use any of these in
combination. For now I've opted to make the isLoad checks also check
isAtomic, although I think having isLoad and isAtomic set on these
makes most sense.
This patch adds support for the `fmax` and `fmin` operations in the
`atomicrmw` instruction. For now (at least in this patch), the
instruction will be expanded to a CAS loop. There are already a couple
of targets supporting the feature; I'll create follow-up patches to
enable them accordingly.
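A sketch of the new forms at the IR level (pointer and value names are illustrative):
```
%old1 = atomicrmw fmax ptr %p, float %val seq_cst
%old2 = atomicrmw fmin ptr %q, double %d monotonic
```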
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D127041