Lowers the llvm.masked.scatter intrinsics (scalar plus vector addressing mode only)
Changes included in this patch:
- Custom lowering for MSCATTER, which chooses the appropriate scatter store opcode to use.
Floating-point scatters are cast to integer, with patterns added to match FP reinterpret_casts.
- Added the getCanonicalIndexType function to convert redundant addressing
modes (e.g. scaling is redundant when accessing bytes)
- Tests with 32 & 64-bit scaled & unscaled offsets
Reviewed By: sdesmalen
Differential Revision: https://reviews.llvm.org/D90941
This patch adds the IsTruncatingStore flag to MaskedScatterSDNode, set by getMaskedScatter().
Updated SelectionDAGDumper::print_details for MaskedScatterSDNode to print
the details of masked scatters (is truncating, signed or scaled).
This is the first in a series of patches which adds support for scalable masked scatters
Reviewed By: sdesmalen
Differential Revision: https://reviews.llvm.org/D90939
Hook up legalizations for VECREDUCE_SEQ_FMUL. This is following up on the VECREDUCE_SEQ_FADD work from D90247.
Differential Revision: https://reviews.llvm.org/D90644
This patch uses the existing LowerFixedLengthReductionToSVE function to also lower
scalable vector reductions. A separate function has been added to lower VECREDUCE_AND
& VECREDUCE_OR operations with predicate types using ptest.
Lowering scalable floating-point reductions will be addressed in a follow up patch,
for now these will hit the assertion added to expandVecReduce() in TargetLowering.
Reviewed By: paulwalker-arm
Differential Revision: https://reviews.llvm.org/D89382
As discussed on D90527, we should be trying to move shift handling functionality into KnownBits to avoid code duplication in SelectionDAG/GlobalISel/ValueTracking.
As discussed on D90527, we should be be trying to move shift handling functionality into KnownBits to avoid code duplication in SelectionDAG/GlobalISel/ValueTracking.
The refactor to use the KnownBits fixed/min/max constant helpers allows us to hit a couple of cases that we were missing before.
We still need the getValidMinimumShiftAmountConstant case as KnownBits doesn't handle per-element vector cases.
As discussed on D90527, we should be be trying to move shift handling functionality into KnownBits to avoid code duplication in SelectionDAG/GlobalISel/ValueTracking.
The refactor to use the KnownBits fixed/min/max constant helpers allows us to hit a couple of cases that we were missing before.
We still need the getValidMinimumShiftAmountConstant case as KnownBits doesn't handle per-element vector cases.
Add Legalization support for VECREDUCE_SEQ_FADD, so that we don't need to depend on ExpandReductionsPass.
Differential Revision: https://reviews.llvm.org/D90247
Replace the X86 specific isSplatZeroExtended helper with a generic BuildVectorSDNode method.
I've just used this to simplify the X86ISD::BROADCASTM lowering so far (and remove isSplatZeroExtended), but we should be able to use this in more places to lower to complex broadcast patterns.
Differential Revision: https://reviews.llvm.org/D87930
We were previously relying upon the TypeSize comparison operators to
obtain the maximum size of two types, however use of such operators is
being deprecated in favour of making the caller aware that it could
be dealing with scalable vector types. I have changed the code to assert
that the two types have the same scalable property and thus we can
simply take the maximum of the known minimum sizes instead.
Differential Revision: https://reviews.llvm.org/D88563
I have fixed up a number of warnings resulting from TypeSize -> uint64_t
casts and calling getVectorNumElements() on scalable vector types. I
think most of the changes are fairly trivial except for those in
DAGTypeLegalizer::SplitVecRes_MSTORE I've tried to ensure we create
the MachineMemoryOperands in a sensible way for scalable vectors.
I have added a CHECK line to the following test:
CodeGen/AArch64/sve-split-store.ll
that ensures no new warnings are added.
Differential Revision: https://reviews.llvm.org/D86928
In the motivating case from https://llvm.org/PR47517
we create a node that does not get constant folded
before getNegatedExpression is attempted from some
other node, and we crash.
By moving the fold into SelectionDAG::simplifyFPBinop(),
we get the constant fold sooner and avoid the problem.
This patch adds FP_EXTEND_MERGE_PASSTHRU & FP_ROUND_MERGE_PASSTHRU
ISD nodes, used to lower scalable vector fp_extend/fp_round operations.
fp_round has an additional argument, the 'trunc' flag, which is an integer of zero or one.
This also fixes a warning introduced by the new tests added to sve-split-fcvt.ll,
resulting from an implicit TypeSize -> uint64_t cast in SplitVecOp_FP_ROUND.
Reviewed By: sdesmalen, paulwalker-arm
Differential Revision: https://reviews.llvm.org/D88321
This is like FastMathFlagGuard in IR. Since we use SDAG instance to get
values, it's with SelectionDAG. By creating a FlagInserter in current
scope, all values created by getNode will get the flags if no Flags
argument provided.
In this patch, I applied it to floating point operations folding part in
DAG combiner, and removed Flags passing to getNode to show its effect.
Other places in DAG combiner and other helper methods similar to getNode
also need this. They can be done in follow-up patches.
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D87361
An existing function Type::getScalarSizeInBits returns a uint64_t
instead of a TypeSize class because the caller is requesting a
scalar size, which cannot be scalable. This patch makes other
similar functions requesting a scalar size consistent with that,
thereby eliminating more than 1000 implicit TypeSize -> uint64_t
casts.
Differential revision: https://reviews.llvm.org/D87889
The versions that take 'unsigned' will be removed in the future.
I tried to use getOriginalAlign instead of getAlign in some
places. getAlign factors in the minimum alignment implied by
the offset in the pointer info. Since we're also passing the
pointer info we can use the original alignment.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D87592
Previously, we formed ISD::PARITY by looking for (and (ctpop X), 1)
but the AND might be separated from the ctpop. For example if the
parity result is multiplied by 2, we'll pull the AND through the
shift.
So to handle more cases, move to SimplifyDemandedBits where we
can handle more cases that result in only the LSB of the CTPOP
being used.
In getMemcpyLoadsAndStores(), a memcpy where the source is a zero constant is expanded to a MemOp::Set instead of a MemOp::Copy, even when the memcpy is volatile.
This is incorrect.
The fix is to add a check for volatile, and expand to MemOp::Copy in the volatile case.
Reviewed By: chill
Differential Revision: https://reviews.llvm.org/D87134
I have fixed up some more ElementCount/TypeSize related warnings in
the following tests:
CodeGen/AArch64/sve-split-extract-elt.ll
CodeGen/AArch64/sve-split-insert-elt.ll
In SelectionDAG::CreateStackTemporary we were relying upon the implicit
cast from TypeSize -> uint64_t when calling MachineFrameInfo::CreateStackObject.
I've fixed this by passing in the known minimum size instead, which I
believe is fine because the associated stack id indicates whether this
is a scalable object or not.
I've also fixed up a case in TargetLowering::SimplifyDemandedBits when
extracting a vector element from a scalable vector. The result is a scalar,
hence it wasn't caught at the start of the function. If the vector is
scalable we just bail out for now.
Differential Revision: https://reviews.llvm.org/D86431
Use forward declarations and move the include down to dependent files that actually use it.
This also exposes a number of implicit dependencies on KnownBits.h
I have fixed up a number of warnings resulting from TypeSize -> uint64_t
casts and calling getVectorNumElements() on scalable vector types. I
think most of the changes are fairly trivial except for those in
DAGTypeLegalizer::SplitVecRes_MLOAD I've tried to ensure we create
the MachineMemoryOperands in a sensible way for scalable vectors.
I have added a CHECK line to the following test:
CodeGen/AArch64/sve-split-load.ll
that ensures no new warnings are added.
Differential Revision: https://reviews.llvm.org/D86697
Also updates isConstOrConstSplatFP to allow the mul(A,-1) -> neg(A)
transformation when -1 is expressed as an ISD::SPLAT_VECTOR.
Differential Revision: https://reviews.llvm.org/D86415
In this patch I have fixed two issues:
1. Our SVE tuple get/set intrinsics were using the wrong constant type
for the index passed to EXTRACT_SUBVECTOR. I have fixed this by using the
function SelectionDAG::getVectorIdxConstant to create the value. Also, I
have updated the documentation for EXTRACT_SUBVECTOR describing what type
the constant index should be and we now enforce this when creating the
node.
2. The AArch64 backend was missing the appropriate patterns for
extracting certain subvectors (nxv4f16 and nxv2f32) from legal SVE types.
I have added them as part of this patch.
The only way that I could find to test the new patterns was to use the
SVE tuple get intrinsics, although I realise it looks a bit unusual.
Tests added here:
test/CodeGen/AArch64/sve-extract-subvector.ll
Differential Revision: https://reviews.llvm.org/D85516
When the result type of insertelement needs to be split,
SplitVecRes_INSERT_VECTOR_ELT will try to store the vector to a
stack temporary, store the element at the location of the stack
temporary plus the index, and reload the Lo/Hi parts.
This patch does the following to ensure this works for scalable vectors:
- Sets the StackID with getStackIDForScalableVectors() in CreateStackTemporary
- Adds an IsScalable flag to getMemBasePlusOffset() and scales the
offset by VScale when this is true
- Ensures the immediate is clamped correctly by clampDynamicVectorIndex
so that we don't try to use an out of range index
Reviewed By: david-arm
Differential Revision: https://reviews.llvm.org/D84874
Changes the Offset arguments to both functions from int64_t to TypeSize
& updates all uses of the functions to create the offset using TypeSize::Fixed()
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D85220
As mentioned on D85463, we should be using SimplifyMultipleUseDemandedBits (which is the default fallback).
The minor regression in illegal-bitfield-loadstore.ll will be addressed properly by D77804.
This allows us to remove extra patterns from AArch64SVEInstrInfo.td
because we can reuse those required for fixed length vectors.
Differential Revision: https://reviews.llvm.org/D85328
ComputeNumSignBits and computeKnownBits both trigger "Scalable flag
may be dropped" warnings when a fixed length vector is extracted
from a scalable vector. This patch assumes nothing about the
demanded elements thus matching the behaviour when extracting a
scalable vector from a scalable vector.
Differential Revision: https://reviews.llvm.org/D83642
This patch replaces some invalid calls to getVectorNumElements() with calls
to getVectorMinNumElements() instead, since the code paths changed in this
patch work for both fixed and scalable vector types.
Fixes warnings in this test:
sve-sext-zext.ll
Differential Revision: https://reviews.llvm.org/D83203
Summary:
When legalizing a biscast operation from an fp16 operand to an i16 on a
target that requires both input and output types to be promoted to
32-bits, an assertion can fail when building the new node due to a
mismatch between the the operation's result size and the type specified to
the node.
This patches fix the issue by making sure the bit width of the types
match for the FP_TO_FP16 node, covering the difference with an extra
ANYEXTEND operation.
Reviewers: ostannard, efriedma, pirama, jmolloy, plotfi
Reviewed By: efriedma
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D82552
Summary:
When splitting a load of a scalable type, the new address is
calculated in SplitVecRes_LOAD using a vscale and an add instruction.
This patch also adds a DAG combiner fold to visitADD for vscale:
- Fold (add (vscale(C0)), (vscale(C1))) to (add (vscale(C0 + C1)))
Reviewers: sdesmalen, efriedma, david-arm
Reviewed By: david-arm
Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D82792
Currently matchBinOpReduction only handles shufflevector reduction patterns, but in many cases these only occur in the final stages of a reduction, once we're down to legal vector widths.
Before this its likely that we are performing reductions using subvector extractions to repeatedly split the source vector in half and perform the binop on the halves.
Assuming we've found a non-partial reduction, this patch continues looking for subvector reductions as far as it can beyond the last shufflevector.
Fixes PR37890