This patch adds FP_EXTEND_MERGE_PASSTHRU & FP_ROUND_MERGE_PASSTHRU
ISD nodes, used to lower scalable vector fp_extend/fp_round operations.
fp_round has an additional argument, the 'trunc' flag, which is an integer of zero or one.
This also fixes a warning introduced by the new tests added to sve-split-fcvt.ll,
resulting from an implicit TypeSize -> uint64_t cast in SplitVecOp_FP_ROUND.
Reviewed By: sdesmalen, paulwalker-arm
Differential Revision: https://reviews.llvm.org/D88321
This is like FastMathFlagGuard in IR. Since we use SDAG instance to get
values, it's with SelectionDAG. By creating a FlagInserter in current
scope, all values created by getNode will get the flags if no Flags
argument provided.
In this patch, I applied it to floating point operations folding part in
DAG combiner, and removed Flags passing to getNode to show its effect.
Other places in DAG combiner and other helper methods similar to getNode
also need this. They can be done in follow-up patches.
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D87361
An existing function Type::getScalarSizeInBits returns a uint64_t
instead of a TypeSize class because the caller is requesting a
scalar size, which cannot be scalable. This patch makes other
similar functions requesting a scalar size consistent with that,
thereby eliminating more than 1000 implicit TypeSize -> uint64_t
casts.
Differential revision: https://reviews.llvm.org/D87889
The versions that take 'unsigned' will be removed in the future.
I tried to use getOriginalAlign instead of getAlign in some
places. getAlign factors in the minimum alignment implied by
the offset in the pointer info. Since we're also passing the
pointer info we can use the original alignment.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D87592
Previously, we formed ISD::PARITY by looking for (and (ctpop X), 1)
but the AND might be separated from the ctpop. For example if the
parity result is multiplied by 2, we'll pull the AND through the
shift.
So to handle more cases, move to SimplifyDemandedBits where we
can handle more cases that result in only the LSB of the CTPOP
being used.
In getMemcpyLoadsAndStores(), a memcpy where the source is a zero constant is expanded to a MemOp::Set instead of a MemOp::Copy, even when the memcpy is volatile.
This is incorrect.
The fix is to add a check for volatile, and expand to MemOp::Copy in the volatile case.
Reviewed By: chill
Differential Revision: https://reviews.llvm.org/D87134
I have fixed up some more ElementCount/TypeSize related warnings in
the following tests:
CodeGen/AArch64/sve-split-extract-elt.ll
CodeGen/AArch64/sve-split-insert-elt.ll
In SelectionDAG::CreateStackTemporary we were relying upon the implicit
cast from TypeSize -> uint64_t when calling MachineFrameInfo::CreateStackObject.
I've fixed this by passing in the known minimum size instead, which I
believe is fine because the associated stack id indicates whether this
is a scalable object or not.
I've also fixed up a case in TargetLowering::SimplifyDemandedBits when
extracting a vector element from a scalable vector. The result is a scalar,
hence it wasn't caught at the start of the function. If the vector is
scalable we just bail out for now.
Differential Revision: https://reviews.llvm.org/D86431
Use forward declarations and move the include down to dependent files that actually use it.
This also exposes a number of implicit dependencies on KnownBits.h
I have fixed up a number of warnings resulting from TypeSize -> uint64_t
casts and calling getVectorNumElements() on scalable vector types. I
think most of the changes are fairly trivial except for those in
DAGTypeLegalizer::SplitVecRes_MLOAD I've tried to ensure we create
the MachineMemoryOperands in a sensible way for scalable vectors.
I have added a CHECK line to the following test:
CodeGen/AArch64/sve-split-load.ll
that ensures no new warnings are added.
Differential Revision: https://reviews.llvm.org/D86697
Also updates isConstOrConstSplatFP to allow the mul(A,-1) -> neg(A)
transformation when -1 is expressed as an ISD::SPLAT_VECTOR.
Differential Revision: https://reviews.llvm.org/D86415
In this patch I have fixed two issues:
1. Our SVE tuple get/set intrinsics were using the wrong constant type
for the index passed to EXTRACT_SUBVECTOR. I have fixed this by using the
function SelectionDAG::getVectorIdxConstant to create the value. Also, I
have updated the documentation for EXTRACT_SUBVECTOR describing what type
the constant index should be and we now enforce this when creating the
node.
2. The AArch64 backend was missing the appropriate patterns for
extracting certain subvectors (nxv4f16 and nxv2f32) from legal SVE types.
I have added them as part of this patch.
The only way that I could find to test the new patterns was to use the
SVE tuple get intrinsics, although I realise it looks a bit unusual.
Tests added here:
test/CodeGen/AArch64/sve-extract-subvector.ll
Differential Revision: https://reviews.llvm.org/D85516
When the result type of insertelement needs to be split,
SplitVecRes_INSERT_VECTOR_ELT will try to store the vector to a
stack temporary, store the element at the location of the stack
temporary plus the index, and reload the Lo/Hi parts.
This patch does the following to ensure this works for scalable vectors:
- Sets the StackID with getStackIDForScalableVectors() in CreateStackTemporary
- Adds an IsScalable flag to getMemBasePlusOffset() and scales the
offset by VScale when this is true
- Ensures the immediate is clamped correctly by clampDynamicVectorIndex
so that we don't try to use an out of range index
Reviewed By: david-arm
Differential Revision: https://reviews.llvm.org/D84874
Changes the Offset arguments to both functions from int64_t to TypeSize
& updates all uses of the functions to create the offset using TypeSize::Fixed()
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D85220
As mentioned on D85463, we should be using SimplifyMultipleUseDemandedBits (which is the default fallback).
The minor regression in illegal-bitfield-loadstore.ll will be addressed properly by D77804.
This allows us to remove extra patterns from AArch64SVEInstrInfo.td
because we can reuse those required for fixed length vectors.
Differential Revision: https://reviews.llvm.org/D85328
ComputeNumSignBits and computeKnownBits both trigger "Scalable flag
may be dropped" warnings when a fixed length vector is extracted
from a scalable vector. This patch assumes nothing about the
demanded elements thus matching the behaviour when extracting a
scalable vector from a scalable vector.
Differential Revision: https://reviews.llvm.org/D83642
This patch replaces some invalid calls to getVectorNumElements() with calls
to getVectorMinNumElements() instead, since the code paths changed in this
patch work for both fixed and scalable vector types.
Fixes warnings in this test:
sve-sext-zext.ll
Differential Revision: https://reviews.llvm.org/D83203
Summary:
When legalizing a biscast operation from an fp16 operand to an i16 on a
target that requires both input and output types to be promoted to
32-bits, an assertion can fail when building the new node due to a
mismatch between the the operation's result size and the type specified to
the node.
This patches fix the issue by making sure the bit width of the types
match for the FP_TO_FP16 node, covering the difference with an extra
ANYEXTEND operation.
Reviewers: ostannard, efriedma, pirama, jmolloy, plotfi
Reviewed By: efriedma
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D82552
Summary:
When splitting a load of a scalable type, the new address is
calculated in SplitVecRes_LOAD using a vscale and an add instruction.
This patch also adds a DAG combiner fold to visitADD for vscale:
- Fold (add (vscale(C0)), (vscale(C1))) to (add (vscale(C0 + C1)))
Reviewers: sdesmalen, efriedma, david-arm
Reviewed By: david-arm
Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D82792
Currently matchBinOpReduction only handles shufflevector reduction patterns, but in many cases these only occur in the final stages of a reduction, once we're down to legal vector widths.
Before this its likely that we are performing reductions using subvector extractions to repeatedly split the source vector in half and perform the binop on the halves.
Assuming we've found a non-partial reduction, this patch continues looking for subvector reductions as far as it can beyond the last shufflevector.
Fixes PR37890
Fix a warning in getNode() when extracting a subvector from a
concat vector. We can simply replace the call to getVectorNumElements
with getVectorMinNumElements as this follows the defined behaviour
for EXTRACT_SUBVECTOR.
Differential Revision: https://reviews.llvm.org/D82746
Summary:
- AssertAlign node records the guaranteed alignment on its source node,
where these alignments are retrieved from alignment attributes in LLVM
IR. These tracked alignments could help DAG combining and lowering
generating efficient code.
- In this patch, the basic support of AssertAlign node is added. So far,
we only generate AssertAlign nodes on return values from intrinsic
calls.
- Addressing selection in AMDGPU is revised accordingly to capture the
new (base + offset) patterns.
Reviewers: arsenm, bogner
Subscribers: jvesely, wdng, nhaehnle, tpr, hiraditya, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D81711
When trying to calculate the number of sign bits for scalable vectors
we should just bail out for now and pretend we know nothing.
Differential Revision: https://reviews.llvm.org/D81093
Instead of asserting the number of elements is the same, we should be
comparing the element counts instead. In addition, when looking at
concats of extract_subvectors it's fine to use getVectorMinNumElements()
for scalable vectors.
I discovered these warnings when compiling the structured loads tests in
this file:
test/CodeGen/AArch64/sve-intrinsics-loads.ll
Differential Revision: https://reviews.llvm.org/D81936
Until we have a real need for computing known bits for scalable
vectors I have simply changed the code to bail out for now and
pretend we know nothing. I've also fixed up some simple callers of
computeKnownBits too.
Differential Revision: https://reviews.llvm.org/D80437
In two instances of CreateStackTemporary we are sometimes promoting
alignments beyond the stack alignment. I have introduced a new function
called getReducedAlign that will return the alignment for the broken
down parts of illegal vector types. For example, on NEON a <32 x i8>
type is made up of two <16 x i8> types - in this case the sensible
alignment is 16 bytes, not 32.
In the legalization code wherever we create stack temporaries I have
started using the reduced alignments instead for illegal vector types.
I added a test to
CodeGen/AArch64/build-one-lane.ll
that tries to insert an element into an illegal fixed vector type
that involves creating a temporary stack object.
Differential Revision: https://reviews.llvm.org/D80370
This wasn't getting much value from the DAG or depth arguments, since
it's only called on the frame index root nodes. FrameIndexes can also
only return a scalar value, so it also didn't need DemandedElts.
The AMDGPU non-strict fdiv lowering needs to introduce an FP mode
switch in some cases, and has custom nodes to provide chain/glue for
the intermediate FP operations. We need to propagate nofpexcept here,
but getNode was dropping the flags.
Adding nofpexcept in the AMDGPU custom lowering is left to a future
patch.
Also fix a second case where flags were dropped, but in this case it
seems it just didn't handle this number of operands.
Test will be included in future AMDGPU patch.
We are calling getValidShiftAmountConstant first followed by getValidMinimumShiftAmountConstant/getValidMaximumShiftAmountConstant if that failed. But both are used in the same way in ComputeNumSignBits and the Min/Max variants call getValidShiftAmountConstant internally anyhow.
Summary:
The description of EXTACT_SUBVECTOR and INSERT_SUBVECTOR has been
changed to accommodate scalable vectors (see ISDOpcodes.h). This
patch updates the asserts used to verify these requirements when
using SelectionDAG's getNode interface.
This patch introduces the MVT function getVectorMinNumElements
that can be used against fixed-length and scalable vectors when
only the known minimum vector length is required.
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D80709
We should be using getVectorElementCount() to assert that two types
have the same numbers of elements. I encountered the warnings while
compiling this test:
CodeGen/AArch64/sve-intrinsics-ld1.ll
Differential Revision: https://reviews.llvm.org/D80616
I have tried to ensure that SelectionDAG and DAGCombiner do
sensible things for scalable vectors, and added support for a
limited number of simple folds. Codegen support for the vector
extract patterns have also been added to the AArch64 backend.
New vector extract tests have been added here:
CodeGen/AArch64/sve-extract-element.ll
and I have also added new folds using inserts and extracts here:
CodeGen/AArch64/sve-insert-element.ll
Differential Revision: https://reviews.llvm.org/D80208
This intrinsic implements IEEE-754 operation roundToIntegralTiesToEven,
and performs rounding to the nearest integer value, rounding halfway
cases to even. The intrinsic represents the missed case of IEEE-754
rounding operations and now llvm provides full support of the rounding
operations defined by the standard.
Differential Revision: https://reviews.llvm.org/D75670