3301 Commits

Author SHA1 Message Date
David Green
2cfb243bcd [DAG] Use isAnyConstantBuildVector. NFC
As suggested from 02f8519502447de, this uses the
isAnyConstantBuildVector method in lieu of separate
isBuildVectorOfConstantSDNodes calls. It should
otherwise be an NFC.
2022-05-09 14:13:03 +01:00
David Green
02f8519502 [DAG] Prevent infinite loop combining bitcast shuffle
This prevents an infinite loop from D123801, where code trying to reduce
the total number of bitcasts, but also handling constants, could create
the opposite transform. Prevent the transform in these case to let the
bitcast of a constant transform naturally.

Fixes #55345
2022-05-09 09:36:22 +01:00
Simon Pilgrim
800d36cf32 [DAG] Only perform the fold (A-B)+(C-D) --> (A+C)-(B+D) when both inner subs have one use
Fixes #51381
2022-05-08 13:51:58 +01:00
Amaury Séchet
06fad8bc05 [DAGCombine] Add node in the worklist in topological order in CombineTo
This is part of an ongoing effort toward making DAGCombine process the nodes in topological order.

This is able to discover a couple of new optimizations, but also causes a couple of regression. I nevertheless chose to submit this patch for review as to start the discussion with people working on the backend so we can find a good way forward.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D124743
2022-05-07 16:24:31 +00:00
Paul Walker
702c4ade22 [ISD::IndexType] Helper functions for common queries.
Add helper functions to query the signed and scaled properties
of ISD::IndexType along with functions to change them.

Remove setIndexType from MaskedGatherSDNode because it only has
one usage and typically should only be changed alongside its
index operand.

Minimise the direct use of the enum values to lay the groundwork
for more refactoring.

Differential Revision: https://reviews.llvm.org/D123347
2022-05-07 11:23:42 +01:00
David Green
5930691ee1 Revert "[DAGCombine] Make combineShuffleOfBitcast LittleEndian specific"
This reverts commit 891c3cf99e100e8871aff9a0747c887a5d0a8b0f as it turns
out that the error was not caused by this commit, the error caming
from D124526 instead.
2022-05-06 21:03:22 +01:00
David Green
891c3cf99e [DAGCombine] Make combineShuffleOfBitcast LittleEndian specific
Something is going wrong with the BigEndian PowerPC bot. It is hard to
tell what is wrong from here, but attempt to fix it by disabling the
combineShuffleOfBitcast combine for bigendian.
2022-05-06 18:42:44 +01:00
Simon Pilgrim
c0bebc12f0 [DAG] visitREM - merge buildOptimizedSREM into if(). NFCI. 2022-05-06 15:39:17 +01:00
David Green
115c188807 [DAG][PowerPC] Combine shuffle(bitcast(X), Mask) to bitcast(shuffle(X, Mask'))
If the mask is made up of elements that form a mask in the higher type
we can convert shuffle(bitcast into the bitcast type, simplifying the
instruction sequence. A v4i32 2,3,0,1 for example can be treated as a
1,0 v2i64 shuffle. This helps clean up some of the AArch64 concat load
combines, along with helping simplify a number of other tests.

The PowerPC combine for v16i8 splat vector loads needed some fixes to
keep it working for v16i8 vectors. This improves the handling of v2i64
shuffles to match too, hopefully improving them in general.

Differential Revision: https://reviews.llvm.org/D123801
2022-05-06 10:50:31 +01:00
Craig Topper
4e2d1a6c18 [DAGCombiner] Fold (sext/zext undef) -> 0 and aext(undef) -> undef.
Differential Revision: https://reviews.llvm.org/D124988
2022-05-05 09:34:18 -07:00
Craig Topper
fd13192aa5 [DAGCombiner] Fold (max/min X, X) -> X.
Differential Revision: https://reviews.llvm.org/D124951
2022-05-05 09:34:17 -07:00
Nikita Popov
9678936f18 [DAGCombine] Fold (X & ~Y) | Y with truncated not
This extends the (X & ~Y) | Y to X | Y fold to also work if ~Y is
a truncated not (when taking into account the mask X). This is
done by exporting the infrastructure added in D124856 and reusing
it here.

I've retained the old value of AllowUndefs=false, though probably
this can be switched to true with extra test coverage.

Differential Revision: https://reviews.llvm.org/D124930
2022-05-05 11:10:11 +02:00
Simon Pilgrim
faa35fc873 [DAG] Fix issue with rot(rot(x,c1),c2) -> rot(x,c1+c2) fold with unnormalized rotation amounts
Don't assume the rotation amounts have been correctly normalized - do it as part of the constant folding.

Also, the normalization should be performed with UREM not SREM.
2022-05-03 17:16:26 +01:00
Craig Topper
5f057eaa0d [DAGCombiner] reassociationCanBreakAddressingModePattern should check uses of the outer add.
When looking for memory uses,
reassociationCanBreakAddressingModePattern should check uses of
the outer ADD rather than the inner ADD. We want to know if the
two ops we're reassociating are used by a load/store.

In practice, the existing check usually works because CodeGenPrepare
will make one of the load/stores have an offset of 0 relative to
split GEP. That will make the inner add have a memory use.

To test this, I've manually split the GEPs so there is no 0 offset
store.

This issue was recently discussed in the original review D60294.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D124644
2022-05-02 16:38:53 -07:00
Sanjay Patel
747c6a0c73 [SDAG] fix miscompile when casting int->FP->int
This is the codegen equivalent of D124692.

As shown in https://github.com/llvm/llvm-project/issues/55150 -
the existing fold may be wrong when converting to a signed value.
This is a quick fix to avoid the miscompile.
https://alive2.llvm.org/ce/z/KtaDmd

Differential Revision: https://reviews.llvm.org/D124771
2022-05-02 14:57:27 -04:00
Simon Pilgrim
ae8b10e543 [DAG] (style) Break apart if-else chain as they all return 2022-05-01 17:56:59 +01:00
Craig Topper
6affe87bda [DAGCombiner] When matching a disguised rotate by constant don't forget to apply LHSMask/RHSMask.
We try to match as a disguised rotate by constant of these forms
(shl (X | Y), C1) | (srl X, C2) --> (rotl X, C1) | (shl Y, C1)
(shl X, C1) | (srl (X | Y), C2) --> (rotl X, C1) | (srl Y, C2)

We may have also looked through an AND to find the shift. If we
did, we need to apply a mask to the result.

I'll add an AArch64 test and pre-commit it and the RISC-V test
tomorrow.

Fixes PR55201.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D124711
2022-04-30 11:02:30 -07:00
Paul Walker
23c509754d [DAGCombiner] Stop invalid sign conversion in refineIndexType.
When looking through extends of gather/scatter indices it's safe
to convert a known positive signed index to unsigned, but unsigned
indices must remain unsigned.

Depends On D123318

Differential Revision: https://reviews.llvm.org/D123326
2022-04-29 14:20:13 +01:00
Paul Walker
7a0b897e86 [DAGCombiner][SVE] Ensure MGATHER/MSCATTER addressing mode combines preserve index scaling
refineUniformBase and selectGatherScatterAddrMode both attempt the
transformation:

  base(0) + index(A+splat(B)) => base(B) + index(A)

However, this is only safe when index is not implicitly scaled.

Differential Revision: https://reviews.llvm.org/D123222
2022-04-29 12:35:16 +01:00
Simon Pilgrim
34e7243464 [DAG] Fold freeze(bitcast(x)) -> bitcast(freeze(x))
This is a very specific fold to fix an upstream poor codegen issue.

InstCombine has the much more flexible pushFreezeToPreventPoisonFromPropagating but I don't think we're quite there with DAG/TLI handling for canCreateUndefOrPoison/isGuaranteedNotToBeUndefOrPoison value tracking yet.

Fixes #54911

Differential Revision: https://reviews.llvm.org/D124185
2022-04-22 16:39:25 +01:00
Alexey Bataev
2cca53c815 [DAG]Introduce llvm::processShuffleMasks and use it for shuffles in DAG Type Legalizer.
We can process the long shuffles (working across several actual
vector registers) in the best way if we take the actual register
represantion into account. We can build more correct representation of
register shuffles, improve number of recognised buildvector sequences.
Also, same function can be used to improve the cost model for the
shuffles. in future patches.

Part of D100486

Differential Revision: https://reviews.llvm.org/D115653
2022-04-20 09:37:16 -07:00
Alexey Bataev
5f7ac15912 Revert "[DAG]Introduce llvm::processShuffleMasks and use it for shuffles in DAG Type Legalizer."
This reverts commit 2f49163b3365e5dc046b03e422a048dd45aee3f0 to fix
a buildbot failure. Reported in https://lab.llvm.org/buildbot#builders/105/builds/24284
2022-04-20 06:35:55 -07:00
Alexey Bataev
2f49163b33 [DAG]Introduce llvm::processShuffleMasks and use it for shuffles in DAG Type Legalizer.
We can process the long shuffles (working across several actual
vector registers) in the best way if we take the actual register
represantion into account. We can build more correct representation of
register shuffles, improve number of recognised buildvector sequences.
Also, same function can be used to improve the cost model for the
shuffles. in future patches.

Part of D100486

Differential Revision: https://reviews.llvm.org/D115653
2022-04-20 05:32:56 -07:00
chenglin.bi
222adf338a [Arch64][SelectionDAG] Add target-specific implementation of srem
1. X%C to the equivalent of X-X/C*C is not always fastest path if there is no SDIV pair exist. So check target have faster for srem only first.
2. Add AArch64 faster path for SREM only pow2 case.

Fix https://github.com/llvm/llvm-project/issues/54649

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D122968
2022-04-19 02:49:42 +08:00
chenglin.bi
acfc025a72 Revert "[Arch64][SelectionDAG] Add target-specific implementation of srem"
This reverts commit 9d9eddd3dde46751a5c415b7e5e475b4feb76600.
2022-04-18 10:35:09 +08:00
chenglin.bi
9d9eddd3dd [Arch64][SelectionDAG] Add target-specific implementation of srem
X%C to the equivalent of X-X/C*C is not always fastest path if there is no SDIV pair exist. So check target have faster for srem only first. Add AArch64 faster path for SREM only pow2 case.

Fix https://github.com/llvm/llvm-project/issues/54649

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D122968
2022-04-16 12:29:11 +08:00
Craig Topper
c6dc229a6d [DAGCombiner] Move call to hasOneUse after opcode checks. NFC
Checking the opcode is cheap, counting the number of uses is not.
2022-04-15 17:02:16 -07:00
Craig Topper
a7b9d75e7a [DAGCombiner] Move or/xor/and opcode check in ReduceLoadOpStoreWidth before hasOneUse check.
hasOneUse is not cheap on nodes with chain results that might have
many uses. By checking the opcode first, we can avoid a costly walk
of the use list on nodes we aren't interested in.

Found by investigating calls to hasNUsesOfValue from the example
provided in D123857.
2022-04-15 16:38:27 -07:00
Simon Pilgrim
fef221bf1f [DAG] Enable SimplifyVBinOp folds on add/sub sat intrinsics 2022-04-13 12:53:23 +01:00
Simon Pilgrim
cfb3ee2185 [DAG] Add non-uniform vector support to (shl (srl x, c1), c2) -> (and (shift x, c3))
Another part of D77804 yak shaving

Differential Revision: https://reviews.llvm.org/D123523
2022-04-13 11:37:33 +01:00
Simon Pilgrim
bc32a1dd76 [DAG] Add non-uniform vector support to (shl (sr[la] exact X, C1), C2) folds 2022-04-12 12:57:56 +01:00
Craig Topper
35be4a7af3 [SelectionDAG] Remove unecessary null check after call to getNode. NFC
As far as I know getNode will never return a null SDValue.

I'm guessing this was modeled after the FoldConstantArithmetic
call earlier.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D123550
2022-04-11 18:03:44 -07:00
Craig Topper
5b5f59428c [DAGCombiner] Replace call getSExtOrTrunc with a truncate. NFC
The extend case should never occur. The sign extend would be an
arbitrary choice, remove it to avoid confusion.
2022-04-06 09:59:45 -07:00
Paul Walker
7d3af9ef0f [DAGCombine] insert_subvector undef, (splat X), N2 -> splat X
Differential Revision: https://reviews.llvm.org/D120328
2022-04-06 17:15:38 +01:00
zhongyunde
19e5235147 [AArch64][InstCombine] Fold MLOAD and zero extensions into MLOAD
Accord the discussion in D122281, we missing an ISD::AND combine for MLOAD
because it relies on BuildVectorSDNode is fails for scalable vectors.
This patch is intend to handle that, so we can circle back the type MVT::nxv2i32

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D122703
2022-04-06 20:50:42 +08:00
Simon Pilgrim
3369e474bb [DAG] Allow XOR(X,MIN_SIGNED_VALUE) to perform AddLike folds
As raised on PR52267, XOR(X,MIN_SIGNED_VALUE) can be treated as ADD(X,MIN_SIGNED_VALUE), so let these cases use the 'AddLike' folds, similar to how we perform no-common-bits OR(X,Y) cases.

define i8 @src(i8 %x) {
  %r = xor i8 %x, 128
  ret i8 %r
}
=>
define i8 @tgt(i8 %x) {
  %r = add i8 %x, 128
  ret i8 %r
}
Transformation seems to be correct!

https://alive2.llvm.org/ce/z/qV46E2

Differential Revision: https://reviews.llvm.org/D122754
2022-04-06 10:37:11 +01:00
Sanjay Patel
e18cc5277f [SDAG] try to canonicalize logical shift after bswap
When shifting by a byte-multiple:
bswap (shl X, C) --> lshr (bswap X), C
bswap (lshr X, C) --> shl (bswap X), C

This is the backend version of D122010 and an alternative
suggested in D120648.
There's an extra check to make sure the shift amount is
valid that was not in the rough draft.

I'm not sure if there is a larger motivating case for RISCV (bug report?),
but the ARM diffs show a benefit from having a late version of the
transform (because we do not combine the loads in IR).

Differential Revision: https://reviews.llvm.org/D122655
2022-03-30 09:29:32 -04:00
Craig Topper
e68257fcee [RISCV][SelectionDAG] Enable TargetLowering::hasBitTest for masks that fit in ANDI.
Modified DAGCombiner to pass the shift the bittest input and the shift amount
to hasBitTest. This matches the other call to hasBitTest in TargetLowering.h

This is an alternative to D122454.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D122458
2022-03-28 12:46:36 -07:00
Simon Pilgrim
e209190c2d [SDAG] enable binop identity constant folds for multiplies
Add mul to the list of ops that we canonicalize with a select to expose an identity merge

Differential Revision: https://reviews.llvm.org/D122071
2022-03-25 11:07:04 +00:00
zhongyunde
828b89bc0b [AArch64][SelectionDAG] Supports unpklo/hi instructions to reduce the number of loads
Trying to reduce the number of masked loads in favour of more unpklo/hi
instructions. Both ISD::ZEXTLOAD and ISD::SEXTLOAD are supported to extensions
from legal types.

Both of normal and masked loads test cases added to guard compile crash.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D120953
2022-03-21 23:47:33 +08:00
Simon Pilgrim
35a7be6ccb [SDAG] enable binop identity constant folds for shifts
Add shl/srl/sra to the list of ops that we canonicalize with a select to expose an identity merge

Differential Revision: https://reviews.llvm.org/D122070
2022-03-21 13:02:50 +00:00
Luo, Yuanke
10bb623192 enable binop identity constant folds for add
Differential Revision: https://reviews.llvm.org/D119654
2022-03-20 19:07:16 +08:00
Craig Topper
ad94dfb9a0 [DAGCombiner][RISCV] Adjust (aext (and (trunc x), cst)) -> (and x, cst) to sext cst based on target preference
RISCV strong prefers i32 values be sign extended to i64. This combine
was always zero extending the constant using APInt methods.

This adjusts the code so that it calls getNode using ISD::ANY_EXTEND instead.
getNode will call TLI.isSExtCheaperThanZExt to decide how to handle
the constant.

Tests were copied from D121598 where I noticed that we were creating
constants that were hard to materialize.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D121650
2022-03-15 08:26:47 -07:00
Sanjay Patel
c2592c374e [SDAG] simplify bitwise logic with repeated operand
We do not have general reassociation here (and probably
do not need it), but I noticed these were missing in
patches/tests motivated by D111530, so we can at
least handle the simplest patterns.

The VE test diff looks correct, but we miss that
pattern in IR currently:
https://alive2.llvm.org/ce/z/u66_PM
2022-03-13 11:12:30 -04:00
serge-sans-paille
ed98c1b376 Cleanup includes: DebugInfo & CodeGen
Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup
Differential Revision: https://reviews.llvm.org/D121332
2022-03-12 17:26:40 +01:00
Sanjay Patel
341623653d [SDAG] match rotate pattern with extra 'or' operation
This is another fold generalized from D111530.
We can find a common source for a rotate operation hidden inside an 'or':
https://alive2.llvm.org/ce/z/9pV8hn

Deciding when this is profitable vs. a funnel-shift is tricky, but this
does not show any regressions: if a target has a rotate but it does not
have a funnel-shift, then try to form the rotate here. That is why we
don't have x86 test diffs for the scalar tests that are duplicated from
AArch64 ( 74a65e3834d9487 ) - shld/shrd are available. That also makes it
difficult to show vector diffs - the only case where I found a diff was
on x86 AVX512 or XOP with i64 elements.

There's an additional check for a legal type to avoid a problem seen
with x86-32 where we form a 64-bit rotate but then it gets split
inefficiently. We might avoid that by adding more rotate folds, but
I didn't check to see what is missing on that path.

This gets most of the motivating patterns for AArch64 / ARM that are in
D111530.

We still need a couple of enhancements to setcc pattern matching with
rotate/funnel-shift to get the rest.

Differential Revision: https://reviews.llvm.org/D120933
2022-03-09 13:19:00 -05:00
David Green
4388f4f776 [DAG] Don't convert undef to 0 when creating buildvector
When inserting undef into buildvectors created from shuffles of
buildvectors, we convert elements to the largest needed type. This had
the effect of converting undef into 0, which isn't needed as the
buildvector implicitly truncates and trunc(zext(undef)) == undef.

Differential Revision: https://reviews.llvm.org/D121002
2022-03-06 18:35:34 +00:00
Sanjay Patel
f4b53972ce [SDAG] fold bitwise logic with shifted operands
This extends acb96ffd149d to 'and' and 'xor' opcodes.

Copying from that message:

LOGIC (LOGIC (SH X0, Y), Z), (SH X1, Y) --> LOGIC (SH (LOGIC X0, X1), Y), Z

https://alive2.llvm.org/ce/z/QmR9rR

This is a reassociation + factoring fold. The common shift operation is moved
after a bitwise logic op on 2 input operands.
We get simpler cases of these patterns in IR, but I suspect we would miss all
of these exact tests in IR too. We also handle the simpler form of this plus
several other folds in DAGCombiner::hoistLogicOpWithSameOpcodeHands().
2022-03-05 11:14:45 -05:00
Paul Walker
42b4a6227e [DAGCombine] Prevent illegal ISD::SPLAT_VECTOR operations post legalisation.
When triggered during operation legalisation the affected combine
generates a splat_vector that when custom lowered for SVE fixed
length code generation, results in the original precombine sequence
and thus we enter a legalisation/combine hang.

NOTE: The patch contains no tests because I observed this issue
only when combined with other work that might never become public.
The current way AArch64 lowers ISD::SPLAT_VECTOR meant a specific
test was not possible so I'm hoping the DAGCombiner fix can be seen
as obvious. The AArch64ISelLowering change is requirted to maintain
existing code quality.

Differential Revision: https://reviews.llvm.org/D120735
2022-03-04 11:54:03 +00:00
Craig Topper
bf8054644d [DAGCombiner] Don't expand (neg (abs x)) if the abs has an additional user.
If the types aren't legal, the expansions may get type legalized in a
different way preventing code sharing. If the type is legal, we will
share some instructions between the two expansions, but we will need an
extra register.

Since we don't appear to fold (neg (sub A, B)) if the sub has an
additional user, I think it makes sense not to expand NABS.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D120513
2022-03-01 07:32:07 -08:00