2453 Commits

Author SHA1 Message Date
Nikita Popov
9a4453592b [DAGCombine] Fold (x & ~y) | y patterns
Fold (x & ~y) | y and it's four commuted variants to x | y. This pattern
can in particular appear when a vselect c, x, -1 is expanded to
(x & ~c) | (-1 & c) and combined to (x & ~c) | c.

This change has some overlap with D59066, which avoids creating a
vselect of this form in the first place during uaddsat expansion.

Differential Revision: https://reviews.llvm.org/D59174

llvm-svn: 356333
2019-03-17 15:45:38 +00:00
Simon Pilgrim
3b0a6c69ee [DAGCombine] combineShuffleOfScalars - handle non-zero SCALAR_TO_VECTOR indices (PR41097)
rL356292 reduces the size of scalar_to_vector if we know the upper bits are undef - which means that shuffles may find they are suddenly referencing scalar_to_vector elements other than zero - so make sure we handle this as undef.

llvm-svn: 356327
2019-03-16 17:36:26 +00:00
Nirav Dave
ee5183c796 [DAGCombiner] Fix Comment. NFC.
llvm-svn: 356069
2019-03-13 17:44:40 +00:00
Nirav Dave
d6351340bb [DAGCombiner] If a TokenFactor would be merged into its user, consider the user later.
Summary:
A number of optimizations are inhibited by single-use TokenFactors not
being merged into the TokenFactor using it. This makes we consider if
we can do the merge immediately.

Most tests changes here are due to the change in visitation causing
minor reorderings and associated reassociation of paired memory
operations.

CodeGen tests with non-reordering changes:

  X86/aligned-variadic.ll -- memory-based add folded into stored leaq
  value.

  X86/constant-combiners.ll -- Optimizes out overlap between stores.

  X86/pr40631_deadstore_elision -- folds constant byte store into
  preceding quad word constant store.

Reviewers: RKSimon, craig.topper, spatel, efriedma, courbet

Reviewed By: courbet

Subscribers: dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, javed.absar, eraman, hiraditya, kbarton, jrtc27, atanasyan, jsji, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59260

llvm-svn: 356068
2019-03-13 17:07:09 +00:00
Clement Courbet
3bb5d0bb9b Re-land r354244 "[DAGCombiner] Eliminate dead stores to stack."
Always check candidates for hasOtherUses(), not only stores.

llvm-svn: 356050
2019-03-13 13:56:23 +00:00
Simon Pilgrim
9f0a5ca843 [DAGCombine] Pull out repeated demanded bitmask generation. NFCI.
llvm-svn: 355932
2019-03-12 15:58:28 +00:00
Nikita Popov
aa7cfa75f9 [SDAG][AArch64] Legalize VECREDUCE
Fixes https://bugs.llvm.org/show_bug.cgi?id=36796.

Implement basic legalizations (PromoteIntRes, PromoteIntOp,
ExpandIntRes, ScalarizeVecOp, WidenVecOp) for VECREDUCE opcodes.
There are more legalizations missing (esp float legalizations),
but there's no way to test them right now, so I'm not adding them.

This also includes a few more changes to make this work somewhat
reasonably:

 * Add support for expanding VECREDUCE in SDAG. Usually
   experimental.vector.reduce is expanded prior to codegen, but if the
   target does have native vector reduce, it may of course still be
   necessary to expand due to legalization issues. This uses a shuffle
   reduction if possible, followed by a naive scalar reduction.
 * Allow the result type of integer VECREDUCE to be larger than the
   vector element type. For example we need to be able to reduce a v8i8
   into an (nominally) i32 result type on AArch64.
 * Use the vector operand type rather than the scalar result type to
   determine the action, so we can control exactly which vector types are
   supported. Also change the legalize vector op code to handle
   operations that only have vector operands, but no vector results, as
   is the case for VECREDUCE.
 * Default VECREDUCE to Expand. On AArch64 (only target using VECREDUCE),
   explicitly specify for which vector types the reductions are supported.

This does not handle anything related to VECREDUCE_STRICT_*.

Differential Revision: https://reviews.llvm.org/D58015

llvm-svn: 355860
2019-03-11 20:22:13 +00:00
Amaury Sechet
a135fd5562 Remove redundant extractBooleanFlip argument. NFC
llvm-svn: 355794
2019-03-11 00:37:01 +00:00
Amaury Sechet
b62642a115 Refactor isBooleanFlip into extractBooleanFlip so that users do not depend on the patern matched. NFC
llvm-svn: 355769
2019-03-09 02:51:52 +00:00
Amaury Sechet
782ac933b5 [DAGCombiner] fold (add (add (xor a, -1), b), 1) -> (sub b, a)
Summary: This pattern is sometime created after legalization.

Reviewers: efriedma, spatel, RKSimon, zvi, bkramer

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58874

llvm-svn: 355716
2019-03-08 19:39:32 +00:00
Simon Pilgrim
04e8439f72 [DAGCombine] Merge visitSMULO+visitUMULO into visitMULO. NFCI.
llvm-svn: 355690
2019-03-08 11:41:18 +00:00
Simon Pilgrim
c71d6d157f [DAGCombine] Merge visitSADDO+visitUADDO into visitADDO. NFCI.
llvm-svn: 355689
2019-03-08 11:30:33 +00:00
Simon Pilgrim
2c2e76a9e2 [DAGCombine] Merge visitSSUBO+visitUSUBO into visitSUBO. NFCI.
llvm-svn: 355688
2019-03-08 11:16:55 +00:00
Simon Pilgrim
9d6347cfc1 [DAGCombine] Improve select (not Cond), N1, N2 -> select Cond, N2, N1 fold
Move the x86 combine from D58974 into the DAGCombine VSELECT code and update the SELECT version to use the isBooleanFlip helper as well.

Requested by @spatel on D59006

llvm-svn: 355533
2019-03-06 18:52:52 +00:00
Simon Pilgrim
cdf95f8f07 [DAGCombiner] Enable UADDO/USUBO vector combine support
Differential Revision: https://reviews.llvm.org/D58965

llvm-svn: 355517
2019-03-06 16:11:03 +00:00
Simon Pilgrim
1bdc2d1874 [DAGCombiner] Add SADDO/SSUBO combine support
Basic constant handling folds, for both scalars and vectors

Differential Revision: https://reviews.llvm.org/D58967

llvm-svn: 355506
2019-03-06 14:22:21 +00:00
Simon Pilgrim
642f53d292 [DAGCombiner] Enable SMULO/UMULO vector combine support (PR40442)
Differential Revision: https://reviews.llvm.org/D58968

llvm-svn: 355495
2019-03-06 11:04:21 +00:00
Craig Topper
509a8a3cf1 [DAGCombiner][X86][SystemZ][AArch64] Combine some cases of (bitcast (build_vector constants)) between legalize types and legalize dag.
This patch enables combining integer bitcasts of integer build vectors when the new scalar type is legal. I've avoided floating point because the implementation bitcasts float to int along the way and we would need to check the intermediate types for legality

Differential Revision: https://reviews.llvm.org/D58884

llvm-svn: 355324
2019-03-04 19:12:16 +00:00
Nirav Dave
582d46328c [DAG] Fix constant store folding to handle non-byte sizes.
Avoid crashes from zero-byte values due to sub-byte store sizes.

Reviewers: uabelho, courbet, rnk

Reviewed By: courbet

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58626

llvm-svn: 354884
2019-02-26 15:02:32 +00:00
Andrea Di Biagio
4a1e59a6e0 Fix a sign compare warning breaking the -Werror build.
The warning was introduced at r354793.

llvm-svn: 354810
2019-02-25 19:33:58 +00:00
Simon Pilgrim
28441ac75f [DAGCombine] Add undef shuffle elt support to partitionShuffleOfConcats
Support undef shuffle mask indices in the shuffle(concat_vectors, concat_vectors) -> concat_vectors fold

Differential Revision: https://reviews.llvm.org/D58585

llvm-svn: 354793
2019-02-25 16:02:01 +00:00
Jordan Rupprecht
6387fa2715 [NFC] Fix typos: preceeding -> preceding
llvm-svn: 354715
2019-02-23 01:28:32 +00:00
Nirav Dave
46f939c118 Disable big-endian constant store merges from rL354676.
llvm-svn: 354677
2019-02-22 16:20:34 +00:00
Nirav Dave
44037d7a63 [DAGCombine] Fold overlapping constant stores
Fold a smaller constant store into larger constant stores immediately
preceeding it.

Reviewers: rnk, courbet

Subscribers: javed.absar, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58468

llvm-svn: 354676
2019-02-22 16:00:19 +00:00
Sanjay Patel
ba5ee817e9 [DAGCombiner] prevent infinite looping by truncating 'and' (PR40793)
This fold can occur during legalization, so it can fight with promotion
to the larger type. It apparently takes a special sequence and subtarget
to avoid more basic simplifications that would hide the problem.

But there's a bigger question raised here: why does distributeTruncateThroughAnd()
even exist? It duplicates functionality from a more minimal pattern that we
already have. But getting rid of this function requires some preliminary steps.

https://bugs.llvm.org/show_bug.cgi?id=40793

llvm-svn: 354594
2019-02-21 16:01:48 +00:00
Nirav Dave
48cf37b55c [DAGCombine] Generalize Dead Store to overlapping stores.
Summary:
Remove stores that are immediately overwritten by larger
stores.

Reviewers: courbet, rnk

Reviewed By: rnk

Subscribers: javed.absar, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58467

llvm-svn: 354518
2019-02-20 21:07:50 +00:00
Clement Courbet
62b3b91ab2 Re-land the refactoring part of r354244 "[DAGCombiner] Eliminate dead stores to stack."
This is an NFC.

llvm-svn: 354476
2019-02-20 15:45:58 +00:00
Clement Courbet
292291fb90 Revert r354244 "[DAGCombiner] Eliminate dead stores to stack."
Breaks some bots.

llvm-svn: 354245
2019-02-18 08:24:29 +00:00
Clement Courbet
57f34dbd3e [DAGCombiner] Eliminate dead stores to stack.
Summary:
A store to an object whose lifetime is about to end can be removed.

See PR40550 for motivation.

Reviewers: niravd

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D57541

llvm-svn: 354244
2019-02-18 07:59:01 +00:00
Sanjay Patel
86fac11d5a [DAGCombiner] convert logic-of-setcc into bit magic (PR40611)
If we're comparing some value for equality against 2 constants
and those constants have an absolute difference of just 1 bit,
then we can offset and mask off that 1 bit and reduce to a single
compare against zero:
         and/or (setcc X, C0, ne), (setcc X, C1, ne/eq) -->
         setcc ((add X, -C1), ~(C0 - C1)), 0, ne/eq

https://rise4fun.com/Alive/XslKj

This transform is disabled by default using a TLI hook
("convertSetCCLogicToBitwiseLogic()").

That should be overridden for AArch64, MIPS, Sparc and possibly
others based on the asm shown in:
https://bugs.llvm.org/show_bug.cgi?id=40611

llvm-svn: 353859
2019-02-12 17:07:47 +00:00
Benjamin Kramer
711950c116 Move some classes into anonymous namespaces. NFC.
llvm-svn: 353710
2019-02-11 15:16:21 +00:00
Simon Pilgrim
c5744d4d69 [DAG] Add optional AllowUndefs to isNullOrNullSplat
No change in default behaviour (AllowUndefs = false)

llvm-svn: 353646
2019-02-10 17:42:15 +00:00
Simon Pilgrim
5a82a788a2 [DAGCombine] Simplify funnel shifts with undef/zero args to bitshifts
Now that we have SimplifyDemandedBits support for funnel shifts (rL353539), we need to simplify funnel shifts back to bitshifts in cases where either argument has been folded to undef/zero.

Differential Revision: https://reviews.llvm.org/D58009

llvm-svn: 353645
2019-02-10 17:04:00 +00:00
Nemanja Ivanovic
92a8c36735 [DAGCombine] Optimize pow(X, 0.75) to sqrt(X) * sqrt(sqrt(X))
The sqrt case is faster and we already do this for the case where
the exponent is 0.25. This adds the 0.75 case which is also not
sensitive to signed zeros.

Patch by Whitney Tsang (Whitney)

Differential revision: https://reviews.llvm.org/D57434

llvm-svn: 353557
2019-02-08 19:50:58 +00:00
Simon Pilgrim
478bb90779 [TargetLowering] Add SimplifyDemandedBits funnel shift support
llvm-svn: 353539
2019-02-08 17:19:01 +00:00
Nirav Dave
97011ccce0 Revert r353416 "[DAG] Cleanup unused nodes on failed store-to-load forward combine."
This cleanup causes out-of-tree crashes.

llvm-svn: 353527
2019-02-08 15:21:13 +00:00
Simon Pilgrim
fe3ac70b18 [DAGCombiner] (add (umax X, C), -C) --> (usubsat X, C) (PR40111)
Move the (add (umax X, C), -C) --> (usubsat X, C) X86 combine into generic DAGCombiner

First of a number of saturated arithmetic folds that can be moved out of X86-specific code for PR40111.

Differential Revision: https://reviews.llvm.org/D57754

llvm-svn: 353457
2019-02-07 20:14:43 +00:00
Nirav Dave
9332fc2e19 Revert "[DAG] Cleanup of unused node in SimplifySelectCC."
Causes ASAN use-after-poison errors.

llvm-svn: 353442
2019-02-07 18:31:05 +00:00
Sanjay Patel
2d4b186844 [DAGCombiner] fold add/sub with bool operand based on target's boolean contents
I noticed that we are missing this canonicalization in IR:
rL352515
...and then realized that we don't get this right in SDAG either,
so this has to be fixed first regardless of what we choose to do in IR.

The existing fold was limited to scalars and using the wrong predicate
to guard the transform. We have a boolean contents TLI query that can
be used to decide which direction to fold.

This may eventually lead back to the problems/question in:
https://bugs.llvm.org/show_bug.cgi?id=40486
...but it makes no difference to that yet.

Differential Revision: https://reviews.llvm.org/D57401

llvm-svn: 353433
2019-02-07 17:43:34 +00:00
Nirav Dave
24e60819f6 [DAG] Cleanup of unused node in SimplifySelectCC.
llvm-svn: 353428
2019-02-07 17:13:55 +00:00
Nirav Dave
4b12236f7d [DAG] Cleanup unused node on failed SELECT Combine.
llvm-svn: 353426
2019-02-07 16:57:50 +00:00
Nirav Dave
724b81087d [DAG] Cleanup unused nodes on failed store-to-load forward combine.
llvm-svn: 353416
2019-02-07 15:38:14 +00:00
Nirav Dave
b3506bf985 [DAG] Immediately cleanup unused nodes from extend-based combines.
llvm-svn: 353338
2019-02-06 20:12:03 +00:00
Clement Courbet
5a6712b633 [DAGCombine][NFC] GatherAllAliases should take a LSBaseSDNode.
GatherAllAliases only makes sense for LSBaseSDNode. Enforce it with
static typing instead of runtime cast.

llvm-svn: 353291
2019-02-06 12:36:17 +00:00
Craig Topper
d4e37afe45 [DAGCombiner] Discard pointer info when combining extract_vector_elt of a vector load when the index isn't constant
Summary:
If the index isn't constant, this transform inserts a multiply and an add on the index to calculating the base pointer for a scalar load. But we still create a memory operand with an offset of 0 and the size of the scalar access. But the access is really to an unknown offset within the original access size.

This can cause the machine scheduler to incorrectly calculate dependencies between this load and other accesses. In the case we saw, there was a 32 byte vector store that was split into two 16 byte stores, one with offset 0 and one with offset 16. The size of the memory operand for both was 16. The scheduler correctly detected the alias with the offset 0 store, but not the offset 16 store.

This patch discards the pointer info so we don't incorrectly detect aliasing. I wasn't sure if we could keep using the original offset and size without risking some other transform on the load changing the size.

I tried to reduce a test case, but there's still a lot of memory operations needed to get the scheduler to do the bad reordering. So it looked pretty fragile to maintain.

Reviewers: efriedma

Reviewed By: efriedma

Subscribers: arphaman, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57616

llvm-svn: 353124
2019-02-05 00:22:23 +00:00
Simon Pilgrim
a536b89fe0 [DAGCombine] Add ADD(SUB,SUB) combines
Noticed while investigating PR40483, and fixes the basic test case from the bug - but not a more general case.

We're pretty weak at dealing with ADD/SUB combines compared to the SimplifyAssociativeOrCommutative/SimplifyUsingDistributiveLaws abilities that InstCombine can manage.

llvm-svn: 353044
2019-02-04 13:44:49 +00:00
Simon Pilgrim
bd42f97946 [SDAG] Add SDNode/SDValue getConstantOperandAPInt helper. NFCI.
We already have the getConstantOperandVal helper which returns a uint64_t, but along comes the fuzzer and inserts a i128 -1 constant or something and the whole thing asserts.......

I've updated a few obvious cases, and tried to make use of the const reference where possible, but there's more to do. A number of existing oss-fuzz tickets should be fixed if we start using APInt and perform value clamping where necessary.

llvm-svn: 352961
2019-02-02 17:35:06 +00:00
Guozhi Wei
0bed9e0453 [DAGCombine] Avoid CombineZExtLogicopShiftLoad if there is free ZEXT
This patch fixes pr39098.

For the attached test case, CombineZExtLogicopShiftLoad can optimize it to

t25: i64 = Constant<1099511627775>
t35: i64 = Constant<0>
  t0: ch = EntryToken
t57: i64,ch = load<(load 4 from `i40* undef`, align 8), zext from i32> t0, undef:i64, undef:i64
    t58: i64 = srl t57, Constant:i8<1>
  t60: i64 = and t58, Constant:i64<524287>
t29: ch = store<(store 5 into `i40* undef`, align 8), trunc to i40> t57:1, t60, undef:i64, undef:i64

But later visitANDLike transforms it to

t25: i64 = Constant<1099511627775>
t35: i64 = Constant<0>
  t0: ch = EntryToken
t57: i64,ch = load<(load 4 from `i40* undef`, align 8), zext from i32> t0, undef:i64, undef:i64
      t61: i32 = truncate t57
    t63: i32 = srl t61, Constant:i8<1>
  t64: i32 = and t63, Constant:i32<524287>
t65: i64 = zero_extend t64
    t58: i64 = srl t57, Constant:i8<1>
  t60: i64 = and t58, Constant:i64<524287>
t29: ch = store<(store 5 into `i40* undef`, align 8), trunc to i40> t57:1, t60, undef:i64, undef:i64

And it triggers CombineZExtLogicopShiftLoad again, causes a dead loop.

Both forms should generate same instructions, CombineZExtLogicopShiftLoad generated IR looks cleaner. But it looks more difficult to prevent visitANDLike to do the transform, so I prevent CombineZExtLogicopShiftLoad to do the transform if the ZExt is free.

Differential Revision: https://reviews.llvm.org/D57491

llvm-svn: 352792
2019-01-31 20:46:42 +00:00
Nirav Dave
4061b44057 [DAG] Aggressively cleanup dangling node in CombineZExtLogicopShiftLoad.
While dangling nodes will eventually be pruned when they are
considered, leaving them disables combines requiring single-use.

Reviewers: Carrot, spatel, craig.topper, RKSimon, efriedma

Subscribers: hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D57520

llvm-svn: 352784
2019-01-31 19:35:14 +00:00
Sanjay Patel
9ab23101a8 [DAGCombiner] sub X, 0/1 --> add X, 0/-1
This extends the existing transform for:
add X, 0/1 --> sub X, 0/-1
...to allow the sibling subtraction fold.

This pattern could regress with the proposed change in D57401.

llvm-svn: 352680
2019-01-30 22:41:35 +00:00