2715 Commits

Author SHA1 Message Date
Amaury Séchet
c594d14d40 [DAGCombine] Factor oplist operations. NFC 2019-12-02 19:12:03 +01:00
Amaury Séchet
d8d5106225 [SelectionDAG] Reduce assumptions made about levels. NFC 2019-12-02 17:43:13 +01:00
Amaury Séchet
ca818f4550 [DAGCombiner] Peek through vector concats when trying to combine shuffles.
Summary: This combine showed up as needed when exploring the regression when processing the DAG in topological order.

Reviewers: craig.topper, efriedma, RKSimon, lebedev.ri

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68195
2019-11-28 23:57:29 +01:00
David Green
b5315ae8ff [Codegen][ARM] Add addressing modes from masked loads and stores
MVE has a basic symmetry between it's normal loads/store operations and
the masked variants. This means that masked loads and stores can use
pre-inc and post-inc addressing modes, just like the standard loads and
stores already do.

To enable that, this patch adds all the relevant infrastructure for
treating masked loads/stores addressing modes in the same way as normal
loads/stores.

This involves:
- Adding an AddressingMode to MaskedLoadStoreSDNode, along with an extra
   Offset operand that is added after the PtrBase.
- Extending the IndexedModeActions from 8bits to 16bits to store the
   legality of masked operations as well as normal ones. This array is
   fairly small, so doubling the size still won't make it very large.
   Offset masked loads can then be controlled with
   setIndexedMaskedLoadAction, similar to standard loads.
- The same methods that combine to indexed loads, such as
   CombineToPostIndexedLoadStore, are adjusted to handle masked loads in
   the same way.
- The ARM backend is then adjusted to make use of these indexed masked
   loads/stores.
- The X86 backend is adjusted to hopefully be no functional changes.

Differential Revision: https://reviews.llvm.org/D70176
2019-11-26 16:21:01 +00:00
Sanjay Patel
214683f3b2 [DAGCombiner] avoid crash on out-of-bounds insert index (PR44139)
We already have this simplification at node-creation-time, but
the test from:
https://bugs.llvm.org/show_bug.cgi?id=44139
...shows that we can combine our way to an assert/crash too.
2019-11-25 16:24:06 -05:00
Clement Courbet
cb15ba84fe Reland "[DAGCombiner] Allow zextended load combines."
Check that the generated type is simple.
2019-11-22 14:47:18 +01:00
Clement Courbet
88e205525c Revert "[DAGCombiner] Allow zextended load combines."
Breaks some bots.
2019-11-22 09:01:08 +01:00
Clement Courbet
036790f988 [DAGCombiner] Allow zextended load combines.
Summary: or(zext(load8(base)), zext(load8(base+1)) -> zext(load16 base)

Reviewers: apilipenko, RKSimon

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70487
2019-11-22 08:40:19 +01:00
Hiroshi Yamauchi
52e377497d [PGO][PGSO] DAG.shouldOptForSize part.
Summary:
(Split of off D67120)

SelectionDAG::shouldOptForSize changes for profile guided size optimization.

Reviewers: davidxl

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70095
2019-11-21 14:16:00 -08:00
Clement Courbet
252567377c [DAGCombine][NFC] Use ArrayRef and correctly size SmallVectors.
In preparation for D70487.
2019-11-21 08:53:37 +01:00
David Zarzycki
257acbf6ae
[SelectionDAG] Combine U{ADD,SUB}O diamonds into {ADD,SUB}CARRY
Summary:
Convert (uaddo (uaddo x, y), carryIn) into addcarry x, y, carryIn if-and-only-if the carry flags of the first two uaddo are merged via OR or XOR.

Work remaining: match ADD, etc.

Reviewers: craig.topper, RKSimon, spatel, niravd, jonpa, uweigand, deadalnix, nikic, lebedev.ri, dmgreen, chfast

Reviewed By: lebedev.ri

Subscribers: chfast, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70079
2019-11-20 16:25:42 +02:00
Matt Arsenault
7fe9435dc8 Work on cleaning up denormal mode handling
Cleanup handling of the denormal-fp-math attribute. Consolidate places
checking the allowed names in one place.

This is in preparation for introducing FP type specific variants of
the denormal-fp-mode attribute. AMDGPU will switch to using this in
place of the current hacky use of subtarget features for the denormal
mode.

Introduce a new header for dealing with FP modes. The constrained
intrinsic classes define related enums that should also be moved into
this header for uses in other contexts.

The verifier could use a check to make sure the denorm-fp-mode
attribute is sane, but there currently isn't one.

Currently, DAGCombiner incorrectly asssumes non-IEEE behavior by
default in the one current user. Clang must be taught to start
emitting this attribute by default to avoid regressions when this is
switched to assume ieee behavior if the attribute isn't present.
2019-11-19 22:01:14 +05:30
Matt Arsenault
b696b9dba7 DAG: Add function context to isFMAFasterThanFMulAndFAdd
AMDGPU needs to know the FP mode for the function to answer this
correctly when this is removed from the subtarget.

AArch64 had to make this more complicated by using this from an IR
hook, so add an IR typed overload.
2019-11-19 19:25:26 +05:30
Graham Hunter
3f08ad611a [SVE][CodeGen] Scalable vector MVT size queries
* Implements scalable size queries for MVTs, split out from D53137.

* Contains a fix for FindMemType to avoid using scalable vector type
  to contain non-scalable types.

* Explicit casts for several places where implicit integer sign
  changes or promotion from 32 to 64 bits caused problems.

* CodeGenDAGPatterns will treat scalable and non-scalable vector types
  as different.

Reviewers: greened, cameron.mcinally, sdesmalen, rovka

Reviewed By: rovka

Differential Revision: https://reviews.llvm.org/D66871
2019-11-18 12:30:59 +00:00
Paweł Bylica
1c247dd028
[DAGCombiner] Drop redundant DAG method param. NFC 2019-11-14 14:02:53 +01:00
Paweł Bylica
9b89bda517
[DAGCombiner] Use TLI field already available. NFC 2019-11-14 14:02:52 +01:00
joanlluch
d384ad6b63 [TargetLowering][DAGCombine][MSP430] Shift Amount Threshold in DAGCombine (4)
Summary:
Replaces
```
unsigned getShiftAmountThreshold(EVT VT)
```
by

```
bool shouldAvoidTransformToShift(EVT VT, unsigned amount)
```
thus giving more flexibility for targets to decide whether particular shift amounts must be considered expensive or not.

Updates the MSP430 target with a custom implementation.

This continues  D69116, D69120, D69326 and updates them, so all of them must be committed before this.

Existing tests apply, a few more have been added.

Reviewers: asl, spatel

Reviewed By: spatel

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70042
2019-11-13 09:23:08 +01:00
Philip Reames
db036ee0a4 [X86/Atomics] Correct a few transforms for new atomic lowering
This is a partial fix for the issues described in commit message of 027aa27 (the revert of G24609).  Unfortunately, I can't provide test coverage for it on it's own as the only (known) wrong example is still wrong, but due to a separate issue.

These fixes are cases where when performing unrelated DAG combines, we were dropping the atomicity flags entirely.
2019-11-05 13:20:08 -08:00
Thomas Preud'homme
646896a442 Fix PR40644: miscompile indexed FP constant store
Summary:
Functions replaceStoreOfFPConstant() and OptimizeFloatStore() both
replace store of float by a store of an integer unconditionally. However
this generates wrong code when the store that is replaced is an indexed
or truncating store. This commit solves this issue by adding an early
return in these functions when the store being considered is not a
normal store.

Bug was only observed on out of tree targets, hence the lack of testcase
in this commit.

Reviewers: efriedma

Subscribers: hiraditya, arphaman, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68420
2019-11-05 11:07:52 +00:00
Sanjay Patel
113181e9bd [DAGCombine][MSP430] use shift amount threshold in DAGCombine (2/2)
Continuation of:
D69116

Contributes to a fix for PR43559:
https://bugs.llvm.org/show_bug.cgi?id=43559

See also D69099 and D69116

Use the TLI hook in DAGCombine.cpp to guard against creating
shift nodes that are not optimal for a target.

Patch by: @joanlluch (Joan LLuch)

Differential Revision: https://reviews.llvm.org/D69120
2019-11-04 13:41:41 -05:00
Matt Arsenault
6221767055 DAG: Add DAG argument to isFPExtFoldable
For AMDGPU this is dependent on the FP mode, which should eventually
not be a property of the subtarget.
2019-10-31 22:32:45 -07:00
Matt Arsenault
1725f28841 DAG: Add new control for ISD::FMAD formation
For AMDGPU this depends on whether denormals are enabled in the
default FP mode for the function. Currently this is treated as a
subtarget feature, so FMAD is selectively legal based on that. I want
to move this out of the subtarget features so this can be controlled
with a denormal mode attribute. Additionally, this will allow folding
based on a future ftz fast math flag.
2019-10-31 07:51:38 -07:00
Sanjay Patel
1ebd4a2e3a [DAGCombiner] widen any_ext of popcount based on target support
This enhances D69127 (rGe6c145e0548e3b3de6eab27e44e1504387cf6b53)
to handle the looser "any_extend" cast in addition to zext.

This is a prerequisite step for canonicalizing in the other direction
(narrow the popcount) in IR - PR43688:
https://bugs.llvm.org/show_bug.cgi?id=43688
2019-10-28 10:07:12 -04:00
Kerry McLaughlin
da720a38b9 [AArch64][SVE] Implement masked load intrinsics
Summary:
Adds support for codegen of masked loads, with non-extending,
zero-extending and sign-extending variants.

Reviewers: huntergr, rovka, greened, dmgreen

Reviewed By: dmgreen

Subscribers: dmgreen, samparker, tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68877
2019-10-28 10:06:14 +00:00
Sanjay Patel
85a2146c15 [SDAG] fold insert_vector_elt with undef index
Similar to:
rG4c47617627fb

This makes the DAG behavior consistent with IR's insertelement.

https://bugs.llvm.org/show_bug.cgi?id=42689

I've tried to maintain test intent for AArch64 and WebAssembly
by replacing undef index operands with something else.
2019-10-27 15:28:43 -04:00
Sanjay Patel
e6c145e054 [DAGCombiner] widen zext of popcount based on target support
zext (ctpop X) --> ctpop (zext X)

This is a prerequisite step for canonicalizing in the other direction (narrow the popcount) in IR - PR43688:
https://bugs.llvm.org/show_bug.cgi?id=43688

I'm not sure if any other targets are affected, but I found a missing fold for PPC, so added tests based on that.
The reason we widen all the way to 64-bit in these tests is because the initial DAG looks something like this:

  t5: i8 = ctpop t4
  t6: i32 = zero_extend t5  <-- created based on IR, but unused node?
    t7: i64 = zero_extend t5

Differential Revision: https://reviews.llvm.org/D69127
2019-10-25 14:10:51 -04:00
Simon Pilgrim
a18818207a Fix cppcheck shadow variable warning. NFCI. 2019-10-24 22:14:36 +01:00
Graham Hunter
84da2596f9 [AArch64][SVE] Add SPLAT_VECTOR ISD Node
Adds a new ISD node to replicate a scalar value across all elements of
a vector. This is needed for scalable vectors, since BUILD_VECTOR cannot
be used.

Fixes up default type legalization for scalable vectors after the
new MVT type ranges were introduced.

At present I only use this node for scalable vectors. A DAGCombine has
been added to transform a BUILD_VECTOR into a SPLAT_VECTOR if all
elements are the same, but only if the default operation action of
Expand has been overridden by the target.

I've only added result promotion legalization for scalable vector
i8/i16/i32/i64 types in AArch64 for now.

Reviewers: t.p.northover, javed.absar, greened, cameron.mcinally, jmolloy

Reviewed By: jmolloy

Differential Revision: https://reviews.llvm.org/D47775

llvm-svn: 375222
2019-10-18 11:48:35 +00:00
Sam Parker
39af8a3a3b [DAGCombine][ARM] Enable extending masked loads
Add generic DAG combine for extending masked loads.

Allow us to generate sext/zext masked loads which can access v4i8,
v8i8 and v4i16 memory to produce v4i32, v8i16 and v4i32 respectively.

Differential Revision: https://reviews.llvm.org/D68337

llvm-svn: 375085
2019-10-17 07:55:55 +00:00
David Zarzycki
59390efef2 [X86] Make memcmp() use PTEST if possible and also enable AVX1
llvm-svn: 374922
2019-10-15 17:40:12 +00:00
Sanjay Patel
d545c9056e [DAGCombiner] fold select-of-constants based on sign-bit test
Examples:
  i32 X > -1 ? C1 : -1 --> (X >>s 31) | C1
  i8 X < 0 ? C1 : 0 --> (X >>s 7) & C1

This is a small generalization of a fold requested in PR43650:
https://bugs.llvm.org/show_bug.cgi?id=43650

The sign-bit of the condition operand can be used as a mask for the true operand:
https://rise4fun.com/Alive/paT

Note that we already handle some of the patterns (isNegative + scalar) because
there's an over-specialized, yet over-reaching fold for that in foldSelectCCToShiftAnd().
It doesn't use any TLI hooks, so I can't easily rip out that code even though we're
duplicating part of it here. This fold is guarded by TLI.convertSelectOfConstantsToMath(),
so it should not cause problems for targets that prefer select over shift.

Also worth noting: I thought we could generalize this further to include the case where
the true operand of the select is not constant, but Alive says that may allow poison to
pass through where it does not in the original select form of the code.

Differential Revision: https://reviews.llvm.org/D68949

llvm-svn: 374902
2019-10-15 15:23:57 +00:00
Sanjay Patel
3b581ac80f [DAGCombiner] fold vselect-of-constants to shift
The diffs suggest that we are missing some more basic
analysis/transforms, but this keeps the vector path in
sync with the scalar (rL374397). This is again a
preliminary step for introducing the reverse transform
in IR as proposed in D63382.

llvm-svn: 374555
2019-10-11 14:17:56 +00:00
Sanjay Patel
7b904ce724 [DAGCombiner] fold select-of-constants to shift
This reverses the scalar canonicalization proposed in D63382.

Pre: isPowerOf2(C1)
%r = select i1 %cond, i32 C1, i32 0
=>
%z = zext i1 %cond to i32
%r = shl i32 %z, log2(C1)

https://rise4fun.com/Alive/Z50

x86 already tries to fold this pattern, but it isn't done
uniformly, so we still see a diff. AArch64 probably should
enable the TLI hook to benefit too, but that's a follow-on.

llvm-svn: 374397
2019-10-10 17:52:02 +00:00
Sanjay Patel
7f0e7c0b1c [DAGCombiner] reduce code duplication; NFC
llvm-svn: 374370
2019-10-10 15:38:29 +00:00
Amaury Sechet
aaf0507896 [DAGCombine] Match more patterns for half word bswap
Summary: It ensures that the bswap is generated even when a part of the subtree already matches a bswap transform.

Reviewers: craig.topper, efriedma, RKSimon, lebedev.ri

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68250

llvm-svn: 374340
2019-10-10 13:20:10 +00:00
Philip Reames
931120846e Conservatively add volatility and atomic checks in a few places
As background, starting in D66309, I'm working on support unordered atomics analogous to volatile flags on normal LoadSDNode/StoreSDNodes for X86.

As part of that, I spent some time going through usages of LoadSDNode and StoreSDNode looking for cases where we might have missed a volatility check or need an atomic check. I couldn't find any cases that clearly miscompile - i.e. no test cases - but a couple of pieces in code loop suspicious though I can't figure out how to exercise them.

This patch adds defensive checks and asserts in the places my manual audit found. If anyone has any ideas on how to either a) disprove any of the checks, or b) hit the bug they might be fixing, I welcome suggestions.

Differential Revision: https://reviews.llvm.org/D68419

llvm-svn: 374261
2019-10-09 23:43:33 +00:00
Simon Pilgrim
b4ba3cbda0 [X86][AVX] Access a scalar float/double as a free extract from a broadcast load (PR43217)
If a fp scalar is loaded and then used as both a scalar and a vector broadcast, perform the load as a broadcast and then extract the scalar for 'free' from the 0th element.

This involved switching the order of the X86ISD::BROADCAST combines so we only convert to X86ISD::BROADCAST_LOAD once all other canonicalizations have been attempted.

Adds a DAGCombinerInfo::recursivelyDeleteUnusedNodes wrapper.

Fixes PR43217

Differential Revision: https://reviews.llvm.org/D68544

llvm-svn: 373871
2019-10-06 21:11:45 +00:00
Sanjay Patel
f643fabb52 Revert [DAGCombine] Match more patterns for half word bswap
This reverts r373850 (git commit 25ba49824d2d4f2347b4a7cb1623600a76ce9433)

This patch appears to cause multiple codegen regression test failures - http://lab.llvm.org:8011/builders/clang-cmake-armv7-quick/builds/10680

llvm-svn: 373853
2019-10-06 15:27:34 +00:00
Amaury Sechet
25ba49824d [DAGCombine] Match more patterns for half word bswap
Summary: It ensures that the bswap is generated even when a part of the subtree already matches a bswap transform.

Reviewers: craig.topper, efriedma, RKSimon, lebedev.ri

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68250

llvm-svn: 373850
2019-10-06 14:14:55 +00:00
Sanjay Patel
288079aafd [DAGCombiner] add operation legality checks before creating shift ops (PR43542)
As discussed on llvm-dev and:
https://bugs.llvm.org/show_bug.cgi?id=43542
...we have transforms that assume shift operations are legal and transforms to
use them are profitable, but that may not hold for simple targets.

In this case, the MSP430 target custom lowers shifts by repeating (many)
simpler/fixed ops. That can be avoided by keeping this code as setcc/select.

Differential Revision: https://reviews.llvm.org/D68397

llvm-svn: 373666
2019-10-03 21:34:04 +00:00
Simon Pilgrim
3c912c4abe [DAG][X86] Convert isNegatibleForFree/GetNegatedExpression to a target hook (PR42863)
This patch converts the DAGCombine isNegatibleForFree/GetNegatedExpression into overridable TLI hooks.

The intention is to let us extend existing FNEG combines to work more generally with negatible float ops, allowing it work with target specific combines and opcodes (e.g. X86's FMA variants).

Unlike the SimplifyDemandedBits, we can't just handle target nodes through a Target callback, we need to do this as an override to allow targets to handle generic opcodes as well. This does mean that the target implementations has to duplicate some checks (recursion depth etc.).

Partial reversion of rL372756 - I've identified the infinite loop issue inside the X86 override but haven't fixed it yet so I've only (re)committed the common TargetLowering refactoring part of the patch.

Differential Revision: https://reviews.llvm.org/D67557

llvm-svn: 373343
2019-10-01 15:32:04 +00:00
Amaury Sechet
e6f98c0073 [DAGCombiner] Clang format MatchRotate. NFC
llvm-svn: 373269
2019-09-30 21:41:52 +00:00
Amaury Sechet
496c0564f1 [DAGCombiner] Update MatchRotate so that it returns an SDValue. NFC
llvm-svn: 373260
2019-09-30 20:47:23 +00:00
Thomas Raoux
3c8c667235 [TargetLowering] Make allowsMemoryAccess methode virtual.
Rename old function to explicitly show that it cares only about alignment.
The new allowsMemoryAccess call the function related to alignment by default
and can be overridden by target to inform whether the memory access is legal or
not.

Differential Revision: https://reviews.llvm.org/D67121

llvm-svn: 372935
2019-09-26 00:16:01 +00:00
Sanjay Patel
831a7e7068 [DAGCombiner] add one-use restriction to vector transform with cheap extract
We might be able to do better on the example in the test,
but in general, we should not scalarize a splatted vector
binop if there are other uses of the binop. Otherwise, we
can end up with code as we had - a scalar op that is
redundant with a vector op.

llvm-svn: 372886
2019-09-25 15:08:33 +00:00
Ilya Biryukov
60e5e0b667 Revert r372333: [DAG][X86] Convert isNegatibleForFree/GetNegatedExpression to a target hook (PR42863)
Reason: this caused severe compile time regressions in JAX.
See email thread  of original revision on llvm-commits for details:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20190923/697042.html

llvm-svn: 372756
2019-09-24 13:48:02 +00:00
Simon Pilgrim
af6043557d [DAG][X86] Convert isNegatibleForFree/GetNegatedExpression to a target hook (PR42863)
This patch converts the DAGCombine isNegatibleForFree/GetNegatedExpression into overridable TLI hooks and includes a demonstration X86 implementation.

The intention is to let us extend existing FNEG combines to work more generally with negatible float ops, allowing it work with target specific combines and opcodes (e.g. X86's FMA variants).

Unlike the SimplifyDemandedBits, we can't just handle target nodes through a Target callback, we need to do this as an override to allow targets to handle generic opcodes as well. This does mean that the target implementations has to duplicate some checks (recursion depth etc.).

I've only begun to replace X86's FNEG handling here, handling FMADDSUB/FMSUBADD negation and some low impact codegen changes (some FMA negatation propagation). We can build on this in future patches.

Differential Revision: https://reviews.llvm.org/D67557

llvm-svn: 372333
2019-09-19 15:02:47 +00:00
Amaury Sechet
9e94ef42ba [DAGCombiner] Add node to the worklist in topological order in scalarizeExtractedVectorLoad
Summary: As per title.

Reviewers: craig.topper, efriedma, RKSimon, lebedev.ri

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66661

llvm-svn: 372327
2019-09-19 14:22:11 +00:00
Simon Pilgrim
c65dd89804 [DAG] Add SelectionDAG::MaxRecursionDepth constant
As commented on D67557 we have a lot of uses of depth checks all using magic numbers.

This patch adds the SelectionDAG::MaxRecursionDepth constant and moves over some general cases to use this explicitly.

Differential Revision: https://reviews.llvm.org/D67711

llvm-svn: 372315
2019-09-19 12:58:43 +00:00
Roman Lebedev
c00f318224 [DAGCombine][ARM][X86] (sub Carry, X) -> (addcarry (sub 0, X), 0, Carry) fold
Summary:
`DAGCombiner::visitADDLikeCommutative()` already has a sibling fold:
`(add X, Carry) -> (addcarry X, 0, Carry)`

This fold, as suggested by @efriedma, helps recover from //some//
of the regressions of D62266

Reviewers: efriedma, deadalnix

Subscribers: javed.absar, kristof.beyls, llvm-commits, efriedma

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62392

llvm-svn: 372259
2019-09-18 20:48:27 +00:00