526 Commits

Author SHA1 Message Date
Martin Storsjo
86b95489fd Revert "[SLPVectorizer] Add initial alternate opcode support for cast instructions. (REAPPLIED)"
This reverts commit r336812, which broke compilation of a number
of projects, see PR38154.

llvm-svn: 336949
2018-07-12 21:33:42 +00:00
Simon Pilgrim
876e99bf2c [SLPVectorizer] Add initial alternate opcode support for cast instructions. (REAPPLIED)
We currently only support binary instructions in the alternate opcode shuffles.

This patch is an initial attempt at adding cast instructions as well, this raises several issues that we probably want to address as we continue to generalize the alternate mechanism:

1 - Duplication of cost determination - we should probably add scalar/vector costs helper functions and get BoUpSLP::getEntryCost to use them instead of determining costs directly.
2 - Support alternate instructions with the same opcode (e.g. casts with different src types) - alternate vectorization of calls with different IntrinsicIDs will require this.
3 - Allow alternates to be a different instruction type - mixing binary/cast/call etc.
4 - Allow passthrough of unsupported alternate instructions - related to PR30787/D28907 'copyable' elements.

Reapplied with fix to only accept 2 different casts if they come from the same source type.

Differential Revision: https://reviews.llvm.org/D49135

llvm-svn: 336812
2018-07-11 15:05:10 +00:00
Simon Pilgrim
1edde95abd Revert rL336804: [SLPVectorizer] Add initial alternate opcode support for cast instructions.
Reverting due to buildbot failures

llvm-svn: 336806
2018-07-11 14:08:16 +00:00
Simon Pilgrim
2f963a7e83 [SLPVectorizer] Add initial alternate opcode support for cast instructions.
We currently only support binary instructions in the alternate opcode shuffles.

This patch is an initial attempt at adding cast instructions as well, this raises several issues that we probably want to address as we continue to generalize the alternate mechanism:

1 - Duplication of cost determination - we should probably add scalar/vector costs helper functions and get BoUpSLP::getEntryCost to use them instead of determining costs directly.
2 - Support alternate instructions with the same opcode (e.g. casts with different src types) - alternate vectorization of calls with different IntrinsicIDs will require this.
3 - Allow alternates to be a different instruction type - mixing binary/cast/call etc.
4 - Allow passthrough of unsupported alternate instructions - related to PR30787/D28907 'copyable' elements.

Differential Revision: https://reviews.llvm.org/D49135

llvm-svn: 336804
2018-07-11 13:34:09 +00:00
Simon Pilgrim
dafd828c97 [SLPVectorizer] Begin abstracting InstructionsState alternate matching away from opcodes. NFCI.
This is an early step towards matching Instructions by attributes other than the opcode. This will be necessary for cast/call alternates which share the same opcode but have different types/intrinsicIDs etc. - which we could vectorize as long as we split them using the alternate mechanism.

Differential Revision: https://reviews.llvm.org/D48945

llvm-svn: 336344
2018-07-05 12:30:44 +00:00
Simon Pilgrim
ae1c4dcc6e Fix some irregular whitespace/indentation. NFCI.
llvm-svn: 336291
2018-07-04 17:24:05 +00:00
Farhana Aleen
3b416db19b [SLP] Recognize min/max pattern using instructions producing same values.
Summary: It is common to have the following min/max pattern during the intermediate stages of SLP since we only optimize at the end. This patch tries to catch such patterns and allow more vectorization.

         %1 = extractelement <2 x i32> %a, i32 0
         %2 = extractelement <2 x i32> %a, i32 1
         %cond = icmp sgt i32 %1, %2
         %3 = extractelement <2 x i32> %a, i32 0
         %4 = extractelement <2 x i32> %a, i32 1
         %select = select i1 %cond, i32 %3, i32 %4

Author: FarhanaAleen

Reviewed By: ABataev, RKSimon, spatel

Differential Revision: https://reviews.llvm.org/D47608

llvm-svn: 336130
2018-07-02 17:55:31 +00:00
Simon Pilgrim
d5fb50e3bf [SLPVectorizer] Remove nullptr early-outs from Instruction::ShuffleVector getEntryCost
This code is only used by alternate opcodes so the InstructionsState has already confirmed that every Value is an Instruction, plus we use cast<Instruction> which will assert on failure.

llvm-svn: 336102
2018-07-02 13:41:29 +00:00
Simon Pilgrim
265793d52a [SLPVectorizer] Fix alternate opcode + shuffle cost function to correct handle SK_Select patterns.
We were always using the opcodes of the first 2 scalars for the costs of the alternate opcode + shuffle. This made sense when we used SK_Alternate and opcodes were guaranteed to be alternating, but this fails for the more general SK_Select case.

This fix exposes an issue demonstrated by the fmul_fdiv_v4f32_const test - the SLM model has v4f32 fdiv costs which are more than twice those of the f32 scalar cost, meaning that the cost model determines that the vectorization is not performant. Unfortunately it completely ignores the fact that the fdiv by a constant will be changed into a fmul by InstCombine for a much lower cost vectorization. But at least we're seeing this now...

llvm-svn: 336095
2018-07-02 11:28:01 +00:00
Simon Pilgrim
409bd5f487 [SLPVectorizer] Only Alternate opcodes use ShuffleVector cases for getEntryCost/vectorizeTree. NFCI.
Add assertions - we're already assuming this in how we use the AltOpcode and treat everything as BinaryOperators.

llvm-svn: 336092
2018-07-02 10:54:19 +00:00
Simon Pilgrim
3dafb553d9 [SLPVectorizer] Call InstructionsState.isOpcodeOrAlt with Instruction instead of an opcode. NFCI.
llvm-svn: 336069
2018-07-01 20:22:46 +00:00
Simon Pilgrim
ef9c97c343 [SLPVectorizer] Replace sameOpcodeOrAlt with InstructionsState.isOpcodeOrAlt helper. NFCI.
This is a basic step towards matching more general instructions types than just opcodes.

llvm-svn: 336068
2018-07-01 20:07:30 +00:00
Simon Pilgrim
77d2067677 [SLPVectorizer] Use InstructionsState Op/Alt opcodes directly. NFCI.
llvm-svn: 336063
2018-07-01 13:41:58 +00:00
Simon Pilgrim
bbfc18b5b5 [SLPVectorizer] Recognise non uniform power of 2 constants
Since D46637 we are better at handling uniform/non-uniform constant Pow2 detection; this patch tweaks the SLP argument handling to support them.

As SLP works with arrays of values I don't think we can easily use the pattern match helpers here.

Differential Revision: https://reviews.llvm.org/D48214

llvm-svn: 335621
2018-06-26 16:20:16 +00:00
Simon Pilgrim
9d3ef8ee2b [SLPVectorizer] Support alternate opcodes in tryToVectorizeList
Enable tryToVectorizeList to support InstructionsState alternate opcode patterns at a root (build vector etc.) as well as further down the vectorization tree.

NOTE: This patch reduces some of the debug reporting if there are opcode mismatches - I can try to add it back if it proves a problem. But it could get rather messy trying to provide equivalent verbose debug strings via getSameOpcode etc.

Differential Revision: https://reviews.llvm.org/D48488

llvm-svn: 335364
2018-06-22 16:37:34 +00:00
Simon Pilgrim
213cb1b82d [SLPVectorizer] reorderAltShuffleOperands should just take InstructionsState. NFCI.
All calls were extracting the InstructionsState Opcode/AltOpcode values so we might as well pass it directly

llvm-svn: 335359
2018-06-22 16:10:26 +00:00
Simon Pilgrim
1e564504bb [SLPVectorizer] Relax alternate opcodes to accept any BinaryOperator pair
SLP currently only accepts (F)Add/(F)Sub alternate counterpart ops to be merged into an alternate shuffle.

This patch relaxes this to accept any pair of BinaryOperator opcodes instead, assuming the target's cost model accepts the vectorization+shuffle.

Differential Revision: https://reviews.llvm.org/D48477

llvm-svn: 335349
2018-06-22 14:04:06 +00:00
Simon Pilgrim
3d1c8c97b8 [SLPVectorizer] Provide InstructionsState down the BoUpSLP vectorization call tree
As described in D48359, this patch pushes InstructionsState down the BoUpSLP call hierarchy instead of the corresponding raw OpValue. This makes it easier to track the alternate opcode etc. and avoids us having to call getAltOpcode which makes it difficult to support more than one alternate opcode.

Differential Revision: https://reviews.llvm.org/D48382

llvm-svn: 335170
2018-06-20 20:54:52 +00:00
Simon Pilgrim
292651a5b7 [SLPVectorizer] Move isOneOf after InstructionsState type. NFCI.
A future patch will have isOneOf use InstructionsState.

llvm-svn: 335142
2018-06-20 16:11:00 +00:00
Simon Pilgrim
0c9f8dcde7 [SLPVectorizer] Use InstructionsState to record AltOpcode
This is part of a move towards generalizing the alternate opcode mechanism and not just supporting (F)Add/(F)Sub counterparts.

The patch embeds the AltOpcode in the InstructionsState instead of calling getAltOpcode so often.

I'm hoping to eventually remove all uses of getAltOpcode and handle alternate opcode selection entirely within getSameOpcode, that will require us to use InstructionsState throughout the BoUpSLP call hierarchy (similar to some of the changes in D28907), which I will begin in future patches.

Differential Revision: https://reviews.llvm.org/D48359

llvm-svn: 335134
2018-06-20 15:13:40 +00:00
Simon Pilgrim
2e2f20a949 [SLPVectorizer] Relax "alternate" opcode vectorisation to work with any SK_Select shuffle pattern
D47985 saw the old SK_Alternate 'alternating' shuffle mask replaced with the SK_Select mask which accepts either input operand for each lane, equivalent to a vector select with a constant condition operand.

This patch updates SLPVectorizer to make full use of this SK_Select shuffle pattern by removing the 'isOdd()' limitation.

The AArch64 regression will be fixed by D48172.

Differential Revision: https://reviews.llvm.org/D48174

llvm-svn: 335130
2018-06-20 14:26:28 +00:00
Simon Pilgrim
b7ac037797 [SLPVectorizer] Split Tree/Reduction cost calls to simplify debugging. NFCI.
llvm-svn: 335110
2018-06-20 09:39:01 +00:00
Simon Pilgrim
0461393660 [SLPVectorizer] Remove default OperandValueKind arguments from getArithmeticInstrCost calls (NFC)
The getArithmeticInstrCost calls for shuffle vectors entry costs specify TargetTransformInfo::OperandValueKind arguments, but are just using the method's default values. This seems to be a copy + paste issue and doesn't affect the costs in anyway. The TargetTransformInfo::OperandValueProperties default arguments are already not being used.

Noticed while working on D47985.

Differential Revision: https://reviews.llvm.org/D48008

llvm-svn: 335045
2018-06-19 13:40:00 +00:00
Simon Pilgrim
c966f7213e [SLPVectorizer] Pull out AltOpcode determination from reorderAltShuffleOperands.
Minor step towards making the alternate opcode system work with a wider range of opcode pairs.

llvm-svn: 335032
2018-06-19 09:16:06 +00:00
Simon Pilgrim
5b962b2fc3 [SLPVectorizer] Tidyup isShuffle helper
Ensure we keep track of the input vectors in all cases instead of just for SK_Select.

Ideally we'd reuse the shuffle mask pattern matching in TargetTransformInfo::getInstructionThroughput here to easily add support for all TargetTransformInfo::ShuffleKind without mass code duplication, I've added a TODO for now but D48236 should help us here.

Differential Revision: https://reviews.llvm.org/D48023

llvm-svn: 334958
2018-06-18 16:25:01 +00:00
Simon Pilgrim
99a5832016 [SLPVectorizer] Avoid calling const VL.size() repeatedly in for-loop. NFCI.
llvm-svn: 334934
2018-06-18 11:35:36 +00:00
Simon Pilgrim
b234ff136e [SLPVectorizer] Remove RawInstructionsData/getMainOpcode and merge into getSameOpcode
This is part of the work to cleanup use of 'alternate' ops so we can use the more general SK_Select shuffle type.

Only getSameOpcode calls getMainOpcode and much of the logic is repeated in both functions. This will require some reworking of D28907 but that patch has hit trouble and is unlikely to be completed anytime soon.

Differential Revision: https://reviews.llvm.org/D48120

llvm-svn: 334701
2018-06-14 10:25:19 +00:00
Simon Pilgrim
2c9d2adff5 [SLPVectorizer] getSameOpcode - remove useless cast [NFC]
There's no need to cast the base Value to an Instruction

llvm-svn: 334588
2018-06-13 10:49:24 +00:00
Simon Pilgrim
1224260f83 [SLPVectorizer] getSameOpcode - remove unusued alternate code [NFC]
We early-out for the case where we don't use alternate opcodes, so no need to check for it later.

llvm-svn: 334587
2018-06-13 10:14:27 +00:00
Simon Pilgrim
e39fa6cbbb [CostModel] Replace ShuffleKind::SK_Alternate with ShuffleKind::SK_Select (PR33744)
As discussed on PR33744, this patch relaxes ShuffleKind::SK_Alternate which requires shuffle masks to only match an alternating pattern from its 2 sources:

e.g. v4f32: <0,5,2,7> or <4,1,6,3>

This seems far too restrictive as most SIMD hardware which will implement it using a general blend/bit-select instruction, so replaces it with SK_Select, permitting elements from either source as long as they are inline:

e.g. v4f32: <0,5,2,7>, <4,1,6,3>, <0,1,6,7>, <4,1,2,3> etc.

This initial patch just updates the name and cost model shuffle mask analysis, later patch reviews will update SLP to better utilise this - it still limits itself to SK_Alternate style patterns.

Differential Revision: https://reviews.llvm.org/D47985

llvm-svn: 334513
2018-06-12 16:12:29 +00:00
Florian Hahn
a1cc848399 Use SmallPtrSet explicitly for SmallSets with pointer types (NFC).
Currently SmallSet<PointerTy> inherits from SmallPtrSet<PointerTy>. This
patch replaces such types with SmallPtrSet, because IMO it is slightly
clearer and allows us to get rid of unnecessarily including SmallSet.h

Reviewers: dblaikie, craig.topper

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D47836

llvm-svn: 334492
2018-06-12 11:16:56 +00:00
Nicola Zaghen
d34e60ca85 Rename DEBUG macro to LLVM_DEBUG.
The DEBUG() macro is very generic so it might clash with other projects.
The renaming was done as follows:
- git grep -l 'DEBUG' | xargs sed -i 's/\bDEBUG\s\?(/LLVM_DEBUG(/g'
- git diff -U0 master | ../clang/tools/clang-format/clang-format-diff.py -i -p1 -style LLVM
- Manual change to APInt
- Manually chage DOCS as regex doesn't match it.

In the transition period the DEBUG() macro is still present and aliased
to the LLVM_DEBUG() one.

Differential Revision: https://reviews.llvm.org/D43624

llvm-svn: 332240
2018-05-14 12:53:11 +00:00
Craig Topper
781aa181ab Fix a bunch of places where operator-> was used directly on the return from dyn_cast.
Inspired by r331508, I did a grep and found these.

Mostly just change from dyn_cast to cast. Some cases also showed a dyn_cast result being converted to bool, so those I changed to isa.

llvm-svn: 331577
2018-05-05 01:57:00 +00:00
Adrian Prantl
5f8f34e459 Remove \brief commands from doxygen comments.
We've been running doxygen with the autobrief option for a couple of
years now. This makes the \brief markers into our comments
redundant. Since they are a visual distraction and we don't want to
encourage more \brief markers in new code either, this patch removes
them all.

Patch produced by

  for i in $(git grep -l '\\brief'); do perl -pi -e 's/\\brief //g' $i & done

Differential Revision: https://reviews.llvm.org/D46290

llvm-svn: 331272
2018-05-01 15:54:18 +00:00
Davide Italiano
bd3bf1660b [SLPVectorizer] Debug info shouldn't impact spill cost computation.
<rdar://problem/39794738>

(Also, PR32761).

Differential Revision:  https://reviews.llvm.org/D46199

llvm-svn: 331199
2018-04-30 16:57:33 +00:00
Haicheng Wu
f7466f3164 [SLP] Use getExtractWithExtendCost() to compute the scalar cost of extractelement/ext pair
We use getExtractWithExtendCost to calculate the cost of extractelement and
s|zext together when computing the extract cost after vectorization, but we
calculate the cost of extractelement and s|zext separately when computing the
scalar cost which is larger than it should be.

Differential Revision: https://reviews.llvm.org/D45469

llvm-svn: 330143
2018-04-16 18:09:49 +00:00
Alexey Bataev
d5b1f7892f [SLP] Fixed formatting, NFC.
llvm-svn: 329091
2018-04-03 17:48:14 +00:00
Alexey Bataev
428e9d9d87 [SLP] Fix PR36481: vectorize reassociated instructions.
Summary:
If the load/extractelement/extractvalue instructions are not originally
consecutive, the SLP vectorizer is unable to vectorize them. Patch
allows reordering of such instructions.

Patch does not support reordering of the repeated instruction, this must
be handled in the separate patch.

Reviewers: RKSimon, spatel, hfinkel, mkuper, Ayal, ashahid

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D43776

llvm-svn: 329085
2018-04-03 17:14:47 +00:00
Alexey Bataev
df989c54cf Recommit "[SLP] Fix issues with debug output in the SLP vectorizer."
The primary issue here is that using NDEBUG alone isn't enough to guard
debug printing -- instead the DEBUG() macro needs to be used so that the
specific pass debug logging check is employed. Without this, every
asserts-enabled build was printing out information when it hit this.

I also fixed another place where we had multiple statements in a DEBUG
macro to use {}s to be a bit cleaner. And I fixed a place that used
errs() rather than dbgs().

llvm-svn: 329082
2018-04-03 16:40:33 +00:00
Benjamin Kramer
2fc3b18922 Revert "[SLP] Fix PR36481: vectorize reassociated instructions."
This reverts commit r328980 and r329046. Makes the vectorizer crash.

llvm-svn: 329071
2018-04-03 14:40:33 +00:00
Chandler Carruth
597bfd8448 [SLP] Fix issues with debug output in the SLP vectorizer.
The primary issue here is that using NDEBUG alone isn't enough to guard
debug printing -- instead the DEBUG() macro needs to be used so that the
specific pass debug logging check is employed. Without this, every
asserts-enabled build was printing out information when it hit this.

I also fixed another place where we had multiple statements in a DEBUG
macro to use {}s to be a bit cleaner. And I fixed a place that used
`errs()` rather than `dbgs()`.

llvm-svn: 329046
2018-04-03 05:27:28 +00:00
Haicheng Wu
7f0daaeb86 [SLP] Distinguish "demanded and shrinkable" from "demanded and not shrinkable" values when determining the minimum bitwidth
We use two approaches for determining the minimum bitwidth.

   * Demanded bits
   * Value tracking

If demanded bits doesn't result in a narrower type, we then try value tracking.
We need this if we want to root SLP trees with the indices of getelementptr
instructions since all the bits of the indices are demanded.

But there is a missing piece though. We need to be able to distinguish "demanded
and shrinkable" from "demanded and not shrinkable". For example, the bits of %i
in

%i = sext i32 %e1 to i64
%gep = getelementptr inbounds i64, i64* %p, i64 %i

are demanded, but we can shrink %i's type to i32 because it won't change the
result of the getelementptr. On the other hand, in

%tmp15 = sext i32 %tmp14 to i64
%tmp16 = insertvalue { i64, i64 } undef, i64 %tmp15, 0

it doesn't make sense to shrink %tmp15 and we can skip the value tracking.

Ideas are from Matthew Simpson!

Differential Revision: https://reviews.llvm.org/D44868

llvm-svn: 329035
2018-04-03 00:05:10 +00:00
Alexey Bataev
3decaf4275 [SLP] Fix PR36481: vectorize reassociated instructions.
Summary:
If the load/extractelement/extractvalue instructions are not originally
consecutive, the SLP vectorizer is unable to vectorize them. Patch
allows reordering of such instructions.

Reviewers: RKSimon, spatel, hfinkel, mkuper, Ayal, ashahid

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D43776

llvm-svn: 328980
2018-04-02 14:51:37 +00:00
Matthew Simpson
6c289a1c74 [SLP] Stop counting cost of gather sequences with multiple uses
When building the SLP tree, we look for reuse among the vectorized tree
entries. However, each gather sequence is represented by a unique tree entry,
even though the sequence may be identical to another one. This means, for
example, that a gather sequence with two uses will be counted twice when
computing the cost of the tree. We should only count the cost of the definition
of a gather sequence rather than its uses. During code generation, the
redundant gather sequences are emitted, but we optimize them away with CSE. So
it looks like this problem just affects the cost model.

Differential Revision: https://reviews.llvm.org/D44742

llvm-svn: 328316
2018-03-23 14:18:27 +00:00
Haicheng Wu
aee0af3e23 [SLP] clean some formats
llvm-svn: 327433
2018-03-13 18:44:19 +00:00
Alexey Bataev
7f246e003a [SLP] Allow vectorization of reversed loads.
Summary:
Reversed loads are handled as gathering. But we can just reshuffle
these values. Patch adds support for vectorization of reversed loads.

Reviewers: RKSimon, spatel, mkuper, hfinkel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D43022

llvm-svn: 325134
2018-02-14 15:29:15 +00:00
Alexey Bataev
ca2396e673 [SLP] Take user instructions cost into consideration in insertelement vectorization.
Summary:
For better vectorization result we should take into consideration the
cost of the user insertelement instructions when we try to
vectorize sequences that build the whole vector. I.e. if we have the
following scalar code:
```
<Scalar code>
insertelement <ScalarCode>, ...
```
we should consider the cost of the last `insertelement ` instructions as
the cost of the scalar code.

Reviewers: RKSimon, spatel, hfinkel, mkuper

Subscribers: javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D42657

llvm-svn: 324893
2018-02-12 14:54:48 +00:00
Clement Courbet
9c22d8018c [SLPVectorizer][NFC] Make a loop more readable.
llvm-svn: 324482
2018-02-07 14:26:43 +00:00
Alexey Bataev
9c5c103283 [SLP] Fix for PR32086: Count InsertElementInstr of the same elements as shuffle.
Summary:
If the same value is going to be vectorized several times in the same
tree entry, this entry is considered to be a gather entry and cost of
this gather is counter as cost of InsertElementInstrs for each gathered
value. But we can consider these elements as ShuffleInstr with
SK_PermuteSingle shuffle kind.

Reviewers: spatel, RKSimon, mkuper, hfinkel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D38697

llvm-svn: 323662
2018-01-29 16:08:52 +00:00
Alexey Bataev
f86be12182 Revert "[SLP] Fix for PR32086: Count InsertElementInstr of the same elements as shuffle."
This reverts commit r323530 to fix possible problems in users code.

llvm-svn: 323581
2018-01-27 02:42:21 +00:00