374 Commits

Author SHA1 Message Date
Simon Pilgrim
fc8f1d7da7 [CostModel][X86] SK_ExtractSubvector is free if the subvector is at the start of the source vector
llvm-svn: 346538
2018-11-09 19:04:27 +00:00
Dorit Nuzman
34da6dd696 [LV] Support vectorization of interleave-groups that require an epilog under
optsize using masked wide loads 

Under Opt for Size, the vectorizer does not vectorize interleave-groups that
have gaps at the end of the group (such as a loop that reads only the even
elements: a[2*i]) because that implies that we'll require a scalar epilogue
(which is not allowed under Opt for Size). This patch extends the support for
masked-interleave-groups (introduced by D53011 for conditional accesses) to
also cover the case of gaps in a group of loads. Targets that enable the
masked-interleave-group feature don't have to invalidate interleave-groups of
loads with gaps; they could now use masked wide-loads and shuffles (if that's
what the cost model selects).
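
As an illustration of the idea (not the vectorizer's actual output), the following scalar C++ sketch emulates reading a[2*i] with one masked wide load plus a shuffle. The helper names `maskedWideLoad` and `loadEvenElements` are hypothetical, and filling masked-off lanes with 0 is an assumption made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Scalar emulation of a masked wide load: lanes whose mask bit is false are
// never read from memory and are filled with 0 instead (an assumption).
std::vector<int> maskedWideLoad(const int *Ptr, const std::vector<bool> &Mask) {
  std::vector<int> Wide(Mask.size(), 0);
  for (std::size_t Lane = 0; Lane < Mask.size(); ++Lane)
    if (Mask[Lane])
      Wide[Lane] = Ptr[Lane];
  return Wide;
}

// Reads A[Base + 2*i] for i in [0, VF) using one wide access of 2*VF lanes.
// Odd lanes are the gaps of the interleave group and are masked off, so the
// access never touches memory past the last even element.
std::vector<int> loadEvenElements(const std::vector<int> &A, std::size_t Base,
                                  std::size_t VF) {
  assert(VF > 0 && Base + 2 * (VF - 1) < A.size());
  std::vector<bool> Mask(2 * VF);
  for (std::size_t Lane = 0; Lane < Mask.size(); ++Lane)
    Mask[Lane] = (Lane % 2 == 0);
  const std::vector<int> Wide = maskedWideLoad(A.data() + Base, Mask);
  // "Shuffle": keep only the even lanes of the wide vector.
  std::vector<int> Even(VF);
  for (std::size_t I = 0; I < VF; ++I)
    Even[I] = Wide[2 * I];
  return Even;
}
```

With a 7-element array and VF = 4, the eighth lane of the wide access would be out of bounds, which is why an unmasked wide load would need a scalar epilogue.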

Reviewers: Ayal, hsaito, dcaballe, fhahn

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D53668

llvm-svn: 345705
2018-10-31 09:57:56 +00:00
Simon Pilgrim
53e8e145e9 [CostModel][X86] Add realistic vXi64 uitofp vXf64 costs
Match codegen improvements from D53649/rL345256

llvm-svn: 345263
2018-10-25 13:06:20 +00:00
Simon Pilgrim
0573b8d8b6 [CostModel][X86] Add realistic i64 uitofp f64 scalar costs
llvm-svn: 345261
2018-10-25 12:42:10 +00:00
Simon Pilgrim
ac84005841 [CostModel][X86] Add vXi8 vector division by constants costs.
ISD::MULHS/ISD::MULHU lowering of vXi8 types means we expand these in TargetLowering BuildSDIV/BuildUDIV.

llvm-svn: 345175
2018-10-24 18:44:12 +00:00
Simon Pilgrim
2cce074e8c [CostModel][X86] Enable non-uniform vector division by constants costs.
Non-uniform division/remainder handling was added back at D49248/D50765 - so share the 'mul+sub' costs that already exist for uniform cases.

llvm-svn: 345164
2018-10-24 17:30:29 +00:00
Simon Pilgrim
f04a04c2b6 [TTI][X86] Treat SK_Transpose shuffles as SK_PermuteTwoSrc - there's no difference in lowering.
llvm-svn: 345048
2018-10-23 16:45:26 +00:00
Dorit Nuzman
38bbf81ade recommit 344472 after fixing build failure on ARM and PPC.
llvm-svn: 344475
2018-10-14 08:50:06 +00:00
Dorit Nuzman
5118c68cde revert 344472 due to failures.
llvm-svn: 344473
2018-10-14 07:21:20 +00:00
Dorit Nuzman
8174368955 [IAI,LV] Add support for vectorizing predicated strided accesses using masked
interleave-group

The vectorizer currently does not attempt to create interleave-groups that
contain predicated loads/stores; predicated strided accesses can currently be
vectorized only using masked gather/scatter or scalarization. This patch makes
predicated loads/stores candidates for forming interleave-groups during the
Loop-Vectorizer's analysis, and adds the proper support for masked-interleave-
groups to the Loop-Vectorizer's planning and transformation stages. The patch
also extends the TTI API to allow querying the cost of masked interleave groups
(which each target can control). Targets that support masked vector loads/
stores may choose to enable this feature and allow vectorizing predicated
strided loads/stores using masked wide loads/stores and shuffles.

Reviewers: Ayal, hsaito, dcaballe, fhahn, javed.absar

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D53011

llvm-svn: 344472
2018-10-14 07:06:16 +00:00
Matthias Braun
d6131c9633 X86/TargetTransformInfo: Report div/rem constant immediate costs as TCC_Free
DIV/REM by constants should always be expanded into mul/shift/etc.
patterns. Unfortunately the ConstantHoisting pass runs too early at a
point where the pattern isn't expanded yet. However after
ConstantHoisting hoisted some immediate the result may not expand
anymore. Also the hoisting typically doesn't make sense because it
operates on immediates that will change completely during the expansion.

Report DIV/REM as TCC_Free so ConstantHoisting will not touch them.
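
To see why hoisting the divisor immediate is pointless, consider the well-known expansion of an unsigned 32-bit division by 3. This is a standalone sketch, not code from the patch, and the function name `udivBy3` is hypothetical. The divisor 3 never appears in the expanded code; it is replaced by the constant 0xAAAAAAAB and a shift amount.

```cpp
#include <cstdint>

// X / 3 for unsigned 32-bit X, computed as a multiply by the constant
// ceil(2^33 / 3) = 0xAAAAAAAB followed by a right shift by 33.
constexpr std::uint32_t udivBy3(std::uint32_t X) {
  return static_cast<std::uint32_t>(
      (static_cast<std::uint64_t>(X) * 0xAAAAAAABull) >> 33);
}

static_assert(udivBy3(9) == 3, "9 / 3");
static_assert(udivBy3(0xFFFFFFFFu) == 0xFFFFFFFFu / 3, "largest input");
```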

Differential Revision: https://reviews.llvm.org/D53174

llvm-svn: 344315
2018-10-11 23:14:35 +00:00
Craig Topper
a72012c206 [X86] Correct the cost of (v4i32 (fptoui (v4f64))) under AVX512F.
Summary: This was inheriting the cost from the AVX table, but should be legal under AVX512.

Reviewers: RKSimon

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D51267

llvm-svn: 340708
2018-08-26 18:47:44 +00:00
Craig Topper
dd0ef801f8 Recommit r338204 "[X86] Correct the immediate cost for 'add/sub i64 %x, 0x80000000'."
This checks in a more direct way without triggering a UBSAN error.

llvm-svn: 338273
2018-07-30 17:29:57 +00:00
Dean Michael Berris
927b3da6c9 Revert "[X86] Correct the immediate cost for 'add/sub i64 %x, 0x80000000'."
This reverts commit r338204.

llvm-svn: 338236
2018-07-30 09:45:09 +00:00
Craig Topper
5daa032546 [X86] Correct the immediate cost for 'add/sub i64 %x, 0x80000000'.
X86 normally requires immediates to be a signed 32-bit value which would exclude i64 0x80000000. But for add/sub we can negate the constant and use the opposite instruction.
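
The arithmetic behind this can be sketched in standalone C++. The helper names below are hypothetical and this is not the code from the patch. The value 0x80000000 does not fit in a signed 32-bit immediate, but its negation does, so `add x, 0x80000000` can be emitted as `sub x, -0x80000000`.

```cpp
#include <cstdint>
#include <limits>

// True if Imm can be encoded as a sign-extended 32-bit immediate.
bool fitsInSImm32(std::int64_t Imm) {
  return Imm >= std::numeric_limits<std::int32_t>::min() &&
         Imm <= std::numeric_limits<std::int32_t>::max();
}

// Hypothetical helper: is Imm usable as the immediate of a 64-bit add/sub?
// The special case is compared directly rather than by negating Imm, because
// -Imm would overflow for the most negative 64-bit value.
bool isEncodableAddSubImm(std::int64_t Imm) {
  return fitsInSImm32(Imm) || Imm == 0x80000000LL;
}
```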

llvm-svn: 338204
2018-07-28 18:21:46 +00:00
Craig Topper
ba208b07b6 [X86] Use alignTo and divideCeil to make some code more readable. NFC
llvm-svn: 338203
2018-07-28 18:21:45 +00:00
Simon Pilgrim
dc113dc7ed [CostModel][X86] Add SREM/UREM general and constant costs (PR38056)
We penalize general SDIV/UDIV costs but don't do the same for SREM/UREM.

This patch makes general vector SREM/UREM x20 as costly as scalar, the same approach as we do for SDIV/UDIV. The patch also extends the existing SDIV/UDIV constant costs for SREM/UREM - at the moment this means the additional cost of a MUL+SUB (see D48975).

Differential Revision: https://reviews.llvm.org/D48980

llvm-svn: 336486
2018-07-07 16:53:30 +00:00
Simon Pilgrim
8c3765dc6b [CostModel][X86] Add UDIV/UREM by pow2 costs
Normally InstCombine would have simplified these to SRL/AND instructions but we may still see these during SLP vectorization etc.
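
For reference, the equivalence that justifies the cheap costs is sketched below in standalone C++ (function names are hypothetical): an unsigned division by 2^K is a logical shift right by K, and the remainder is a bitwise AND with 2^K - 1.

```cpp
#include <cassert>
#include <cstdint>

// X / 2^K for unsigned X is a logical shift right (SRL) by K.
std::uint32_t udivPow2(std::uint32_t X, unsigned K) {
  assert(K < 32);
  return X >> K;
}

// X % 2^K for unsigned X is an AND with the low-K-bits mask.
std::uint32_t uremPow2(std::uint32_t X, unsigned K) {
  assert(K < 32);
  return X & ((std::uint32_t{1} << K) - 1);
}
```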

llvm-svn: 336371
2018-07-05 16:56:28 +00:00
Simon Pilgrim
2a9cde026c [X86][AVX] Reduce v4f64/v4i64 shuffle costs (PR37882)
These costs were overcautious for one/two op general shuffles - VSHUFPD doesn't have to replicate the same shuffle in both lanes like VSHUFPS does.

llvm-svn: 335216
2018-06-21 11:37:13 +00:00
Simon Pilgrim
e39fa6cbbb [CostModel] Replace ShuffleKind::SK_Alternate with ShuffleKind::SK_Select (PR33744)
As discussed on PR33744, this patch relaxes ShuffleKind::SK_Alternate which requires shuffle masks to only match an alternating pattern from its 2 sources:

e.g. v4f32: <0,5,2,7> or <4,1,6,3>

This seems far too restrictive, as most SIMD hardware will implement it using a general blend/bit-select instruction, so this patch replaces it with SK_Select, permitting elements from either source as long as they are inline:

e.g. v4f32: <0,5,2,7>, <4,1,6,3>, <0,1,6,7>, <4,1,2,3> etc.

This initial patch just updates the name and cost model shuffle mask analysis, later patch reviews will update SLP to better utilise this - it still limits itself to SK_Alternate style patterns.
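
The difference between the two mask classes can be sketched in standalone C++. This is an illustration under assumptions, not the LLVM implementation: the function names are hypothetical, masks are plain index vectors, and undefined mask elements are not handled.

```cpp
#include <vector>

// Select-style mask: lane I takes element I of the first source (index I)
// or element I of the second source (index I + N).
bool isSelectMask(const std::vector<int> &Mask) {
  const int N = static_cast<int>(Mask.size());
  for (int I = 0; I < N; ++I)
    if (Mask[I] != I && Mask[I] != I + N)
      return false;
  return true;
}

// Alternate-style mask: a select mask whose source strictly alternates
// from one lane to the next.
bool isAlternateMask(const std::vector<int> &Mask) {
  if (!isSelectMask(Mask))
    return false;
  const int N = static_cast<int>(Mask.size());
  for (int I = 1; I < N; ++I)
    if ((Mask[I] >= N) == (Mask[I - 1] >= N))
      return false;
  return true;
}
```

Under these definitions, <0,1,6,7> is accepted as a select mask but rejected as an alternate mask, which matches the examples above.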

Differential Revision: https://reviews.llvm.org/D47985

llvm-svn: 334513
2018-06-12 16:12:29 +00:00
Simon Pilgrim
4162d77744 [TTI] Add uniform/non-uniform constant Pow2 detection to TargetTransformInfo::getInstructionThroughput
This enables us to detect more fast path sdiv cases under cost analysis.

This patch also enables us to handle non-uniform-constant pow2 cases for X86 SDIV costs.

Found while working on D46276

Future patches can then extend the vectorizers to more fully support non-uniform pow2 cases.

Differential Revision: https://reviews.llvm.org/D46637

llvm-svn: 332969
2018-05-22 10:40:09 +00:00
Adrian Prantl
5f8f34e459 Remove \brief commands from doxygen comments.
We've been running doxygen with the autobrief option for a couple of
years now. This makes the \brief markers in our comments
redundant. Since they are a visual distraction and we don't want to
encourage more \brief markers in new code either, this patch removes
them all.

Patch produced by

  for i in $(git grep -l '\\brief'); do perl -pi -e 's/\\brief //g' $i & done

Differential Revision: https://reviews.llvm.org/D46290

llvm-svn: 331272
2018-05-01 15:54:18 +00:00
Simon Pilgrim
2faf606fb6 [CostModel][X86] Remove hard coded SDIV/UDIV vector costs
Algorithmically compute the 'x20' SDIV/UDIV vector costs - this is necessary for PR36550 when DIV costs will be driven from the scheduler models.

llvm-svn: 330870
2018-04-25 20:59:16 +00:00
Simon Pilgrim
58e03a09db [CostModel][X86] Recursive call for cost of imul for packed v16i16 constant shift left.
Don't just assume cost = 1.

llvm-svn: 330834
2018-04-25 15:22:03 +00:00
Simon Pilgrim
80ce1dde44 [CostModel][X86] Fix v32i16/v64i8 SETCC costs on AVX512BW targets
llvm-svn: 329498
2018-04-07 13:24:33 +00:00
Craig Topper
a985919d3e [X86] Update cost model for Goldmont. Add fsqrt costs for Silvermont
Add fdiv costs for Goldmont using table 16-17 of the Intel Optimization Manual. Also add overrides for FSQRT for Goldmont and Silvermont.

Reviewers: RKSimon

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D44644

llvm-svn: 328451
2018-03-25 15:58:12 +00:00
Simon Pilgrim
9929f90740 [X86][SSE] Reduce FADD/FSUB/FMUL costs on later targets (PR36280)
Agner's tables indicate that for SSE42+ targets (Core2 and later) we can reduce the FADD/FSUB/FMUL costs down to 1, which should fix the Himeno benchmark.

Note: the AVX512 FDIV costs look rather dodgy, but this isn't part of this patch.

Differential Revision: https://reviews.llvm.org/D43733

llvm-svn: 326133
2018-02-26 22:10:17 +00:00
Simon Pilgrim
cb9a02f60e [X86][SSE] Increase PMULLD costs to better match hardware
Until Skylake, most hardware could only issue a PMULLD op every other cycle

llvm-svn: 324823
2018-02-10 19:27:10 +00:00
Sanjay Patel
d7c702b451 [LoopStrengthReduce, x86] don't add cost for a cmp that will be macro-fused (PR35681)
In the motivating case from PR35681 and represented by the macro-fuse-cmp test:
https://bugs.llvm.org/show_bug.cgi?id=35681
...there's a 37 -> 31 byte size win for the loop because we eliminate the big base 
address offsets.

SPEC2017 on Ryzen shows no significant perf difference.

Differential Revision: https://reviews.llvm.org/D42607

llvm-svn: 324289
2018-02-05 23:43:05 +00:00
Simon Pilgrim
eb07016156 Spelling mistake in comment. NFCI.
llvm-svn: 323752
2018-01-30 12:18:51 +00:00
Craig Topper
0d797a34d8 [X86] Add support for passing 'prefer-vector-width' function attribute into X86Subtarget and exposing via X86's getRegisterWidth TTI interface.
This will cause the vectorizers to do some limiting of the vector widths they create. This is not a strict limit. There are cases I know of in which the loop vectorizer will still generate larger vectors.

I've written this in such a way that the interface will only return a properly supported width (0/128/256/512) even if the attribute says something funny like 384 or 10.
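
One way such a normalization could look is sketched below. The function name is hypothetical, and rounding down to the nearest supported width is an assumption; the message only states that the result is one of 0/128/256/512.

```cpp
// Hypothetical sketch: map an arbitrary requested width onto one of the
// supported values 0/128/256/512 by rounding down (assumed behaviour).
unsigned clampPreferredVectorWidth(unsigned Requested) {
  if (Requested >= 512)
    return 512;
  if (Requested >= 256)
    return 256;
  if (Requested >= 128)
    return 128;
  return 0;
}
```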

This has been split from D41895 with the remainder in a follow up commit.

llvm-svn: 323015
2018-01-20 00:26:08 +00:00
Alexey Bataev
771ec9f399 [COST]Fix PR35865: Fix cost model evaluation for shuffle on X86.
Summary:
If the vector type is transformed to a single non-vector type, the compiler
may crash trying to get vector information about the non-vector type.

Reviewers: RKSimon, spatel, mkuper, hfinkel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D41862

llvm-svn: 322106
2018-01-09 19:08:22 +00:00
Craig Topper
8b0f185c31 [X86] Simplify the TTI code for getInterleavedMemoryOpCost around for AVX512BW. NFCI
Previously the lambda for AVX512 passed out a flag that indicated whether AVX512BW was required and that was checked against the AVX512BW subtarget flag outside.

This patch changes the interface to pass the AVX512BW subtarget bit in and return its value if we detect 16 or 8 bit types.

llvm-svn: 319919
2017-12-06 18:40:46 +00:00
Sanjay Patel
0de1a4bc2d [PartiallyInlineLibCalls][x86] add TTI hook to allow sqrt inlining to depend on arg rather than result
This should fix PR31455:
https://bugs.llvm.org/show_bug.cgi?id=31455

Differential Revision: https://reviews.llvm.org/D28314

llvm-svn: 319094
2017-11-27 21:15:43 +00:00
Craig Topper
ea37e201ec [X86] Don't report gather is legal on Skylake CPUs when AVX2/AVX512 is disabled. Allow gather on SKX/CNL/ICL when AVX512 is disabled by using AVX2 instructions.
Summary:
This adds a new fast gather feature bit to cover all CPUs that support fast gather that we can use independent of whether the AVX512 feature is enabled. I'm only using this new bit to qualify AVX2 codegen. AVX512 is still implicitly assuming fast gather to keep tests working and to match the scatter behavior.

Test command lines have been added for these two cases.

Reviewers: magabari, delena, RKSimon, zvi

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D40282

llvm-svn: 318983
2017-11-25 18:09:37 +00:00
Craig Topper
d5b5bbe22f [X86] Spell penryn correctly in some comments. NFC
llvm-svn: 318855
2017-11-22 18:23:40 +00:00
Mohammed Agabaria
115f68ea3e [LV][X86] Support of AVX2 Gathers code generation and update the LV with this
This patch depends on: https://reviews.llvm.org/D35348

Support of pattern selection of masked gathers of AVX2 (X86\AVX2 code gen)
Update LoopVectorize to generate gathers for AVX2 processors.

Reviewers: delena, zvi, RKSimon, craig.topper, aaboud, igorb

Reviewed By: delena, RKSimon

Differential Revision: https://reviews.llvm.org/D35772

llvm-svn: 318641
2017-11-20 08:18:12 +00:00
David Blaikie
b3bde2ea50 Fix a bunch more layering of CodeGen headers that are in Target
All these headers already depend on CodeGen headers so moving them into
CodeGen fixes the layering (since CodeGen depends on Target, not the
other way around).

llvm-svn: 318490
2017-11-17 01:07:10 +00:00
Mohammed Agabaria
6e6d5326a1 [TTI][X86] update costs of interleaved load\store of i64\double
This patch contains more accurate costs of interleaved load\store of stride 2 for the types int64\double on AVX2.

Reviewers: delena, RKSimon, craig.topper, dorit

Reviewed By: dorit

Differential Revision: https://reviews.llvm.org/D40008

llvm-svn: 318385
2017-11-16 09:38:32 +00:00
Craig Topper
46a5d58b8c [X86] Update TTI to report that v1iX/v1fX types aren't legal for masked gather/scatter/load/store.
The type legalizer will try to scalarize these operations if it sees them, but there is no handling for scalarizing them. This leads to a fatal error. With this change they will now be scalarized by the mem intrinsic scalarizing pass before SelectionDAG.

llvm-svn: 318380
2017-11-16 06:02:05 +00:00
Alexey Bataev
e25a6fd390 [SLP] Fix PR35047: Fix default cost model for cast op in X86.
Summary:
The cost calculation for the default case on the X86 target does not always
follow the correct path because of a missing 4th argument in the
`BaseT::getCastInstrCost()` call. Added this missing parameter.

Reviewers: hfinkel, mkuper, RKSimon, spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D39687

llvm-svn: 317576
2017-11-07 14:23:44 +00:00
Mohammed Agabaria
6691758364 [LV][X86] update the cost of interleaving mem. access of floats
Recommit:
This patch updates the costs of interleaved loads of v8f32 of stride 3 and 8.
Fixed the location of the lit test; it now works with make check-all.

Differential Revision: https://reviews.llvm.org/D39403

llvm-svn: 317471
2017-11-06 10:56:20 +00:00
Mohammed Agabaria
acd69dbc7c [REVERT][LV][X86] update the cost of interleaving mem. access of floats
Reverted; my changes will be committed later after fixing the failure.
This patch updates the costs of interleaved loads of v8f32 of stride 3 and 8.

Differential Revision: https://reviews.llvm.org/D39403

llvm-svn: 317433
2017-11-05 09:36:54 +00:00
Mohammed Agabaria
f74c767de6 [LV][X86] update the cost of interleaving mem. access of floats
This patch contains update of the costs of interleaved loads of v8f32 of stride 3 and 8.

Differential Revision: https://reviews.llvm.org/D39403

llvm-svn: 317432
2017-11-05 09:06:23 +00:00
Clement Courbet
b2c3eb8cf1 [CodeGen][ExpandMemcmp] Allow memcmp to expand to vector loads (2).
- Targets that want to support memcmp expansions now return the list of
   supported load sizes.
 - Expansion codegen does not assume that all power-of-two load sizes
   smaller than the max load size are valid. For example, this is not the
   case for x86(32bit)+sse2.
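
The second point can be illustrated with a standalone sketch that splits a compare length into loads drawn only from an explicit list of supported sizes. The function name and the example size list {16, 4, 2, 1} are assumptions for illustration; the real expansion may choose loads differently.

```cpp
#include <cassert>
#include <vector>

// Greedily split NumBytes into loads taken from LoadSizes, which must be
// sorted in decreasing order and should contain 1 so every length is covered.
std::vector<unsigned> splitIntoLoads(unsigned NumBytes,
                                     const std::vector<unsigned> &LoadSizes) {
  std::vector<unsigned> Loads;
  for (unsigned Size : LoadSizes) {
    assert(Size > 0);
    while (NumBytes >= Size) {
      Loads.push_back(Size);
      NumBytes -= Size;
    }
  }
  assert(NumBytes == 0 && "LoadSizes did not cover the full length");
  return Loads;
}
```

With a size list that lacks an 8-byte load, a 15-byte compare becomes 4+4+4+2+1 rather than 8+4+2+1.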

Fixes PR34887.

llvm-svn: 316905
2017-10-30 14:19:33 +00:00
Michael Zuckerman
49293264cc [AVX512][AVX2]Cost calculation for interleave load/store patterns {v8i8,v16i8,v32i8,v64i8}
This patch adds accurate instructions cost.
The formula covers two cases (stride 3 and stride 4) and calculates the cost according to the VF and stride.

Reviewers:
1. delena
2. Farhana
3. zvi
4. dorit
5. Ayal

Differential Revision: https://reviews.llvm.org/D38762

Change-Id: If4cfbd4ac0e63694e8144cb78c7fa34850647ff7
llvm-svn: 316072
2017-10-18 11:41:55 +00:00
Clement Courbet
2807c0a442 [CodeGenPrepare][NFC] Rename TargetTransformInfo::expandMemCmp -> TargetTransformInfo::enableMemCmpExpansion.
Summary:
Right now there are two functions with the same name, one does the work
and the other one returns true if expansion is needed. Rename
TargetTransformInfo::expandMemCmp to make it more consistent with other
members of TargetTransformInfo.

Remove the unused Instruction* parameter.

Differential Revision: https://reviews.llvm.org/D38165

llvm-svn: 314096
2017-09-25 06:35:16 +00:00
Sanjay Patel
6fd4391ddd [DivRempairs] add a pass to optimize div/rem pairs (PR31028)
This is intended to be a superset of the functionality from D31037 (EarlyCSE) but implemented 
as an independent pass, so there's no stretching of scope and feature creep for an existing pass. 
I also proposed a weaker version of this for SimplifyCFG in D30910. And I initially had almost 
this same functionality as an addition to CGP in the motivating example of PR31028:
https://bugs.llvm.org/show_bug.cgi?id=31028

The advantage of positioning this ahead of SimplifyCFG in the pass pipeline is that it can allow 
more flattening. But it needs to be after passes (InstCombine) that could sink a div/rem and
undo the hoisting that is done here.

Decomposing remainder may allow removing some code from the backend (PPC and possibly others).
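
The remainder decomposition mentioned above relies on the identity X % Y == X - (X / Y) * Y. A minimal standalone sketch follows; the struct and function names are hypothetical, and the overflowing case INT_MIN / -1 is not handled.

```cpp
#include <cassert>

struct DivRem {
  int Quot;
  int Rem;
};

// Computes the quotient once and derives the remainder from it with a
// multiply and a subtract instead of a second division.
DivRem divRemPair(int X, int Y) {
  assert(Y != 0);
  const int Quot = X / Y;
  return {Quot, X - Quot * Y};
}
```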

Differential Revision: https://reviews.llvm.org/D37121 

llvm-svn: 312862
2017-09-09 13:38:18 +00:00
Alexey Bataev
6dd29fccb8 [SLP] Support for horizontal min/max reduction.
SLP vectorizer supports horizontal reductions for Add/FAdd binary
operations. Patch adds support for horizontal min/max reductions.
Function getReductionCost() is split to getArithmeticReductionCost() for
binary operation reductions and getMinMaxReductionCost() for min/max
reductions.
Patch fixes PR26956.
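
A horizontal max reduction can be pictured as repeatedly combining the upper half of a vector into the lower half. The standalone sketch below is a scalar emulation under that assumption, not the SLP vectorizer's code, and the function name is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Reduces a power-of-two sized vector to its maximum in log2(N) steps,
// each step taking the element-wise max of the two halves.
int horizontalMax(std::vector<int> V) {
  assert(!V.empty() && (V.size() & (V.size() - 1)) == 0);
  for (std::size_t Width = V.size() / 2; Width >= 1; Width /= 2)
    for (std::size_t I = 0; I < Width; ++I)
      V[I] = std::max(V[I], V[I + Width]);
  return V[0];
}
```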

Differential revision: https://reviews.llvm.org/D27846

llvm-svn: 312791
2017-09-08 13:49:36 +00:00
Zvi Rackover
25799d93f0 X86: Improve AVX512 fptoui lowering
Summary:
Add patterns for
  fptoui <16 x float> to <16 x i8>
  fptoui <16 x float> to <16 x i16>

Reviewers: igorb, delena, craig.topper

Reviewed By: craig.topper

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D37505

llvm-svn: 312704
2017-09-07 07:40:34 +00:00