Summary:
This allows expanding {7,11,13,14,15,21,22,23,25,26,27,28,29,30,31}-byte memcmp
in just two loads on X86. These were previously calling memcmp.
Reviewers: spatel, gchatelet
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D55263
llvm-svn: 349731
This is an initial patch to add a minimum level of support for funnel shifts to the SelectionDAG and to begin wiring it up to the X86 SHLD/SHRD instructions.
Some partial legalization code has been added to handle the case for 'SlowSHLD' where we want to expand instead and I've added a few DAG combines so we don't get regressions from the existing DAG builder expansion code.
Differential Revision: https://reviews.llvm.org/D54698
llvm-svn: 348353
Unlike most cost model functions this code makes a lot of table lookups without using the results from getTypeLegalizationCost. This means 512-bit vectors can be looked up even when the type isn't legal.
This patch adds a check around the two tables that contain 512-bit types to make sure that neither of the types would be split by type legalization. Meaning 512 bit types are illegal. I wanted to write this in a somewhat generic way that uses type legalization query hooks. But if prefered, I can switch to just using is512BitVector and the subtarget feature.
Differential Revision: https://reviews.llvm.org/D54984
llvm-svn: 347786
This fixes some of scalarization costs reported for sext/zext using avx512bw. This does not fix all scalarization costs being reported. Just the worst.
I've restricted this only to combinations of types that are legal with avx512bw like v32i1/v64i1/v32i16/v64i8 and conversions between vXi1 and vXi8/vXi16 with legal vXi8/vXi16 result types.
Differential Revision: https://reviews.llvm.org/D54979
llvm-svn: 347785
We're seeing some issues internally where we sent some intrinsics into the cost model that the getTypeLegalizationCost call fails on, but X86 specific tables don't care about. Our base class implementation takes care of them. We'd just like X86 backend to ignore them.
This patch makes sure the switch returned something X86 cares about and skips the table lookups and type legalization call if not. Probably more efficient too since we don't go scanning the tables for every intrinsic we could possibly see.
Differential Revision: https://reviews.llvm.org/D54711
llvm-svn: 347248
Add support for the expansion of funnelshift/rotates to getIntrinsicInstrCost.
This also required us to move the X86 fshl/fshr costs to the same place as the rotates to avoid expansion and get correct scalarization vs vectorization costs.
llvm-svn: 346854
When we repeat the 2 shifting operands then this is a bit rotation - annoyingly this has to be done in the other getIntrinsicInstrCost than most intrinsics as we need to check the operands are the same.
llvm-svn: 346688
optsize using masked wide loads
Under Opt for Size, the vectorizer does not vectorize interleave-groups that
have gaps at the end of the group (such as a loop that reads only the even
elements: a[2*i]) because that implies that we'll require a scalar epilogue
(which is not allowed under Opt for Size). This patch extends the support for
masked-interleave-groups (introduced by D53011 for conditional accesses) to
also cover the case of gaps in a group of loads; Targets that enable the
masked-interleave-group feature don't have to invalidate interleave-groups of
loads with gaps; they could now use masked wide-loads and shuffles (if that's
what the cost model selects).
Reviewers: Ayal, hsaito, dcaballe, fhahn
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D53668
llvm-svn: 345705
Non-uniform division/remainder handling was added back at D49248/D50765 - so share the 'mul+sub' costs that already exist for uniform cases.
llvm-svn: 345164
interleave-group
The vectorizer currently does not attempt to create interleave-groups that
contain predicated loads/stores; predicated strided accesses can currently be
vectorized only using masked gather/scatter or scalarization. This patch makes
predicated loads/stores candidates for forming interleave-groups during the
Loop-Vectorizer's analysis, and adds the proper support for masked-interleave-
groups to the Loop-Vectorizer's planning and transformation stages. The patch
also extends the TTI API to allow querying the cost of masked interleave groups
(which each target can control); Targets that support masked vector loads/
stores may choose to enable this feature and allow vectorizing predicated
strided loads/stores using masked wide loads/stores and shuffles.
Reviewers: Ayal, hsaito, dcaballe, fhahn, javed.absar
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D53011
llvm-svn: 344472
DIV/REM by constants should always be expanded into mul/shift/etc.
patterns. Unfortunately the ConstantHoisting pass runs too early at a
point where the pattern isn't expanded yet. However after
ConstantHoisting hoisted some immediate the result may not expand
anymore. Also the hoisting typically doesn't make sense because it
operates on immediates that will change completely during the expansion.
Report DIV/REM as TCC_Free so ConstantHoisting will not touch them.
Differential Revision: https://reviews.llvm.org/D53174
llvm-svn: 344315
Summary: This was inheriting the cost from the AVX table, but should be legal under AVX512.
Reviewers: RKSimon
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D51267
llvm-svn: 340708
X86 normally requires immediates to be a signed 32-bit value which would exclude i64 0x80000000. But for add/sub we can negate the constant and use the opposite instruction.
llvm-svn: 338204
We penalize general SDIV/UDIV costs but don't do the same for SREM/UREM.
This patch makes general vector SREM/UREM x20 as costly as scalar, the same approach as we do for SDIV/UDIV. The patch also extends the existing SDIV/UDIV constant costs for SREM/UREM - at the moment this means the additional cost of a MUL+SUB (see D48975).
Differential Revision: https://reviews.llvm.org/D48980
llvm-svn: 336486
These were being over cautious for costs for one/two op general shuffles - VSHUFPD doesn't have to replicate the same shuffle in both lanes like VSHUFPS does.
llvm-svn: 335216
As discussed on PR33744, this patch relaxes ShuffleKind::SK_Alternate which requires shuffle masks to only match an alternating pattern from its 2 sources:
e.g. v4f32: <0,5,2,7> or <4,1,6,3>
This seems far too restrictive as most SIMD hardware which will implement it using a general blend/bit-select instruction, so replaces it with SK_Select, permitting elements from either source as long as they are inline:
e.g. v4f32: <0,5,2,7>, <4,1,6,3>, <0,1,6,7>, <4,1,2,3> etc.
This initial patch just updates the name and cost model shuffle mask analysis, later patch reviews will update SLP to better utilise this - it still limits itself to SK_Alternate style patterns.
Differential Revision: https://reviews.llvm.org/D47985
llvm-svn: 334513
This enables us to detect more fast path sdiv cases under cost analysis.
This patch also enables us to handle non-uniform-constant pow2 cases for X86 SDIV costs.
Found while working on D46276
Future patches can then extend the vectorizers to more fully support non-uniform pow2 cases.
Differential Revision: https://reviews.llvm.org/D46637
llvm-svn: 332969
We've been running doxygen with the autobrief option for a couple of
years now. This makes the \brief markers into our comments
redundant. Since they are a visual distraction and we don't want to
encourage more \brief markers in new code either, this patch removes
them all.
Patch produced by
for i in $(git grep -l '\\brief'); do perl -pi -e 's/\\brief //g' $i & done
Differential Revision: https://reviews.llvm.org/D46290
llvm-svn: 331272
Algorithmically compute the 'x20' SDIV/UDIV vector costs - this is necessary for PR36550 when DIV costs will be driven from the scheduler models.
llvm-svn: 330870
Add fdiv costs for Goldmont using table 16-17 of the Intel Optimization Manual. Also add overrides for FSQRT for Goldmont and Silvermont.
Reviewers: RKSimon
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D44644
llvm-svn: 328451
Agner's tables indicate that for SSE42+ targets (Core2 and later) we can reduce the FADD/FSUB/FMUL costs down to 1, which should fix the Himeno benchmark.
Note: the AVX512 FDIV costs look rather dodgy, but this isn't part of this patch.
Differential Revision: https://reviews.llvm.org/D43733
llvm-svn: 326133
In the motivating case from PR35681 and represented by the macro-fuse-cmp test:
https://bugs.llvm.org/show_bug.cgi?id=35681
...there's a 37 -> 31 byte size win for the loop because we eliminate the big base
address offsets.
SPEC2017 on Ryzen shows no significant perf difference.
Differential Revision: https://reviews.llvm.org/D42607
llvm-svn: 324289
This will cause the vectorizers to do some limiting of the vector widths they create. This is not a strict limit. There are reasons I know of that the loop vectorizer will generate larger vectors for.
I've written this in such a way that the interface will only return a properly supported width(0/128/256/512) even if the attribute says something funny like 384 or 10.
This has been split from D41895 with the remainder in a follow up commit.
llvm-svn: 323015
Summary:
If the vector type is transformed to non-vector single type, the compile
may crash trying to get vector information about non-vector type.
Reviewers: RKSimon, spatel, mkuper, hfinkel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D41862
llvm-svn: 322106
Previously the lambda for AVX512 passed out a flag that indicated whether AVX512BW was required and that was checked against the AVX512BW subtarget flag outside.
This patch changes the interface to pass the AVX512BW subtarget bit in and return its value if we detect 16 or 8 bit types.
llvm-svn: 319919
Summary:
This adds a new fast gather feature bit to cover all CPUs that support fast gather that we can use independent of whether the AVX512 feature is enabled. I'm only using this new bit to qualify AVX2 codegen. AVX512 is still implicitly assuming fast gather to keep tests working and to match the scatter behavior.
Test command lines have been added for these two cases.
Reviewers: magabari, delena, RKSimon, zvi
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D40282
llvm-svn: 318983