llvm-project

mirror of https://github.com/llvm/llvm-project.git synced 2025-05-05 10:16:07 +00:00

Author	SHA1	Message	Date
Simon Pilgrim	fc8f1d7da7	[CostModel][X86] SK_ExtractSubvector is free if the subvector is at the start of the source vector llvm-svn: 346538	2018-11-09 19:04:27 +00:00
Dorit Nuzman	34da6dd696	[LV] Support vectorization of interleave-groups that require an epilog under optsize using masked wide loads Under Opt for Size, the vectorizer does not vectorize interleave-groups that have gaps at the end of the group (such as a loop that reads only the even elements: a[2*i]) because that implies that we'll require a scalar epilogue (which is not allowed under Opt for Size). This patch extends the support for masked-interleave-groups (introduced by D53011 for conditional accesses) to also cover the case of gaps in a group of loads; Targets that enable the masked-interleave-group feature don't have to invalidate interleave-groups of loads with gaps; they could now use masked wide-loads and shuffles (if that's what the cost model selects). Reviewers: Ayal, hsaito, dcaballe, fhahn Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D53668 llvm-svn: 345705	2018-10-31 09:57:56 +00:00
Simon Pilgrim	53e8e145e9	[CostModel][X86] Add realistic vXi64 uitofp vXf64 costs Match codegen improvements from D53649/rL345256 llvm-svn: 345263	2018-10-25 13:06:20 +00:00
Simon Pilgrim	0573b8d8b6	[CostModel][X86] Add realistic i64 uitofp f64 scalar costs llvm-svn: 345261	2018-10-25 12:42:10 +00:00
Simon Pilgrim	ac84005841	[CostModel][X86] Add vXi8 vector division by constants costs. ISD::MULHS/ISD::MULHU lowering of vXi8 types means we expand these in TargetLowering BuildSDIV/BuildUDIV. llvm-svn: 345175	2018-10-24 18:44:12 +00:00
Simon Pilgrim	2cce074e8c	[CostModel][X86] Enable non-uniform vector division by constants costs. Non-uniform division/remainder handling was added back at D49248/D50765 - so share the 'mul+sub' costs that already exist for uniform cases. llvm-svn: 345164	2018-10-24 17:30:29 +00:00
Simon Pilgrim	f04a04c2b6	[TTI][X86] Treat SK_Transpose shuffles as SK_PermuteTwoSrc - there's no difference in lowering. llvm-svn: 345048	2018-10-23 16:45:26 +00:00
Dorit Nuzman	38bbf81ade	recommit 344472 after fixing build failure on ARM and PPC. llvm-svn: 344475	2018-10-14 08:50:06 +00:00
Dorit Nuzman	5118c68cde	revert 344472 due to failures. llvm-svn: 344473	2018-10-14 07:21:20 +00:00
Dorit Nuzman	8174368955	[IAI,LV] Add support for vectorizing predicated strided accesses using masked interleave-group The vectorizer currently does not attempt to create interleave-groups that contain predicated loads/stores; predicated strided accesses can currently be vectorized only using masked gather/scatter or scalarization. This patch makes predicated loads/stores candidates for forming interleave-groups during the Loop-Vectorizer's analysis, and adds the proper support for masked-interleave- groups to the Loop-Vectorizer's planning and transformation stages. The patch also extends the TTI API to allow querying the cost of masked interleave groups (which each target can control); Targets that support masked vector loads/ stores may choose to enable this feature and allow vectorizing predicated strided loads/stores using masked wide loads/stores and shuffles. Reviewers: Ayal, hsaito, dcaballe, fhahn, javed.absar Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D53011 llvm-svn: 344472	2018-10-14 07:06:16 +00:00
Matthias Braun	d6131c9633	X86/TargetTransformInfo: Report div/rem constant immediate costs as TCC_Free DIV/REM by constants should always be expanded into mul/shift/etc. patterns. Unfortunately the ConstantHoisting pass runs too early at a point where the pattern isn't expanded yet. However after ConstantHoisting hoisted some immediate the result may not expand anymore. Also the hoisting typically doesn't make sense because it operates on immediates that will change completely during the expansion. Report DIV/REM as TCC_Free so ConstantHoisting will not touch them. Differential Revision: https://reviews.llvm.org/D53174 llvm-svn: 344315	2018-10-11 23:14:35 +00:00
Craig Topper	a72012c206	[X86] Correct the cost of (v4i32 (fptoui (v4f64))) under AVX512F. Summary: This was inheriting the cost from the AVX table, but should be legal under AVX512. Reviewers: RKSimon Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D51267 llvm-svn: 340708	2018-08-26 18:47:44 +00:00
Craig Topper	dd0ef801f8	Recommit r338204 "[X86] Correct the immediate cost for 'add/sub i64 %x, 0x80000000'." This checks in a more direct way without triggering a UBSAN error. llvm-svn: 338273	2018-07-30 17:29:57 +00:00
Dean Michael Berris	927b3da6c9	Revert "[X86] Correct the immediate cost for 'add/sub i64 %x, 0x80000000'." This reverts commit r338204. llvm-svn: 338236	2018-07-30 09:45:09 +00:00
Craig Topper	5daa032546	[X86] Correct the immediate cost for 'add/sub i64 %x, 0x80000000'. X86 normally requires immediates to be a signed 32-bit value which would exclude i64 0x80000000. But for add/sub we can negate the constant and use the opposite instruction. llvm-svn: 338204	2018-07-28 18:21:46 +00:00
Craig Topper	ba208b07b6	[X86] Use alignTo and divideCeil to make some code more readable. NFC llvm-svn: 338203	2018-07-28 18:21:45 +00:00
Simon Pilgrim	dc113dc7ed	[CostModel][X86] Add SREM/UREM general and constant costs (PR38056) We penalize general SDIV/UDIV costs but don't do the same for SREM/UREM. This patch makes general vector SREM/UREM x20 as costly as scalar, the same approach as we do for SDIV/UDIV. The patch also extends the existing SDIV/UDIV constant costs for SREM/UREM - at the moment this means the additional cost of a MUL+SUB (see D48975). Differential Revision: https://reviews.llvm.org/D48980 llvm-svn: 336486	2018-07-07 16:53:30 +00:00
Simon Pilgrim	8c3765dc6b	[CostModel][X86] Add UDIV/UREM by pow2 costs Normally InstCombine would have simplified these to SRL/AND instructions but we may still see these during SLP vectorization etc. llvm-svn: 336371	2018-07-05 16:56:28 +00:00
Simon Pilgrim	2a9cde026c	[X86][AVX] Reduce v4f64/v4i64 shuffle costs (PR37882) These were being over cautious for costs for one/two op general shuffles - VSHUFPD doesn't have to replicate the same shuffle in both lanes like VSHUFPS does. llvm-svn: 335216	2018-06-21 11:37:13 +00:00
Simon Pilgrim	e39fa6cbbb	[CostModel] Replace ShuffleKind::SK_Alternate with ShuffleKind::SK_Select (PR33744) As discussed on PR33744, this patch relaxes ShuffleKind::SK_Alternate which requires shuffle masks to only match an alternating pattern from its 2 sources: e.g. v4f32: <0,5,2,7> or <4,1,6,3> This seems far too restrictive as most SIMD hardware which will implement it using a general blend/bit-select instruction, so replaces it with SK_Select, permitting elements from either source as long as they are inline: e.g. v4f32: <0,5,2,7>, <4,1,6,3>, <0,1,6,7>, <4,1,2,3> etc. This initial patch just updates the name and cost model shuffle mask analysis, later patch reviews will update SLP to better utilise this - it still limits itself to SK_Alternate style patterns. Differential Revision: https://reviews.llvm.org/D47985 llvm-svn: 334513	2018-06-12 16:12:29 +00:00
Simon Pilgrim	4162d77744	[TTI] Add uniform/non-uniform constant Pow2 detection to TargetTransformInfo::getInstructionThroughput This enables us to detect more fast path sdiv cases under cost analysis. This patch also enables us to handle non-uniform-constant pow2 cases for X86 SDIV costs. Found while working on D46276 Future patches can then extend the vectorizers to more fully support non-uniform pow2 cases. Differential Revision: https://reviews.llvm.org/D46637 llvm-svn: 332969	2018-05-22 10:40:09 +00:00
Adrian Prantl	5f8f34e459	Remove \brief commands from doxygen comments. We've been running doxygen with the autobrief option for a couple of years now. This makes the \brief markers into our comments redundant. Since they are a visual distraction and we don't want to encourage more \brief markers in new code either, this patch removes them all. Patch produced by for i in $(git grep -l '\\brief'); do perl -pi -e 's/\\brief //g' $i & done Differential Revision: https://reviews.llvm.org/D46290 llvm-svn: 331272	2018-05-01 15:54:18 +00:00
Simon Pilgrim	2faf606fb6	[CostModel][X86] Remove hard coded SDIV/UDIV vector costs Algorithmically compute the 'x20' SDIV/UDIV vector costs - this is necessary for PR36550 when DIV costs will be driven from the scheduler models. llvm-svn: 330870	2018-04-25 20:59:16 +00:00
Simon Pilgrim	58e03a09db	[CostModel][X86] Recursive call for cost of imul for packed v16i16 constant shift left. Don't just assume cost = 1. llvm-svn: 330834	2018-04-25 15:22:03 +00:00
Simon Pilgrim	80ce1dde44	[CostModel][X86] Fix v32i16/v64i8 SETCC costs on AVX512BW targets llvm-svn: 329498	2018-04-07 13:24:33 +00:00
Craig Topper	a985919d3e	[X86] Update cost model for Goldmont. Add fsqrt costs for Silvermont Add fdiv costs for Goldmont using table 16-17 of the Intel Optimization Manual. Also add overrides for FSQRT for Goldmont and Silvermont. Reviewers: RKSimon Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D44644 llvm-svn: 328451	2018-03-25 15:58:12 +00:00
Simon Pilgrim	9929f90740	[X86][SSE] Reduce FADD/FSUB/FMUL costs on later targets (PR36280) Agner's tables indicate that for SSE42+ targets (Core2 and later) we can reduce the FADD/FSUB/FMUL costs down to 1, which should fix the Himeno benchmark. Note: the AVX512 FDIV costs look rather dodgy, but this isn't part of this patch. Differential Revision: https://reviews.llvm.org/D43733 llvm-svn: 326133	2018-02-26 22:10:17 +00:00
Simon Pilgrim	cb9a02f60e	[X86][SSE] Increase PMULLD costs to better match hardware Until Skylake, most hardware could only issue a PMULLD op every other cycle llvm-svn: 324823	2018-02-10 19:27:10 +00:00
Sanjay Patel	d7c702b451	[LoopStrengthReduce, x86] don't add cost for a cmp that will be macro-fused (PR35681) In the motivating case from PR35681 and represented by the macro-fuse-cmp test: https://bugs.llvm.org/show_bug.cgi?id=35681 ...there's a 37 -> 31 byte size win for the loop because we eliminate the big base address offsets. SPEC2017 on Ryzen shows no significant perf difference. Differential Revision: https://reviews.llvm.org/D42607 llvm-svn: 324289	2018-02-05 23:43:05 +00:00
Simon Pilgrim	eb07016156	Spelling mistake in comment. NFCI. llvm-svn: 323752	2018-01-30 12:18:51 +00:00
Craig Topper	0d797a34d8	[X86] Add support for passing 'prefer-vector-width' function attribute into X86Subtarget and exposing via X86's getRegisterWidth TTI interface. This will cause the vectorizers to do some limiting of the vector widths they create. This is not a strict limit. There are reasons I know of that the loop vectorizer will generate larger vectors for. I've written this in such a way that the interface will only return a properly supported width(0/128/256/512) even if the attribute says something funny like 384 or 10. This has been split from D41895 with the remainder in a follow up commit. llvm-svn: 323015	2018-01-20 00:26:08 +00:00
Alexey Bataev	771ec9f399	[COST]Fix PR35865: Fix cost model evaluation for shuffle on X86. Summary: If the vector type is transformed to non-vector single type, the compile may crash trying to get vector information about non-vector type. Reviewers: RKSimon, spatel, mkuper, hfinkel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D41862 llvm-svn: 322106	2018-01-09 19:08:22 +00:00
Craig Topper	8b0f185c31	[X86] Simplify the TTI code for getInterleavedMemoryOpCost around for AVX512BW. NFCI Previously the lambda for AVX512 passed out a flag that indicated whether AVX512BW was required and that was checked against the AVX512BW subtarget flag outside. This patch changes the interface to pass the AVX512BW subtarget bit in and return its value if we detect 16 or 8 bit types. llvm-svn: 319919	2017-12-06 18:40:46 +00:00
Sanjay Patel	0de1a4bc2d	[PartiallyInlineLibCalls][x86] add TTI hook to allow sqrt inlining to depend on arg rather than result This should fix PR31455: https://bugs.llvm.org/show_bug.cgi?id=31455 Differential Revision: https://reviews.llvm.org/D28314 llvm-svn: 319094	2017-11-27 21:15:43 +00:00
Craig Topper	ea37e201ec	[X86] Don't report gather is legal on Skylake CPUs when AVX2/AVX512 is disabled. Allow gather on SKX/CNL/ICL when AVX512 is disabled by using AVX2 instructions. Summary: This adds a new fast gather feature bit to cover all CPUs that support fast gather that we can use independent of whether the AVX512 feature is enabled. I'm only using this new bit to qualify AVX2 codegen. AVX512 is still implicitly assuming fast gather to keep tests working and to match the scatter behavior. Test command lines have been added for these two cases. Reviewers: magabari, delena, RKSimon, zvi Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D40282 llvm-svn: 318983	2017-11-25 18:09:37 +00:00
Craig Topper	d5b5bbe22f	[X86] Spell penryn correctly in some comments. NFC llvm-svn: 318855	2017-11-22 18:23:40 +00:00
Mohammed Agabaria	115f68ea3e	[LV][X86] Support of AVX2 Gathers code generation and update the LV with this This patch depends on: https://reviews.llvm.org/D35348 Support of pattern selection of masked gathers of AVX2 (X86\AVX2 code gen) Update LoopVectorize to generate gathers for AVX2 processors. Reviewers: delena, zvi, RKSimon, craig.topper, aaboud, igorb Reviewed By: delena, RKSimon Differential Revision: https://reviews.llvm.org/D35772 llvm-svn: 318641	2017-11-20 08:18:12 +00:00
David Blaikie	b3bde2ea50	Fix a bunch more layering of CodeGen headers that are in Target All these headers already depend on CodeGen headers so moving them into CodeGen fixes the layering (since CodeGen depends on Target, not the other way around). llvm-svn: 318490	2017-11-17 01:07:10 +00:00
Mohammed Agabaria	6e6d5326a1	[TTI][X86] update costs of interleaved load\store of i64\double This patch contains more accurate cost of interelaved load\store of stride 2 for the types int64\double on AVX2. Reviewers: delena, RKSimon, craig.topper, dorit Reviewed By: dorit Differential Revision: https://reviews.llvm.org/D40008 llvm-svn: 318385	2017-11-16 09:38:32 +00:00
Craig Topper	46a5d58b8c	[X86] Update TTI to report that v1iX/v1fX types aren't legal for masked gather/scatter/load/store. The type legalizer will try to scalarize these operations if it sees them, but there is no handling for scalarizing them. This leads to a fatal error. With this change they will now be scalarized by the mem intrinsic scalarizing pass before SelectionDAG. llvm-svn: 318380	2017-11-16 06:02:05 +00:00
Alexey Bataev	e25a6fd390	[SLP] Fix PR35047: Fix default cost model for cast op in X86. Summary: The cost calculation for default case on X86 target does not always follow correct wayt because of missing 4-th argument in `BaseT::getCastInstrCost()` call. Added this missing parameter. Reviewers: hfinkel, mkuper, RKSimon, spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D39687 llvm-svn: 317576	2017-11-07 14:23:44 +00:00
Mohammed Agabaria	6691758364	[LV][X86] update the cost of interleaving mem. access of floats Recommit: This patch contains update of the costs of interleaved loads of v8f32 of stride 3 and 8. fixed the location of the lit test it works with make check-all. Differential Revision: https://reviews.llvm.org/D39403 llvm-svn: 317471	2017-11-06 10:56:20 +00:00
Mohammed Agabaria	acd69dbc7c	[REVERT][LV][X86] update the cost of interleaving mem. access of floats reverted my changes will be committed later after fixing the failure This patch contains update of the costs of interleaved loads of v8f32 of stride 3 and 8. Differential Revision: https://reviews.llvm.org/D39403 llvm-svn: 317433	2017-11-05 09:36:54 +00:00
Mohammed Agabaria	f74c767de6	[LV][X86] update the cost of interleaving mem. access of floats This patch contains update of the costs of interleaved loads of v8f32 of stride 3 and 8. Differential Revision: https://reviews.llvm.org/D39403 llvm-svn: 317432	2017-11-05 09:06:23 +00:00
Clement Courbet	b2c3eb8cf1	[CodeGen][ExpandMemcmp] Allow memcmp to expand to vector loads (2). - Targets that want to support memcmp expansions now return the list of supported load sizes. - Expansion codegen does not assume that all power-of-two load sizes smaller than the max load size are valid. For examples, this is not the case for x86(32bit)+sse2. Fixes PR34887. llvm-svn: 316905	2017-10-30 14:19:33 +00:00
Michael Zuckerman	49293264cc	[AVX512][AVX2]Cost calculation for interleave load/store patterns {v8i8,v16i8,v32i8,v64i8} This patch adds accurate instructions cost. The formula presents two cases(stride 3 and stride 4) and calculates the cost according to the VF and stride. Reviewers: 1. delena 2. Farhana 3. zvi 4. dorit 5. Ayal Differential Revision: https://reviews.llvm.org/D38762 Change-Id: If4cfbd4ac0e63694e8144cb78c7fa34850647ff7 llvm-svn: 316072	2017-10-18 11:41:55 +00:00
Clement Courbet	2807c0a442	[CodeGenPrepare][NFC] Rename TargetTransformInfo::expandMemCmp -> TargetTransformInfo::enableMemCmpExpansion. Summary: Right now there are two functions with the same name, one does the work and the other one returns true if expansion is needed. Rename TargetTransformInfo::expandMemCmp to make it more consistent with other members of TargetTransformInfo. Remove the unused Instruction* parameter. Differential Revision: https://reviews.llvm.org/D38165 llvm-svn: 314096	2017-09-25 06:35:16 +00:00
Sanjay Patel	6fd4391ddd	[DivRempairs] add a pass to optimize div/rem pairs (PR31028) This is intended to be a superset of the functionality from D31037 (EarlyCSE) but implemented as an independent pass, so there's no stretching of scope and feature creep for an existing pass. I also proposed a weaker version of this for SimplifyCFG in D30910. And I initially had almost this same functionality as an addition to CGP in the motivating example of PR31028: https://bugs.llvm.org/show_bug.cgi?id=31028 The advantage of positioning this ahead of SimplifyCFG in the pass pipeline is that it can allow more flattening. But it needs to be after passes (InstCombine) that could sink a div/rem and undo the hoisting that is done here. Decomposing remainder may allow removing some code from the backend (PPC and possibly others). Differential Revision: https://reviews.llvm.org/D37121 llvm-svn: 312862	2017-09-09 13:38:18 +00:00
Alexey Bataev	6dd29fccb8	[SLP] Support for horizontal min/max reduction. SLP vectorizer supports horizontal reductions for Add/FAdd binary operations. Patch adds support for horizontal min/max reductions. Function getReductionCost() is split to getArithmeticReductionCost() for binary operation reductions and getMinMaxReductionCost() for min/max reductions. Patch fixes PR26956. Differential revision: https://reviews.llvm.org/D27846 llvm-svn: 312791	2017-09-08 13:49:36 +00:00
Zvi Rackover	25799d93f0	X86: Improve AVX512 fptoui lowering Summary: Add patterns for fptoui <16 x float> to <16 x i8> fptoui <16 x float> to <16 x i16> Reviewers: igorb, delena, craig.topper Reviewed By: craig.topper Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D37505 llvm-svn: 312704	2017-09-07 07:40:34 +00:00

1 2 3 4 5 ...

374 Commits