llvm-project

mirror of https://github.com/llvm/llvm-project.git synced 2025-05-02 07:36:07 +00:00

Author	SHA1	Message	Date
Christopher Tetreault	0da1e7ebf9	[SVE] Remove calls to VectorType::getNumElements from X86 Reviewers: efriedma, RKSimon, craig.topper, fpetrogalli, c-rhodes Reviewed By: RKSimon Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D82508	2020-06-29 11:10:35 -07:00
Guillaume Chatelet	b66e33a689	[Alignment][NFC] Migrate TTI::getGatherScatterOpCost to Align This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Differential Revision: https://reviews.llvm.org/D82577	2020-06-26 11:08:27 +00:00
Guillaume Chatelet	fdc7c7fb87	[Alignment][NFC] Migrate TTI::getInterleavedMemoryOpCost to Align This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Differential Revision: https://reviews.llvm.org/D82573	2020-06-26 11:00:53 +00:00
Guillaume Chatelet	7e1f79c3de	[Alignment][NFC] Migrate TTI::getMaskedMemoryOpCost to Align This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Differential Revision: https://reviews.llvm.org/D82569	2020-06-26 10:14:16 +00:00
Guillaume Chatelet	324cda2073	[Alignment][NFC] Conform X86, ARM and AArch64 TargetTransformInfo backends to the public API The main interface has been migrated to Align already but a few backends were broadening the type from Align to MaybeAlign. This patch makes sure all implementations conform to the public API. Differential Revision: https://reviews.llvm.org/D82465	2020-06-25 13:23:13 +00:00
dfukalov	7ddee0922f	[NFCI][CostModel] Add const to Value*. Summary: Get back `const` partially lost in one of the recent changes. Additionally specify explicit qualifiers in a few places. Reviewers: samparker Reviewed By: samparker Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D82383	2020-06-24 23:16:08 +03:00
Sam Parker	2596da3174	[CostModel] getCFInstrCost in getUserCost. Have BasicTTI call the base implementation so that both agree on the default behaviour, with the default being a cost of '1'. This has required an X86 specific implementation as it seems to be very reliant on those instructions being free. Changes are also made to AMDGPU so that their implementations distinguish between cost kinds, so that the unrolling isn't affected. PowerPC also has its own implementation to prevent changes to the reg-usage vectorizer test. The cost model test changes now reflect that ret instructions are not generally free. Differential Revision: https://reviews.llvm.org/D79164	2020-06-15 09:28:46 +01:00
Christopher Tetreault	9044027e45	[SVE] Eliminate calls to default-false VectorType::get() from X86 Reviewers: efriedma, craig.topper, RKSimon, samparker, kmclaughlin, david-arm Reviewed By: david-arm Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D81520	2020-06-10 09:56:00 -07:00
Sam Parker	fa8bff0cd1	[CostModel] Unify getArithmeticInstrCost Add the remaining arithmetic opcodes into the generic implementation of getUserCost and then call this from getInstructionThroughput. Most of the backends have been modified to return the base implementation for cost kinds other than RecipThroughput. The outlier here is AMDGPU which already uses getArithmeticInstrCost for all the cost kinds. This change means that most of the opcodes can be removed from that backend's implementation of getUserCost. Differential Revision: https://reviews.llvm.org/D80992	2020-06-10 09:08:45 +01:00
Sam Parker	37289615c0	[NFCI][CostModel] Unify getCmpSelInstrCost Add cases for icmp, fcmp and select into the switch statement of the generic getUserCost implementation with getInstructionThroughput then calling into it. The BasicTTI and backend implementations have been set to return a default value (1) when a cost other than throughput is being queried. Differential Revision: https://reviews.llvm.org/D80550	2020-06-09 07:41:22 +01:00
Sam Parker	5b5e78ad2b	[CostModel] Follow-up to buildbot fix Adding type checks into the other backends that call getTypeLegalizationCost. Differential Revision: https://reviews.llvm.org/D80984	2020-06-08 15:26:25 +01:00
Sam Parker	9303546b42	[CostModel] Unify getMemoryOpCost Use getMemoryOpCost from the generic implementation of getUserCost and have getInstructionThroughput return the result of that for loads and stores. This also means that the X86 implementation of getUserCost can be removed with the functionality folded into its getMemoryOpCost. Differential Revision: https://reviews.llvm.org/D80984	2020-06-05 10:13:38 +01:00
Christopher Tetreault	5a99ec10f5	[SVE] Eliminate calls to default-false VectorType::get() from X86 Reviewers: efriedma, sdesmalen, c-rhodes, craig.topper Reviewed By: craig.topper Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D80331	2020-05-29 16:16:07 -07:00
Sam Parker	8aaabadece	[CostModel] Unify getCastInstrCost Add the remaining cast instruction opcodes to the base implementation of getUserCost and directly return the result. This allows getInstructionThroughput to return getUserCost for the casts. This has required changes to PPC and SystemZ because they implement getUserCost and/or getCastInstrCost with adjustments for vector operations. Adjustments have also been made in the remaining backends that implement the method so that they still produce a cost of zero or one for cost kinds other than throughput. Differential Revision: https://reviews.llvm.org/D79848	2020-05-26 11:29:57 +01:00
Sam Parker	871556a494	[CostModel] Unify Intrinsic Costs. Recommitting most of the remaining changes from 259eb619ff6dcd5b6111d1686e18559b9ca004d4, but excluding the call to getUserCost from getInstructionThroughput. Though there's still no test changes, I doubt that this is an NFC... With the two getIntrinsicInstrCosts folded into one, now fold in the scalar/code-size orientated getIntrinsicCost. The remaining scalar intrinsics were memcpy, cttz and ctlz which now have special handling in the BasicTTI implementation. This has required a change in the AMDGPU backend for fabs as it should always be 'free'. I've also changed the X86 backend to return the BaseT implementation when the CostKind isn't RecipThroughput. Differential Revision: https://reviews.llvm.org/D80012	2020-05-26 09:48:26 +01:00
Sam Parker	259eb619ff	Revert "[CostModel] Unify Intrinsic Costs." This reverts commit de71def3f59dc9f12f67141b5040d8e15c84d08a. This is causing some very large changes, so I'm first going to break this patch down and re-commit in parts.	2020-05-21 12:50:24 +01:00
Sam Parker	de71def3f5	[CostModel] Unify Intrinsic Costs. With the two getIntrinsicInstrCosts folded into one, now fold in the scalar/code-size orientated getIntrinsicCost. This involved sinking cost of the TTIImpl into the base implementation, as it performs no target checks. The opcodes remaining were memcpy, cttz and ctlz which now have special handling in the BasicTTI implementation. getInstructionThroughput can now directly return the result of getUserCost. This has required a change in the AMDGPU backend for fabs, as it is always 'free'. I've also changed the X86 backend to return '1' for any intrinsic when the CostKind isn't RecipThroughput. Though this is intended to be a non-functional change, there are many paths being combined here so I would be very surprised if this didn't have an effect. Differential Revision: https://reviews.llvm.org/D80012	2020-05-21 07:38:25 +01:00
Sam Parker	8cc911fa5b	[NFCI][CostModel] Refactor getIntrinsicInstrCost Combine the two API calls into one by introducing a structure to hold the relevant data. This has the added benefit of moving the boiler plate code for arguments and flags, into the constructors. This is intended to be a non-functional change, but the complicated web of logic involved here makes it very hard to guarantee. Differential Revision: https://reviews.llvm.org/D79941	2020-05-20 11:59:08 +01:00
Simon Pilgrim	4e3c005554	[TTI] getScalarizationOverhead - use explicit VectorType operand getScalarizationOverhead is only ever called with vectors (and we already had a load of cast<VectorType> calls immediately inside the functions). Followup to D78357 Reviewed By: @samparker Differential Revision: https://reviews.llvm.org/D79341	2020-05-05 16:59:23 +01:00
Sam Parker	40574fefe9	[NFC][CostModel] Add TargetCostKind to relevant APIs Make the kind of cost explicit throughout the cost model which, apart from making the cost clear, will allow the generic parts to calculate better costs. It will also allow some backends to approximate and correlate the different costs if they wish. Another benefit is that it will also help simplify the cost model around immediate and intrinsic costs, where we currently have multiple APIs. RFC thread: http://lists.llvm.org/pipermail/llvm-dev/2020-April/141263.html Differential Revision: https://reviews.llvm.org/D79002	2020-05-05 10:35:54 +01:00
Craig Topper	b938168aef	[X86] Lower the cost of v4i64->v4i32 truncate with avx512. We use the vpmovqd instruction which is a single uop. So the cost should be 1.	2020-05-01 11:09:37 -07:00
Craig Topper	6a1ad76dab	[X86] Don't return true from isTruncateFree for vectors Also fix some cost tables for vXi1 types to match the costs entries for the types they will be promoted to. Differential Revision: https://reviews.llvm.org/D79045	2020-04-30 16:43:35 -07:00
Craig Topper	ff66919020	[X86][CostModel] Bump the cost of vpermw/vpermt2b/vpermt2w vpermw is 2 uops. vpermt2b/vpermt2w are two shuffle uops and a port 015 uop. Weirdly vpermb is a single uop. This patch bumps the cost to 2 for these operations. Maybe should go to 3 for the vpermt2*, but I've started conservative. I've also removed a few entries that were now the same as earlier subtargets or that I didn't think we really did. Like I don't think we extend v32i8 to v32i16, shuffle, and then truncate. Differential Revision: https://reviews.llvm.org/D79148	2020-04-30 11:32:25 -07:00
Craig Topper	cff6686532	[X86] Lower the cost of v4i64->v4i32 and v8i64->v8i32 truncate with AVX We generate much better code these days than we used to. And we use the same sequence for AVX1 and AVX2 for these. For v4i64->v4i32 we generate: vextractf128 xmm1, ymm0, 1 vshufps xmm0, xmm0, xmm1, 136 # xmm0 = xmm0[0,2],xmm1[0,2] And for v8i64->v8i32 we generate: vperm2f128 ymm2, ymm0, ymm1, 49 # ymm2 = ymm0[2,3],ymm1[2,3] vinsertf128 ymm0, ymm0, xmm1, 1 vshufps ymm0, ymm0, ymm2, 136 # ymm0 = ymm0[0,2],ymm2[0,2],ymm0[4,6],ymm2[4,6] Differential Revision: https://reviews.llvm.org/D79109	2020-04-29 13:21:44 -07:00
Simon Pilgrim	090cae8491	[TTI] Add DemandedElts to getScalarizationOverhead The improvements to the x86 vector insert/extract element costs in D74976 resulted in the estimated costs for vector initialization and scalarization increasing higher than should be expected. This is particularly noticeable on pre-SSE4 targets where the availability of legal INSERT_VECTOR_ELT ops is more limited. This patch does 2 things: 1 - it implements X86TTIImpl::getScalarizationOverhead to more accurately represent the typical costs of an ISD::BUILD_VECTOR pattern. 2 - it adds a DemandedElts mask to getScalarizationOverhead to permit the SLP's BoUpSLP::getGatherCost to be rewritten to use it directly instead of accumulating raw vector insertion costs. This fixes PR45418 where a v4i8 (zext'd to v4i32) was no longer vectorizing. A future patch should extend X86TTIImpl::getScalarizationOverhead to tweak the EXTRACT_VECTOR_ELT scalarization costs as well. Reviewed By: @craig.topper Differential Revision: https://reviews.llvm.org/D78216	2020-04-29 12:00:38 +01:00
Craig Topper	59b9e6fe76	[X86] Update costs for truncates from less than 128-bit vectors to vXi1 on pre-avx512 targets vXi1 types are legalized by promoting, but the narrow vectors are legalized by widening. This results in some truncates turning into any_extends.	2020-04-28 11:35:41 -07:00
Craig Topper	d42192c50f	[X86][CostModel] Correct the costs for truncate to a mask register with avx512 I've modified isTruncateFree to get an accurate cost for types that need to be split. I'm planning to look into fixing it for all vectors, but need more cost cleanups first. Differential Revision: https://reviews.llvm.org/D78973	2020-04-28 10:39:36 -07:00
Sam Parker	e9c9329aa4	[TTI] Add TargetCostKind argument to getUserCost There are several different types of cost that TTI tries to provide explicit information for: throughput, latency, code size along with a vague 'intersection of code-size cost and execution cost'. The vectorizer is a keen user of RecipThroughput and there's at least 'getInstructionThroughput' and 'getArithmeticInstrCost' designed to help with this cost. The latency cost has a single use and a single implementation. The intersection cost appears to cover most of the rest of the API. getUserCost is explicitly called from within TTI when the user has been explicit in wanting the code size (also only one use) as well as a few passes which are concerned with a mixture of size and/or a relative cost. In many cases these costs are closely related, such as when multiple instructions are required, but one evident diverging cost in this function is for div/rem. This patch adds an argument so that the cost required is explicit, so that we can make the important distinction when necessary. Differential Revision: https://reviews.llvm.org/D78635	2020-04-28 08:57:45 +01:00
Craig Topper	37ec709233	[X86][CostModel] Update truncate costs for some narrow vector cases to match their wider version. This updates v4i16->v4i8 with sse2 to match v8i16->v8i8. Update v2i16->v2i8 and v4i16->v4i8 with sse 4.1 to match v8i16->v8i8.	2020-04-27 13:47:48 -07:00
Craig Topper	bdbbed115f	[X86][CostModel] Update costs for vector truncate with avx512f/avx512bw. All avx512 truncate instructions except vXi64->vXi32 are 2 uops on port 5. So raise their costs to 2. Except when we have an earlier faster sequence like pshufb for 128 bit input vectors. Add a lower cost of 3 for v16i16->v16i8 with avx512f where we can extend to v16i32 then truncate. And a cost of 2 for avx512bw with and without avx512vl. There we can use vpmovwb with either a ymm or zmm input. Both of these beat masking, splitting, and using packuswb which is our avx/avx2 codegen.	2020-04-27 12:00:24 -07:00
Craig Topper	5eff75d86a	[X86][CostModel] Improve costs for fp_to_uint/fp_to_sint for vXi8/vXi16/v2i32 results. Differential Revision: https://reviews.llvm.org/D78893	2020-04-27 10:35:15 -07:00
Craig Topper	fc02d9f3c6	[X86] Add cost table entry for v2i32->v2f64 fp_to_uint with avx512. We're currently getting this from the default implementation. But I don't like how the cost model came to this answer and I might be making some changes there.	2020-04-26 19:59:01 -07:00
Craig Topper	b9de62c2b6	[X86] Fix the cost of v16i1->v16i16 sext/zext on avx targets. Previously we were hitting the scalarization case in the default implementation.	2020-04-25 23:16:20 -07:00
Craig Topper	19cb26f517	[X86][CostModel] Improve costs for vXi1 sign_extend/zero_extend with avx512. With avx512 vXi1 is legal and uses k-registers with many custom cases for extending.	2020-04-25 23:16:20 -07:00
Craig Topper	7664a0d282	[X86] Improve accuracy of cost for v16i64->v16i8 truncate with avx512. The 2 vpmovqds are only 1 uop each.	2020-04-24 19:13:55 -07:00
Craig Topper	e4a9190ad7	[X86][ArgumentPromotion] Allow Argument Promotion if caller and callee disagree on 512-bit vectors support if the arguments are scalar. If one of caller/callee has disabled ZMM registers due to prefer-vector-width=256, we were previously disabling argument promotion as the ABI might be incompatible since one side will split 512-bit vectors in this case. But if we can see that the types are all scalar this shouldn't be a problem. This patch assumes that pointer element type reflects the type that the argument will be promoted to. Differential Revision: https://reviews.llvm.org/D78770	2020-04-24 15:47:02 -07:00
Sam Parker	e3056ae9a0	[NFC][TTI] Explicit use of VectorType The API for shuffles and reductions uses generic Type parameters, instead of VectorType, and so assertions and casts are used a lot. This patch makes those types explicit, which means that the clients can't be lazy, but results in less ambiguity, and that can only be a good thing. Bugzilla: https://bugs.llvm.org/show_bug.cgi?id=45562 Differential Revision: https://reviews.llvm.org/D78357	2020-04-20 09:16:52 +01:00
Christopher Tetreault	dd24fb388b	Clean up usages of asserting vector getters in Type Summary: Remove usages of asserting vector getters in Type in preparation for the VectorType refactor. The existence of these functions complicates the refactor while adding little value. Reviewers: craig.topper, sdesmalen, efriedma, RKSimon Reviewed By: efriedma Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77264	2020-04-17 10:49:16 -07:00
Craig Topper	8dfb9627b7	[X86] Make v32i16/v64i8 legal types without avx512bw. Use custom splitting instead. This moves v32i16/v64i8 to a model consistent with how we treat integer types with avx1. This does change the ABI for types vXi16/vXi8 vectors larger than 512 bits to pass in multiple zmms instead of multiple ymms. We'd already hacked some code to make v64i8/v32i16 pass in zmm. Cost model is still a bit of a mess. In some places I tried to match existing behavior. But really we need to account for splitting and concating costs. Cost model for shuffles is especially pessimistic. Differential Revision: https://reviews.llvm.org/D76212	2020-04-15 12:17:18 -07:00
Simon Pilgrim	426f37584e	[TTI][X86] Add X86TTIImpl::getScalarizationOverhead implementation. This is currently just a wrapper to the base type, I'll be adding ISD::BUILD_VECTOR costs in a future patch.	2020-04-14 12:58:19 +01:00
Craig Topper	2f60fbce6c	[X86] Use a more realistic cost for truncate v16i64->v16i8 with avx512f. Still not great and we could probably codegen this better, but 11 was clearly ridiculous.	2020-04-13 21:09:43 -07:00
Craig Topper	535a566a01	[X86] Split AVX512 getCastInstrCost into tables that require useAVX512Regs() and those that just operate on 256 or smaller vectors. Use useAVX512Regs() to skip lookups instead of using type legalization action.	2020-04-13 21:09:42 -07:00
Craig Topper	071c64d68d	[X86] Add a more accurate truncate cost for v8i64->v8i8	2020-04-13 21:09:41 -07:00
Simon Pilgrim	91bc50c0d7	[CostModel][X86] Improve InsertElement costs for sub-128bit vectors If we're inserting into v2i8/v4i8/v8i8/v2i16/v4i16 style sub-128bit vectors ensure we don't use the SK_PermuteTwoSrc cost of the legalized value type - this is a followup to rG12c629ec6c59 which added equivalent sub-128bit shuffle costs	2020-04-10 14:55:46 +01:00
Craig Topper	5625e6ab37	[X86] Improve min/max reduction costs. This is similar to what I recently did for getArithmeticReductionCost. I'm trying to account for the narrowing from 512->256->128 as we go. I've also added a new helper method getMinMaxCost that tries to handle the cases where we have native min/max instructions and fall back to cmp+select when we don't. Differential Revision: https://reviews.llvm.org/D76634	2020-04-09 17:28:50 -07:00
Simon Pilgrim	12c629ec6c	[CostModel][X86] Add shuffle costs for some common sub-128bit vectors v2i8/v4i8/v8i8 + v2i16/v4i16 all show up in vectorizer code and by just using the legalized types (v16i8/v8i16) we're highly exaggerating the actual cost of the shuffle.	2020-04-09 19:57:06 +01:00
Clement Courbet	fb4aa30f27	[ExpandMemCmp] Allow overlapping loads in the zero-relational case. Summary: This allows doing `memcmp(p, q, 7)` with 2 loads instead of a call to memcmp. This fixes part of PR45147. Reviewers: spatel Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D76133	2020-04-02 11:20:47 +02:00
Craig Topper	f4c67dfa92	[X86] More accurately model the cost of horizontal reductions. This patch attempts to more accurately model the reduction of power of 2 vectors of types we natively support. This takes into account the narrowing of vectors that occurs as we go from 512 bits to 256 bits, to 128 bits. It also takes into account the use of wider elements in the shuffles for the first 2 steps of a reduction from 128 bits. And uses a v8i16 shift for the final step of vXi8 reduction. The default implementation uses the legalized type for the arithmetic for all levels. And uses the single source permute cost of the legalized type for all levels. This penalizes things like lack of v16i8 pshufb on pre-sse3 targets and the splitting and joining that needs to be done for integer types on AVX1. We never need v16i8 shuffle for a reduction and we only need split AVX1 ops when the type is wide and needs to be split. I think we're still over costing splits and joins for AVX1, but we're closer now. I've also removed all pairwise special casing because I don't think we ever want to generate that on X86. I've also adjusted the add handling to more accurately account for any type splitting that occurs before we reach a legal type. Differential Revision: https://reviews.llvm.org/D76478	2020-03-22 14:20:15 -07:00
Craig Topper	c13aa36bb7	[X86] Attempt to more accurately model the cost of a bool reduction of wide vector type. Previously we multiplied the cost for the table entries by the number of splits needed. But that implies that each split goes through a reduction to scalar independently. I think what really happens is that we AND/OR the split pieces until we're down to a single value with a legal type and then do a special reduction sequence on that. So to model that this patch takes the number of splits minus one multiplied by the cost of an AND/OR at the legal element count and adds that on top of the table lookup. Differential Revision: https://reviews.llvm.org/D76400	2020-03-19 09:31:05 -07:00
Craig Topper	b2da1ddaef	[X86] Add a non-zero cost for truncating v32i16->v32i8 on avx512bw.	2020-03-15 17:18:46 -07:00
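
The bool-reduction costing described in c13aa36bb7 (AND/OR the split pieces down to one legal-width value, then apply a single table-lookup cost) can be sketched as a small formula. This is a rough illustration only, not the X86TTIImpl code: the function names, parameters, and the numbers in the usage note are all hypothetical.

```python
def bool_reduction_cost(num_splits: int, and_or_cost: int, table_cost: int) -> int:
    """Sketch of the revised model from the commit message.

    Hypothetical helper: (num_splits - 1) AND/OR ops merge the split pieces
    into one value of legal type, then one table-lookup cost covers the
    final reduction sequence on that value.
    """
    if num_splits < 1:
        raise ValueError("num_splits must be at least 1")
    return (num_splits - 1) * and_or_cost + table_cost


def old_bool_reduction_cost(num_splits: int, table_cost: int) -> int:
    """Previous model per the commit message: each split is costed as its
    own full reduction, so the table cost is multiplied by the split count."""
    if num_splits < 1:
        raise ValueError("num_splits must be at least 1")
    return num_splits * table_cost
```

With made-up inputs of 4 splits, an AND/OR cost of 1, and a table cost of 5, the revised model gives 3 * 1 + 5 = 8 where the previous one gives 4 * 5 = 20. When no splitting is needed, both reduce to the table cost alone.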

413 Commits