llvm-project

mirror of https://github.com/llvm/llvm-project.git synced 2025-05-09 16:36:07 +00:00

Author	SHA1	Message	Date
Chad Rosier	a097bc69df	[LV] Use Demanded Bits and ValueTracking for reduction type-shrinking The type-shrinking logic in reduction detection, although narrow in scope, is also rather ad-hoc, which has led to bugs (e.g., PR35734). This patch modifies the approach to rely on the demanded bits and value tracking analyses, if available. We currently perform type-shrinking separately for reductions and other instructions in the loop. Long-term, we should probably think about computing minimal bit widths in a more complete way for the loops we want to vectorize. PR35734 Differential Revision: https://reviews.llvm.org/D42309 llvm-svn: 324195	2018-02-04 15:42:24 +00:00
Hiroshi Inoue	0909ca132f	[NFC] fix trivial typos in comments and documents "in in" -> "in", "on on" -> "on" etc. llvm-svn: 323508	2018-01-26 08:15:29 +00:00
Andrei Elovikov	7457aa0bce	[LV] Don't call recordVectorLoopValueForInductionCast for newly-created IV from a trunc. Summary: This method is supposed to be called for IVs that have casts in their use-def chains that are completely ignored after vectorization under PSE. However, for truncates of such IVs the same InductionDescriptor is used during creation/widening of both original IV based on PHINode and new IV based on TruncInst. This leads to unintended second call to recordVectorLoopValueForInductionCast with a VectorLoopVal set to the newly created IV for a trunc and causes an assert due to attempt to store new information for already existing entry in the map. This is wrong and should not be done. Fixes PR35773. Reviewers: dorit, Ayal, mssimpso Reviewed By: dorit Subscribers: RKSimon, dim, dcaballe, hsaito, llvm-commits, hiraditya Differential Revision: https://reviews.llvm.org/D41913 llvm-svn: 322473	2018-01-15 10:56:07 +00:00
Hal Finkel	0f1314c5ee	[LV][VPlan] NFC patch to move LoopVectorizationPlanner class out of LoopVectorize.cpp Another small step forward to move VPlan stuff outside of LoopVectorize.cpp. VPlanBuilder.h is renamed to LoopVectorizationPlanner.h LoopVectorizationPlanner class is moved from LoopVectorize.cpp to LoopVectorizationPlanner.h LoopVectorizationCostModel::VectorizationFactor class is moved to LoopVectorizationPlanner.h (used by the planner class) --- this needs further streamlining work in later patches and thus all I did was take it out of the CostModel class and moved to the header file. The callback function had to stay inside LoopVectorize.cpp since it calls an InnerLoopVectorizer member function declared in it. Next Steps: Make InnerLoopVectorizer, LoopVectorizationCostModel, and other classes more modular and more aligned with VPlan direction, in small increments. Previous step was: r320900 (https://reviews.llvm.org/D41045) Patch by Hideki Saito, thanks! Differential Revision: https://reviews.llvm.org/D41420 llvm-svn: 321962	2018-01-07 16:02:58 +00:00
Benjamin Kramer	c7fc81e659	Use phi ranges to simplify code. No functionality change intended. llvm-svn: 321585	2017-12-30 15:27:33 +00:00
Florian Hahn	467abe3e4f	[LV] Remove unnecessary DoExtraAnalysis guard (silent bug) canVectorize is only checking if the loop has a normalized pre-header if DoExtraAnalysis is true. This doesn't make sense to me because reporting analysis information shouldn't alter legality checks. This is probably the result of a last minute minor change before committing (?). Patch by Diego Caballero. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D40973 llvm-svn: 321172	2017-12-20 13:28:38 +00:00
Hal Finkel	5444f40965	[LV] Extend InstWidening with CM_Widen_Recursive Changes to the original scalar loop during LV code gen cause the return value of Legal->isConsecutivePtr() to be inconsistent with the return value during legal/cost phases (further analysis and information of the bug is in D39346). This patch is an alternative fix to PR34965 following the CM_Widen approach proposed by Ayal and Gil in D39346. It extends InstWidening enum with CM_Widen_Reverse to properly record the widening decision for consecutive reverse memory accesses and, consequently, get rid of the Legal->isConsetuviePtr() call in LV code gen. I think this is a simpler/cleaner solution to PR34965 than the one in D39346. Fixes PR34965. Patch by Diego Caballero, thanks! Differential Revision: https://reviews.llvm.org/D40742 llvm-svn: 320913	2017-12-16 02:55:24 +00:00
Hal Finkel	7333aa9f16	[LV] NFC patch for moving VPRecipe class definitions from LoopVectorize.cpp to VPlan.h This is a small step forward to move VPlan stuff to where it should belong (i.e., VPlan.): 1. VPRecipe classes in LoopVectorize.cpp are moved to VPlan.h. 2. Many of VPRecipe::print() and execute() definitions are still left in LoopVectorize.cpp since they refer to things declared in LoopVectorize.cpp. To be moved to VPlan.cpp at a later time. 3. InterleaveGroup class is moved from anonymous namespace to llvm namespace. Referencing it in anonymous namespace from VPlan.h ended up in warning. Patch by Hideki Saito, thanks! Differential Revision: https://reviews.llvm.org/D41045 llvm-svn: 320900	2017-12-16 01:12:50 +00:00
Dorit Nuzman	4750c785b3	[LV] Support efficient vectorization of an induction with redundant casts D30041 extended SCEVPredicateRewriter to improve handling of Phi nodes whose update chain involves casts; PSCEV can now build an AddRecurrence for some forms of such phi nodes, under the proper runtime overflow test. This means that we can identify such phi nodes as an induction, and the loop-vectorizer can now vectorize such inductions, however inefficiently. The vectorizer doesn't know that it can ignore the casts, and so it vectorizes them. This patch records the casts in the InductionDescriptor, so that they could be marked to be ignored for cost calculation (we use VecValuesToIgnore for that) and ignored for vectorization/widening/scalarization (i.e. treated as TriviallyDead). In addition to marking all these casts to be ignored, we also need to make sure that each cast is mapped to the right vector value in the vector loop body (be it a widened, vectorized, or scalarized induction). So whenever an induction phi is mapped to a vector value (during vectorization/widening/ scalarization), we also map the respective cast instruction (if exists) to that vector value. (If the phi-update sequence of an induction involves more than one cast, then the above mapping to vector value is relevant only for the last cast of the sequence as we allow only the "last cast" to be used outside the induction update chain itself). This is the last step in addressing PR30654. llvm-svn: 320672	2017-12-14 07:56:31 +00:00
Dorit Nuzman	927b31600e	[LV] Ignore the cost of values that will not appear in the vectorized loop VecValuesToIgnore holds values that will not appear in the vectorized loop. We should therefore ignore their cost when VF > 1. Differential Revision: https://reviews.llvm.org/D40883 llvm-svn: 320463	2017-12-12 08:57:43 +00:00
Adam Nemet	a502ee73c4	[LV] Interleaved access vectorization: fix computing new alias info As a new access is generated spanning across multiple fields, we need to propagate alias info from all the fields to form the most generic alias info. rdar://35602528 Differential Revision: https://reviews.llvm.org/D40617 llvm-svn: 319979	2017-12-06 22:42:24 +00:00
Alina Sbirlea	ff8b8aea2e	Add MemorySSA as loop dependency, disabled by default [NFC]. Summary: First step in adding MemorySSA as dependency for loop pass manager. Adding the dependency under a flag. New pass manager: MSSA pointer in LoopStandardAnalysisResults can be null. Legacy and new pass manager: Use cl::opt EnableMSSALoopDependency. Disabled by default. Reviewers: sanjoy, davide, gberry Subscribers: mehdi_amini, Prazek, llvm-commits Differential Revision: https://reviews.llvm.org/D40274 llvm-svn: 318772	2017-11-21 15:45:46 +00:00
Gil Rapaport	8b9d1f3c5b	[LV] Model masking in VPlan, introducing VPInstructions This patch adds a new abstraction layer to VPlan and leverages it to model the planned instructions that manipulate masks (AND, OR, NOT), introduced during predication. The new VPValue and VPUser classes model how data flows into, through and out of a VPlan, forming the vertices of a planned Def-Use graph. The new VPInstruction class is a generic single-instruction Recipe that models a planned instruction along with its opcode, operands and users. See VectorizationPlan.rst for more details. Differential Revision: https://reviews.llvm.org/D38676 llvm-svn: 318645	2017-11-20 12:01:47 +00:00
Gil Rapaport	848581cadb	[LV] Introduce VPBlendRecipe, VPWidenMemoryInstructionRecipe This patch is part of D38676. The patch introduces two new Recipes to handle instructions whose vectorization involves masking. These Recipes take VPlan-level masks in D38676, but still rely on ILV's existing createEdgeMask(), createBlockInMask() in this patch. VPBlendRecipe handles intra-loop phi nodes, which are vectorized as a sequence of SELECTs. Its execute() code is refactored out of ILV::widenPHIInstruction(), which now handles only loop-header phi nodes. VPWidenMemoryInstructionRecipe handles load/store which are to be widened (but are not part of an Interleave Group). In this patch it simply calls ILV::vectorizeMemoryInstruction on execute(). Differential Revision: https://reviews.llvm.org/D39068 llvm-svn: 318149	2017-11-14 12:09:30 +00:00
Dan Gohman	2c74fe977d	Add an @llvm.sideeffect intrinsic This patch implements Chandler's idea [0] for supporting languages that require support for infinite loops with side effects, such as Rust, providing part of a solution to bug 965 [1]. Specifically, it adds an `llvm.sideeffect()` intrinsic, which has no actual effect, but which appears to optimization passes to have obscure side effects, such that they don't optimize away loops containing it. It also teaches several optimization passes to ignore this intrinsic, so that it doesn't significantly impact optimization in most cases. As discussed on llvm-dev [2], this patch is the first of two major parts. The second part, to change LLVM's semantics to have defined behavior on infinite loops by default, with a function attribute for opting into potential-undefined-behavior, will be implemented and posted for review in a separate patch. [0] http://lists.llvm.org/pipermail/llvm-dev/2015-July/088103.html [1] https://bugs.llvm.org/show_bug.cgi?id=965 [2] http://lists.llvm.org/pipermail/llvm-dev/2017-October/118632.html Differential Revision: https://reviews.llvm.org/D38336 llvm-svn: 317729	2017-11-08 21:59:51 +00:00
Sanjay Patel	629c411538	[IR] redefine 'UnsafeAlgebra' / 'reassoc' fast-math-flags and add 'trans' fast-math-flag As discussed on llvm-dev: http://lists.llvm.org/pipermail/llvm-dev/2016-November/107104.html and again more recently: http://lists.llvm.org/pipermail/llvm-dev/2017-October/118118.html ...this is a step in cleaning up our fast-math-flags implementation in IR to better match the capabilities of both clang's user-visible flags and the backend's flags for SDNode. As proposed in the above threads, we're replacing the 'UnsafeAlgebra' bit (which had the 'umbrella' meaning that all flags are set) with a new bit that only applies to algebraic reassociation - 'AllowReassoc'. We're also adding a bit to allow approximations for library functions called 'ApproxFunc' (this was initially proposed as 'libm' or similar). ...and we're out of bits. 7 bits ought to be enough for anyone, right? :) FWIW, I did look at getting this out of SubclassOptionalData via SubclassData (spacious 16-bits), but that's apparently already used for other purposes. Also, I don't think we can just add a field to FPMathOperator because Operator is not intended to be instantiated. We'll defer movement of FMF to another day. We keep the 'fast' keyword. I thought about removing that, but seeing IR like this: %f.fast = fadd reassoc nnan ninf nsz arcp contract afn float %op1, %op2 ...made me think we want to keep the shortcut synonym. Finally, this change is binary incompatible with existing IR as seen in the compatibility tests. This statement: "Newer releases can ignore features from older releases, but they cannot miscompile them. For example, if nsw is ever replaced with something else, dropping it would be a valid way to upgrade the IR." ( http://llvm.org/docs/DeveloperPolicy.html#ir-backwards-compatibility ) ...provides the flexibility we want to make this change without requiring a new IR version. Ie, we're not loosening the FP strictness of existing IR. At worst, we will fail to optimize some previously 'fast' code because it's no longer recognized as 'fast'. This should get fixed as we audit/squash all of the uses of 'isFast()'. Note: an inter-dependent clang commit to use the new API name should closely follow commit. Differential Revision: https://reviews.llvm.org/D39304 llvm-svn: 317488	2017-11-06 16:27:15 +00:00
Benjamin Kramer	3f3d5be759	[LoopVectorize] Replace manual VPlan memory management with unique_ptr. No functionality change intended. llvm-svn: 317003	2017-10-31 14:58:22 +00:00
Dehao Chen	ed2d5402cb	Do not add discriminator encoding for debug intrinsics. Summary: There are certain requirements for debug location of debug intrinsics, e.g. the scope of the DILocalVariable should be the same as the scope of its debug location. As a result, we should not add discriminator encoding for debug intrinsics. Reviewers: dblaikie, aprantl Reviewed By: aprantl Subscribers: JDevlieghere, aprantl, bjope, sanjoy, llvm-commits Differential Revision: https://reviews.llvm.org/D39343 llvm-svn: 316703	2017-10-26 21:20:52 +00:00
Eugene Zelenko	5323550e9a	[Transforms] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC). llvm-svn: 315640	2017-10-12 23:30:03 +00:00
Vivek Pandya	9590658fb8	[NFC] Convert OptimizationRemarkEmitter old emit() calls to new closure parameterized emit() calls Summary: This is not functional change to adopt new emit() API added in r313691. Reviewed By: anemet Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D38285 llvm-svn: 315476	2017-10-11 17:12:59 +00:00
Ayal Zaks	c9e0f886e5	[LV] Fix PR34743 - handle casts that sink after interleaved loads When ignoring a load that participates in an interleaved group, make sure to move a cast that needs to sink after it. Testcase derived from reproducer of PR34743. Differential Revision: https://reviews.llvm.org/D38338 llvm-svn: 314986	2017-10-05 15:45:14 +00:00
Ayal Zaks	fc3f7a4f0c	[LV] Fix PR34711 - widen instruction ranges when sinking casts Instead of trying to keep LastWidenRecipe updated after creating each recipe, have tryToWiden() retrieve the last recipe of the current VPBasicBlock and check if it's a VPWidenRecipe when attempting to extend its range. This ensures that such extensions, optimized to maintain the original instruction order, do so only when the instructions are to maintain their relative order. The latter does not always hold, e.g., when a cast needs to sink to unravel first order recurrence (r306884). Testcase derived from reproducer of PR34711. Differential Revision: https://reviews.llvm.org/D38339 llvm-svn: 314981	2017-10-05 12:41:49 +00:00
Matthew Simpson	f4bb480b62	[LV] Use correct insertion point when type shrinking reductions When type shrinking reductions, we should insert the truncations and extends at the end of the loop latch block. Previously, these instructions were inserted at the end of the loop header block. The difference is only a problem for loops with predicated instructions (e.g., conditional stores and instructions that may divide by zero). For these instructions, we create new basic blocks inside the vectorized loop, which cause the loop header and latch to no longer be the same block. This should fix PR34687. Reference: https://bugs.llvm.org/show_bug.cgi?id=34687 llvm-svn: 314542	2017-09-29 18:07:39 +00:00
Sanjoy Das	def1729dc4	Use a BumpPtrAllocator for Loop objects Summary: And now that we no longer have to explicitly free() the Loop instances, we can (with more ease) use the destructor of LoopBase to do what LoopBase::clear() was doing. Reviewers: chandlerc Subscribers: mehdi_amini, mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D38201 llvm-svn: 314375	2017-09-28 02:45:42 +00:00
Adam Nemet	15fccf0009	Allow ORE.emit to take a closure to delay building the remark object In the lambda we are now returning the remark by value so we need to preserve its type in the insertion operator. This requires making the insertion operator generic. I've also converted a few cases to use the new API. It seems to work pretty well. See the LoopUnroller for a slightly more interesting case. llvm-svn: 313691	2017-09-19 23:00:55 +00:00
Vivek Pandya	b5ab895e2a	This patch fixes https://bugs.llvm.org/show_bug.cgi?id=32352 It enables OptimizationRemarkEmitter::allowExtraAnalysis and MachineOptimizationRemarkEmitter::allowExtraAnalysis to return true not only for -fsave-optimization-record but when specific remarks are requested with command line options. The diagnostic handler used to be callback now this patch adds a class DiagnosticHandler. It has virtual method to provide custom diagnostic handler and methods to control which particular remarks are enabled. However LLVM-C API users can still provide callback function for diagnostic handler. llvm-svn: 313390	2017-09-15 20:10:09 +00:00
Vivek Pandya	df8598dcc4	This reverts r313381 llvm-svn: 313387	2017-09-15 19:53:54 +00:00
Vivek Pandya	00d887447b	This patch fixes https://bugs.llvm.org/show_bug.cgi?id=32352 It enables OptimizationRemarkEmitter::allowExtraAnalysis and MachineOptimizationRemarkEmitter::allowExtraAnalysis to return true not only for -fsave-optimization-record but when specific remarks are requested with command line options. The diagnostic handler used to be callback now this patch adds a class DiagnosticHandler. It has virtual method to provide custom diagnostic handler and methods to control which particular remarks are enabled. However LLVM-C API users can still provide callback function for diagnostic handler. llvm-svn: 313382	2017-09-15 19:30:59 +00:00
Alon Kom	682cfc1d4c	[LV] Fix maximum legal VF calculation This patch fixes pr34283, which exposed that the computation of maximum legal width for vectorization was wrong, because it relied on MaxInterleaveFactor to obtain the maximum stride used in the loop, however not all strided accesses in the loop have an interleave-group associated with them. Instead of recording the maximum stride in the loop, which can be over conservative (e.g. if the access with the maximum stride is not involved in the dependence limitation), this patch tracks the actual maximum legal width imposed by accesses that are involved in dependencies. Differential Revision: https://reviews.llvm.org/D37507 llvm-svn: 313237	2017-09-14 07:40:02 +00:00
Anna Thomas	19529f75b9	[LV] Avoid computing the register usage for default VF. NFC These are changes to reduce redundant computations when calculating a feasible vectorization factor: 1. early return when target has no vector registers 2. don't compute register usage for the default VF. Suggested during review for D37702. llvm-svn: 313176	2017-09-13 19:35:45 +00:00
Ayal Zaks	e2a8c0758f	[LV] Fix PR34523 - avoid generating redundant selects When converting a PHI into a series of 'select' instructions to combine the incoming values together according their edge masks, initialize the first value to the incoming value In0 of the first predecessor, instead of generating a redundant assignment 'select(Cond[0], In0, In0)'. The latter fails when the Cond[0] mask is null, representing a full mask, which can happen only when there's a single incoming value. No functional changes intended nor expected other than surviving null Cond[0]'s. This fix follows D35725, which introduced using null to represent full masks. Differential Revision: https://reviews.llvm.org/D37619 llvm-svn: 313119	2017-09-13 06:28:37 +00:00
Anna Thomas	9f1be02fa3	[LV] Clamp the VF to the trip count Summary: When the MaxVectorSize > ConstantTripCount, we should just clamp the vectorization factor to be the ConstantTripCount. This vectorizes loops where the TinyTripCountThreshold >= TripCount < MaxVF. Earlier we were finding the maximum vector width, which could be greater than the trip count itself. The Loop vectorizer does all the work for generating a vectorizable loop, but in the end we would always choose the scalar loop (since the VF > trip count). This allows us to choose the VF keeping in mind the trip count if available. This is a fix on top of rL312472. Reviewers: Ayal, zvi, hfinkel, dneilson Reviewed by: Ayal Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D37702 llvm-svn: 313046	2017-09-12 16:32:45 +00:00
Zvi Rackover	9a087a357a	LoopVectorize: MaxVF should not be larger than the loop trip count Summary: Improve how MaxVF is computed while taking into account that MaxVF should not be larger than the loop's trip count. Other than saving on compile-time by pruning the possible MaxVF candidates, this patch fixes pr34438 which exposed the following flow: 1. Short trip count identified -> Don't bail out, set OptForSize:=True to avoid tail-loop and runtime checks. 2. Compute MaxVF returned 16 on a target supporting AVX512. 3. OptForSize -> choose VF:=MaxVF. 4. Bail out because TripCount = 8, VF = 16, TripCount % VF !=0 means we need a tail loop. With this patch step 2. will choose MaxVF=8 based on TripCount. Reviewers: Ayal, dorit, mkuper, hfinkel Reviewed By: hfinkel Subscribers: hfinkel, llvm-commits Differential Revision: https://reviews.llvm.org/D37425 llvm-svn: 312472	2017-09-04 08:35:13 +00:00
Benjamin Kramer	14ddcdfb18	[LoopVectorize] Turn static DenseSet into switch. LLVM transforms this into a bit test which is a lot faster and smaller. llvm-svn: 312417	2017-09-02 16:41:55 +00:00
Manoj Gupta	6b54c7e11b	[LoopVectorizer] Use two step casting for float to pointer types. Summary: LoopVectorizer is creating casts between vec<ptr> and vec<float> types on ARM when compiling OpenCV. Since, tIs is illegal to directly cast a floating point type to a pointer type even if the types have same size causing a crash. Fix the crash using a two-step casting by bitcasting to integer and integer to pointer/float. Fixes PR33804. Reviewers: mkuper, Ayal, dlj, rengolin, srhines Reviewed By: rengolin Subscribers: aemerson, kristof.beyls, mkazantsev, Meinersbur, rengolin, mzolotukhin, llvm-commits Differential Revision: https://reviews.llvm.org/D35498 llvm-svn: 312331	2017-09-01 15:36:00 +00:00
Ayal Zaks	1f58dda4e4	[LV] Fix PR34248 - recommit D32871 after revert r311304 Original commit r311077 of D32871 was reverted in r311304 due to failures reported in PR34248. This recommit fixes PR34248 by restricting the packing of predicated scalars into vectors only when vectorizing, avoiding doing so when unrolling w/o vectorizing. Added a test derived from the reproducer of PR34248. llvm-svn: 311849	2017-08-27 12:55:46 +00:00
Chandler Carruth	bd6dc14230	Revert r311077: [LV] Using VPlan ... This causes LLVM to assert fail on PPC64 and crash / infloop in other cases. Filed http://llvm.org/PR34248 with reproducer attached. llvm-svn: 311304	2017-08-20 23:17:11 +00:00
Aditya Kumar	a525fffd07	[Loop Vectorize] Added a separate metadata Added a separate metadata to indicate when the loop has already been vectorized instead of setting width and count to 1. Patch written by Divya Shanmughan and Aditya Kumar Differential Revision: https://reviews.llvm.org/D36220 llvm-svn: 311281	2017-08-20 10:32:41 +00:00
Ayal Zaks	6627883369	[LV] Using VPlan to model the vectorized code and drive its transformation VPlan is an ongoing effort to refactor and extend the Loop Vectorizer. This patch introduces the VPlan model into LV and uses it to represent the vectorized code and drive the generation of vectorized IR. In this patch VPlan models the vectorized loop body: the vectorized control-flow is represented using VPlan's Hierarchical CFG, with predication refactored from being a post-vectorization-step into a vectorization planning step modeling if-then VPRegionBlocks, and generating code inline with non-predicated code. The vectorized code within each VPBasicBlock is represented as a sequence of Recipes, each responsible for modelling and generating a sequence of IR instructions. To keep the size of this commit manageable the Recipes in this patch are coarse-grained and capture large chunks of LV's code-generation logic. The constructed VPlans are dumped in dot format under -debug. This commit retains current vectorizer output, except for minor instruction reorderings; see associated modifications to lit tests. For further details on the VPlan model see docs/Proposals/VectorizationPlan.rst and its references. Authors: Gil Rapaport and Ayal Zaks Differential Revision: https://reviews.llvm.org/D32871 llvm-svn: 311077	2017-08-17 09:29:59 +00:00
Ayal Zaks	25e2800e20	[LV] Minor savings to Sink casts to unravel first order recurrence Two minor savings: avoid copying the SinkAfter map and avoid moving a cast if it is not needed. Differential Revision: https://reviews.llvm.org/D36408 llvm-svn: 310910	2017-08-15 08:32:59 +00:00
Anna Thomas	9b6e12f3dc	[LoopVectorize] Fix assertion failure in Fcmp vectorization Summary: When vectorizing fcmps we can trip on incorrect cast assertion when setting the FastMathFlags after generating the vectorized FCmp. This can happen if the FCmp can be folded to true or false directly. The fix here is to set the FastMathFlag using the FastMathFlagBuilder before creating the FCmp Instruction. This is what's done by other optimizations such as InstCombine. Added a test case which trips on cast assertion without this patch. Reviewers: Ayal, mssimpso, mkuper, gilr Reviewed by: Ayal, mssimpso Subscribers: llvm-commits, mzolotukhin Differential Revision: https://reviews.llvm.org/D36244 llvm-svn: 310389	2017-08-08 18:07:44 +00:00
Matt Arsenault	e6de494b74	LV: Don't insert runtime ptr checks on divergent targets llvm-svn: 309890	2017-08-02 21:43:08 +00:00
Ayal Zaks	e841b214b1	[LV] Avoid redundant operations manipulating masks The Loop Vectorizer generates redundant operations when manipulating masks: AND with true, OR with false, compare equal to true. Instead of relying on a subsequent pass to clean them up, this patch avoids generating them. Use null (no-mask) to represent all-one full masks, instead of a constant all-one vector, following the convention of masked gathers and scatters. Preparing for a follow-up VPlan patch in which these mask manipulating operations are modeled using recipes. Differential Revision: https://reviews.llvm.org/D35725 llvm-svn: 309558	2017-07-31 13:21:42 +00:00
Ayal Zaks	8c452d76ed	[LV] Test once if vector trip count is zero, instead of twice Generate a single test to decide if there are enough iterations to jump to the vectorized loop, or else go to the scalar remainder loop. This test compares the Scalar Trip Count: if STC < VF * UF go to the scalar loop. If requiresScalarEpilogue() holds, at-least one iteration must remain scalar; the rest can be used to form vector iterations. So in this case the test checks instead if (STC - 1) < VF * UF by comparing STC <= VF * UF, and going to the scalar loop if so. Otherwise the vector loop is entered for at-least one vector iteration. This test covers the case where incrementing the backedge-taken count will overflow leading to an incorrect trip count of zero. In this (rare) case we will also avoid the vector loop and jump to the scalar loop. This patch simplifies the existing tests and effectively removes the basic-block originally named "min.iters.checked", leaving the single test in block "vector.ph". Original observation and initial patch by Evgeny Stupachenko. Differential Revision: https://reviews.llvm.org/D34150 llvm-svn: 308421	2017-07-19 05:16:39 +00:00
Michael Kuperstein	fdb46b2fb4	[LV] Don't allow outside uses of IVs if the SCEV is predicated on loop conditions. This fixes PR33706. Differential Revision: https://reviews.llvm.org/D35227 llvm-svn: 307837	2017-07-12 19:53:55 +00:00
Teresa Johnson	c12306c0ad	Revert "r306473 - re-commit r306336: Enable vectorizer-maximize-bandwidth by default." This still breaks PPC tests we have. I'll forward reproduction instructions to dehao. llvm-svn: 306936	2017-07-01 03:24:09 +00:00
Teresa Johnson	eb4fba9d61	re-commit r306336: Enable vectorizer-maximize-bandwidth by default. Differential Revision: https://reviews.llvm.org/D33341 llvm-svn: 306935	2017-07-01 03:24:08 +00:00
Teresa Johnson	de56903bde	revert r306336 for breaking ppc test. llvm-svn: 306934	2017-07-01 03:24:07 +00:00
Teresa Johnson	1fbaffeba1	Enable vectorizer-maximize-bandwidth by default. Summary: vectorizer-maximize-bandwidth is generally useful in terms of performance. I've tested the impact of changing this to default on speccpu benchmarks on sandybridge machines. The result shows non-negative impact: spec/2006/fp/C++/444.namd 26.84 -0.31% spec/2006/fp/C++/447.dealII 46.19 +0.89% spec/2006/fp/C++/450.soplex 42.92 -0.44% spec/2006/fp/C++/453.povray 38.57 -2.25% spec/2006/fp/C/433.milc 24.54 -0.76% spec/2006/fp/C/470.lbm 41.08 +0.26% spec/2006/fp/C/482.sphinx3 47.58 -0.99% spec/2006/int/C++/471.omnetpp 22.06 +1.87% spec/2006/int/C++/473.astar 22.65 -0.12% spec/2006/int/C++/483.xalancbmk 33.69 +4.97% spec/2006/int/C/400.perlbench 33.43 +1.70% spec/2006/int/C/401.bzip2 23.02 -0.19% spec/2006/int/C/403.gcc 32.57 -0.43% spec/2006/int/C/429.mcf 40.35 +0.27% spec/2006/int/C/445.gobmk 26.96 +0.06% spec/2006/int/C/456.hmmer 24.4 +0.19% spec/2006/int/C/458.sjeng 27.91 -0.08% spec/2006/int/C/462.libquantum 57.47 -0.20% spec/2006/int/C/464.h264ref 46.52 +1.35% geometric mean +0.29% The regression on 453.povray seems real, but is due to secondary effects as all hot functions are bit-identical with and without the flag. I started this patch to consult upstream opinions on this. It will be greatly appreciated if the community can help test the performance impact of this change on other architectures so that we can decided if this should be target-dependent. Reviewers: hfinkel, mkuper, davidxl, chandlerc Reviewed By: chandlerc Subscribers: rengolin, sanjoy, javed.absar, bjope, dorit, magabari, RKSimon, llvm-commits, mzolotukhin Differential Revision: https://reviews.llvm.org/D33341 llvm-svn: 306933	2017-07-01 03:24:06 +00:00
Ayal Zaks	2ff59d4350	[LV] Sink casts to unravel first order recurrence Check if a single cast is preventing handling a first-order-recurrence Phi, because the scheduling constraints it imposes on the first-order-recurrence shuffle are infeasible; but they can be made feasible by moving the cast downwards. Record such casts and move them when vectorizing the loop. Differential Revision: https://reviews.llvm.org/D33058 llvm-svn: 306884	2017-06-30 21:05:06 +00:00

1 2 3 4 5 ...

842 Commits