A trivial multiplication by zero may survive the worklist. We then tried to
reassociate the multiplication with a division instruction, causing us
to divide by zero; bail out instead.
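For reference, one hypothetical shape of the problematic pattern (not the
exact reproducer from PR24726): a multiply whose constant operand has already
been folded to zero feeding a division that reassociation would like to fold
into:

  %mul = fmul fast float %x, 0.000000e+00
  %div = fdiv fast float %y, %mul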
This fixes PR24726.
llvm-svn: 246939
As a first step towards a new implementation of the base pointer inference algorithm, introduce an abstraction for BDVs, strengthen the assertions around them, and rewrite the BDV relation code in terms of the abstraction, which includes an explicit notion of whether the BDV is also a base. The latter is motivated by the fact that we had a bug where insertelement was always assumed to be a base pointer even though the BDV code knew it wasn't. The strengthened assertions in this patch would have caught that bug.
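For illustration (names hypothetical), the kind of IR the old assumption got wrong:

  ; %vec is a BDV for %ptr but is not itself a base: lane 0's base is %obj,
  ; while the other lane's base comes from whatever produced %other.vec.
  %vec = insertelement <2 x i64 addrspace(1)*> %other.vec, i64 addrspace(1)* %obj, i32 0
  %ptr = extractelement <2 x i64 addrspace(1)*> %vec, i32 1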
The next step will be to separate the DefiningValueMap into a BDV use list cache (entirely within findBasePointers) and a base pointer cache. Having the former will allow me to use a deterministic visit order when visiting BDVs in the inference algorithm and remove a bunch of ordering-related hacks. Before actually doing that last step, I'm likely going to extend the lattice with a 'BaseN' (seen only base inputs) state so that I can kill the post-process optimization step.
Phabricator Revision: http://reviews.llvm.org/D12608
llvm-svn: 246809
The visit order being used in the base pointer inference algorithm is currently non-deterministic. When working on http://reviews.llvm.org/D12583, I discovered that we were relying on a peephole optimization to get deterministic ordering in one of the test cases.
This change is intended to let me test and land http://reviews.llvm.org/D12583. The current code will not be long-lived. I'm starting to investigate a rewrite of the algorithm which will combine the post-process step into the initial algorithm and make the visit order deterministic. Before doing that, I wanted to make sure the existing code was complete and the tests were stable. Hopefully, patches for the new algorithm should be up for review this week or early next.
llvm-svn: 246801
Summary:
Add a `cleanupendpad` instruction, used to mark exceptional exits out of
cleanups (for languages/targets that can abort a cleanup with another
exception). The `cleanupendpad` instruction is similar to the `catchendpad`
instruction in that it is an EH pad which is the target of unwind edges in
the handler and which itself has an unwind edge to the next EH action.
The `cleanupendpad` instruction, similar to `cleanupret`, has a `cleanuppad`
argument indicating which cleanup it exits. The unwind successors of a
`cleanuppad`'s `cleanupendpad`s must agree with each other and with its
`cleanupret`s.
Update WinEHPrepare (and docs/tests) to accommodate `cleanupendpad`.
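A rough sketch of the intended shape (IR fragment; syntax approximate):

cleanup:                                  ; unwind target of some invoke
  %cp = cleanuppad []
  invoke void @may.throw()                ; the cleanup itself may throw
          to label %cleanup.cont unwind label %cleanup.end
cleanup.cont:
  cleanupret %cp unwind label %next.action
cleanup.end:
  cleanupendpad %cp unwind label %next.action   ; exceptional exit; must agree
                                                ; with the cleanupret above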
Reviewers: rnk, andrew.w.kaylor, majnemer
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D12433
llvm-svn: 246751
Summary:
This patch introduces a side table in Merge Functions to
efficiently remove functions from the function set when functions
they refer to are merged. Previously these functions would need to
be compared lg(N) times to find the appropriate FunctionNode in the
tree to defer. With the recent determinism changes, this comparison
is more expensive. In addition, the removal function would not always
actually remove the function from the set (i.e. after remove(F),
there would sometimes still be a node in the tree which contains F).
With these changes, these functions are properly deferred, and so more
functions can be merged. In addition, when there are many merged
functions (and thus more deferred functions), there is a speedup:
chromium: 48678 merged -> 49380 merged; 6.58s -> 5.49s
libxul.so: 41004 merged -> 41030 merged; 8.02s -> 6.94s
mysqld: 1607 merged -> 1607 merged (same); 0.215s -> 0.212s (probably noise)
Author: jrkoenig
Reviewers: jfb, dschuff
Subscribers: llvm-commits, nlewycky
Differential revision: http://reviews.llvm.org/D12537
llvm-svn: 246735
Fix a bug in change 246133. I didn't handle the case where we had a cycle in the use graph and could add an instruction we were about to erase back onto the worklist. Oddly, I have not been able to write a small test case for this, even with the AssertingVH added. I have confirmed the basic theory for the fix on a large failing example, but all attempts to reduce that to something appropriate for a test case have failed.
Differential Revision: http://reviews.llvm.org/D12575
llvm-svn: 246718
There was an infinite loop because we kept trying to change assume(true) into
assume(true).
Also added handling for when assume(false) appears.
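A minimal sketch of the two cases:

  call void @llvm.assume(i1 true)    ; a no-op; the old code kept rewriting it into itself
  call void @llvm.assume(i1 false)   ; the assumption can never hold, so this point is unreachable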
http://reviews.llvm.org/D12516
llvm-svn: 246697
After hitting @llvm.assume(X) we can:
- propagate the fact that X == true
- if X is an icmp/fcmp with an equality predicate and one operand is a
constant, we can replace uses of the other operand with that constant in the
same BasicBlock (see the sketch below)
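A minimal sketch of the second case (illustrative, not taken from the tests):

define i32 @test(i32 %x) {
entry:
  %cmp = icmp eq i32 %x, 42
  call void @llvm.assume(i1 %cmp)
  %r = add i32 %x, 1               ; %x can be replaced with 42 here
  ret i32 %r
}

declare void @llvm.assume(i1)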
http://reviews.llvm.org/D11918
llvm-svn: 246695
This makes RemoveDuplicatePHINodes more effective and fixes an assertion
failure. Triggering the assertion requires a DenseSet reallocation,
so this change only contains a constructive test.
I'll explain the issue with a small example. In the following function
there's a duplicate PHI, %4 and %5 are identical. When this is found
the DenseSet in RemoveDuplicatePHINodes contains %2, %3 and %4.
define void @F() {
  br label %1
; <label>:1 ; preds = %1, %0
  %2 = phi i32 [ 42, %0 ], [ %4, %1 ]
  %3 = phi i32 [ 42, %0 ], [ %5, %1 ]
  %4 = phi i32 [ 42, %0 ], [ 23, %1 ]
  %5 = phi i32 [ 42, %0 ], [ 23, %1 ]
  br label %1
}
After RemoveDuplicatePHINodes runs, the function looks like this. %3 has
changed and is now identical to %2, but RemoveDuplicatePHINodes never
saw this.
define void @F() {
  br label %1
; <label>:1 ; preds = %1, %0
  %2 = phi i32 [ 42, %0 ], [ %4, %1 ]
  %3 = phi i32 [ 42, %0 ], [ %4, %1 ]
  %4 = phi i32 [ 42, %0 ], [ 23, %1 ]
  br label %1
}
If the DenseSet does a reallocation now, it will reinsert all
keys and stumble over %3 now having a different hash value than it had
when it was first inserted into the set. This change clears the
set whenever a PHI is deleted and restarts the scan from the
beginning, allowing %3 to be deleted and avoiding inconsistent DenseSet
state. This potentially has a negative performance impact because
it rescans all PHIs, but I don't think that this ever makes a difference
in practice.
llvm-svn: 246694
We were bailing out to two places if our runtime checks failed: if the initial overflow check failed, we'd go to ScalarPH; if any other check failed, we'd go to MiddleBlock. This caused us to need an extra PHI per induction and reduction, as the vector loop's exit block was not dominated by its latch.
There's no need to have this behavior - if we just always go to ScalarPH we can get rid of a bunch of complexity.
llvm-svn: 246637
This reduces the complexity of createEmptyBlock() and will open the door to further refactoring.
The test change is simply because we're now constant folding a trivial test.
llvm-svn: 246634
It makes things easier to understand if this is in a helper method. This is part of my ongoing spaghetti-removal operation on createEmptyLoop.
llvm-svn: 246632
There's no need to widen canonical induction variables. It's just as efficient to create a *new*, wide, induction variable.
Consider: if we widen an indvar, then we'll have to truncate it before its uses anyway (one trunc). If we create a new indvar instead, we'll have to truncate that instead (again, one trunc) [besides which, IndVars should go and clean up our mess after us anyway on principle].
This lets us remove a ton of special-casing code.
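A sketch of the new shape (block names as the vectorizer typically emits them): a fresh wide indvar still costs exactly the one trunc that widening would have cost:

vector.body:
  %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
  %index.trunc = trunc i64 %index to i32   ; the single trunc feeding the old i32 uses
  %index.next = add i64 %index, 4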
llvm-svn: 246631
Vectorized loops only ever have one induction variable. All induction PHIs from the scalar loop are rewritten to be in terms of this single indvar.
We were trying very hard to pick an indvar that already existed, even if that indvar wasn't canonical (didn't start at zero). But trying so hard is really fruitless: creating a new, canonical indvar only results in one extra add in the worst case, and instcombine can trivially push that add through the PHI and out of the loop.
If we're less clever here and instead let instcombine clean up our mess (as we do in many other places in LV), we can remove unneeded complexity.
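A sketch of the worst case (hypothetical values): if the scalar induction started at 7, the vector loop still gets a canonical indvar and pays only one add, which instcombine can then push through the PHI and out of the loop:

vector.body:
  %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
  %offset.idx = add i64 %index, 7          ; recovers the original, non-canonical induction
  %index.next = add i64 %index, 4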
llvm-svn: 246630
Teach FunctionAttrs to infer the nonnull attribute on return values of functions which never return a potentially null value. This is done both via a conservative local analysis for the function itself and an optimistic per-SCC analysis. If no function in the SCC returns anything which could be null (other than values from other functions in the SCC), we can conclude that no function returned a null pointer. Even if some function within the SCC returns a null pointer, we may be able to locally conclude that some don't.
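A minimal sketch of the idea (hypothetical function): every return is either a local alloca or a value returned from within the SCC (here, the function itself), so the return can be marked nonnull:

define i32* @never_null(i1 %c) {
entry:
  br i1 %c, label %left, label %right

left:
  %a = alloca i32
  ret i32* %a

right:
  %r = call i32* @never_null(i1 true)
  ret i32* %r
}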
Differential Revision: http://reviews.llvm.org/D9688
llvm-svn: 246476
Summary:
JumpThreading shouldn't duplicate a convergent call, because that would move a convergent call into a control-inequivalent location. For example,
  if (cond) {
    ...
  } else {
    ...
  }
  convergent_call();
  if (cond) {
    ...
  } else {
    ...
  }

should not be optimized to

  if (cond) {
    ...
    convergent_call();
    ...
  } else {
    ...
    convergent_call();
    ...
  }
Test Plan: test/Transforms/JumpThreading/basic.ll
Patch by Xuetian Weng.
Reviewers: resistor, arsenm, jingyue
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D12484
llvm-svn: 246415
PR24605 is caused by an incorrect insert point in instcombine's IR
builder. When simplifying

  %t = add X Y
  ...
  %m = icmp ... %t

the replacement for %t should be placed before %t, not before %m, as
there could be a use of %t between %t and %m.
llvm-svn: 246315
Summary:
This patch removes two remaining places where pointer value comparisons
are used to order functions: comparing range annotation metadata, and comparing
block address constants. (These are both rare cases, and so no actual
non-determinism was observed from either case).
The fix for range metadata is simple: the annotation always consists of a pair
of integers, so we just order by those integers.
The fix for block addresses is more subtle. Two block address constants are
considered the same if they refer to the same basic block in the same function,
or to corresponding basic blocks in each respective function. Note that in the
first case, merging is trivially correct. In the second, the correctness of
merging relies on the fact that the values of block addresses cannot be
compared. This change is actually an enhancement, as these functions could not
previously be merged (see merge-block-address.ll).
There is still a problem with cross-function block addresses, in that constants
pointing to a basic block in a merged function are not updated.
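A minimal example in the spirit of merge-block-address.ll (not the actual test): the two functions are identical except that each blockaddress refers to its own function's block, so treating corresponding blocks as equal allows the merge:

define i8* @f() {
entry:
  br label %bb

bb:
  ret i8* blockaddress(@f, %bb)
}

define i8* @g() {
entry:
  br label %bb

bb:
  ret i8* blockaddress(@g, %bb)
}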
This also more robustly compares floating point constants by all fields of their
semantics, and fixes a dyn_cast/cast mixup.
Author: jrkoenig
Reviewers: dschuff, nlewycky, jfb
Subscribers: llvm-commits
Differential revision: http://reviews.llvm.org/D12376
llvm-svn: 246305
Handle more allocas with loads past the end of the alloca.
I suspect there are some related crashers with slightly different
patterns, but I'll fix those and add test cases as I find them.
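One shape of the problem (a reduced, hypothetical example rather than the original reproducer): a load that reads past the end of the alloca it is derived from:

define i64 @load_past_end() {
entry:
  %a = alloca i32
  %p = bitcast i32* %a to i64*
  %v = load i64, i64* %p           ; reads four bytes past the end of %a
  ret i64 %v
}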
Thanks to David Majnemer for the excellent test case reduction here; it
made this super simple to debug and fix.
llvm-svn: 246289
After hitting @llvm.assume(X) we can:
- propagate the fact that X == true
- if X is an icmp/fcmp with an equality predicate and one operand is a
constant, we can replace uses of the other operand with that constant in the
same BasicBlock
http://reviews.llvm.org/D11918
llvm-svn: 246243
This patch changes the analysis diagnostics produced when loops with
floating-point recurrences or memory operations are identified. The new messages
say "cannot prove it is safe to reorder * operations; allow reordering by
specifying #pragma clang loop vectorize(enable)". Depending on the type of
diagnostic the message will include additional options such as ffast-math or
__restrict__.
This patch also allows the vectorize(enable) pragma to override the low pointer
memory check threshold. When the hint is given, a higher threshold is used.
See the clang patch for the options produced for each diagnostic.
llvm-svn: 246187
Unlike scalar operations, we can perform vector operations on element types that
are smaller than the native integer types. We type-promote scalar operations if
they are smaller than a native type (e.g., i8 arithmetic is promoted to i32
arithmetic on Arm targets). This patch detects and removes type-promotions
within the reduction detection framework, enabling the vectorization of small
size reductions.
In the legality phase, we look through the ANDs and extensions that InstCombine
creates during promotion, keeping track of the smaller type. In the
profitability phase, we use the smaller type and ignore the ANDs and extensions
in the cost model. Finally, in the code generation phase, we truncate the result
of the reduction to allow InstCombine to rewrite the entire expression in the
smaller type.
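For illustration, the kind of promoted i8 add reduction the legality phase now looks through (loop body fragment, address computation elided; the zext and the mask are the promotion artifacts being ignored):

loop:
  %sum = phi i32 [ 0, %entry ], [ %sum.masked, %loop ]
  %x = load i8, i8* %ptr
  %x.ext = zext i8 %x to i32
  %sum.next = add i32 %sum, %x.ext
  %sum.masked = and i32 %sum.next, 255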
This fixes PR21369.
http://reviews.llvm.org/D12202
Patch by Matt Simpson <mssimpso@codeaurora.org>!
llvm-svn: 246149
... and move it into LoopUtils where it can be used by other passes, just like ReductionDescriptor. The API is very similar to ReductionDescriptor - that is, not very nice at all. Sorting these both out will come in a followup.
NFC
llvm-svn: 246145
A release fence acts as a publication barrier for stores within the current thread to become visible to other threads which might observe the release fence. It does not require the current thread to observe stores performed on other threads. As a result, we can allow store-load and load-store forwarding across a release fence.
We do need to make sure that stores before the fence can't be eliminated even if there's another store to the same location after the fence. In theory, we could reorder the second store above the fence and *then* eliminate the former, but we can't do this if the stores are on opposite sides of the fence.
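Concretely (illustrative fragment), the forwarding this enables and the elimination it still forbids:

  store i32 42, i32* %p
  fence release
  %v = load i32, i32* %p           ; OK to forward 42 to this load
  store i32 13, i32* %p            ; must NOT make the first store dead: the two
                                   ; stores are on opposite sides of the fence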
Note: While more aggressive than what's there, this patch is still implementing a really conservative ordering. In particular, I'm not trying to exploit undefined behavior via races, or the fact that the LangRef says only 'atomic' accesses are ordered w.r.t. fences.
Differential Revision: http://reviews.llvm.org/D11434
llvm-svn: 246134
When computing base pointers, we introduce new instructions to propagate the base of existing instructions which might not be bases. However, the algorithm doesn't make any effort to recognize when the new instruction to be inserted is the same as an existing one already in the IR. Since this is happening immediately before rewriting, we don't really have a chance to fix it after the pass runs without teaching loop passes about statepoints.
I'm really not thrilled with this patch. I've rewritten it 4 different ways now, but this is the best I've come up with. The case where the new instruction is just the original base defining value could be merged into the existing algorithm with some complexity. The problem is that we might have something like an extractelement from a phi of two vectors. It may be trivially obvious that the base of the 0th element is an existing instruction, but I can't see how to make the algorithm itself figure that out. Thus, I resort to the call to SimplifyInstruction instead.
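Roughly, the shape in question (names hypothetical):

merge:
  %vec = phi <2 x i64 addrspace(1)*> [ %v1, %left ], [ %v2, %right ]
  %ptr = extractelement <2 x i64 addrspace(1)*> %vec, i32 0
  ; the base of %ptr may trivially be a value that already exists in the IR,
  ; but the insertion algorithm can't see that without SimplifyInstruction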
Note that we can only adjust the instructions we've inserted ourselves. The live sets are still being tracked in side structures at this point in the code. We can't easily muck with instructions which might be in them. Long term, I'm really thinking we need to materialize the live pointer sets explicitly in the IR somehow rather than using side structures to track them.
Differential Revision: http://reviews.llvm.org/D12004
llvm-svn: 246133
This patch ensures that every analysis diagnostic produced by the vectorizer
will be printed if the loop has a vectorization hint on it. The condition has
also been improved to prevent printing when a disabling hint is specified.
llvm-svn: 246132
As Sanjoy pointed out over in http://reviews.llvm.org/D11819, a switch on an icmp should always be able to become a branch instruction. This patch generalizes that notion slightly to prove that the default case of a switch is unreachable if the cases completely cover all possible bit patterns in the condition. Once that's done, the switch to branch conversion kicks in just fine.
Note: Duplicate case values are disallowed by the LangRef and verifier.
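A minimal sketch (illustrative): the condition is an i1 produced by an icmp, so cases 0 and 1 cover every possible bit pattern, the default is provably unreachable, and the switch-to-branch conversion can fire:

  %c = icmp ult i32 %x, 10
  switch i1 %c, label %unreachable.default [
    i1 0, label %ge.ten
    i1 1, label %lt.ten
  ]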
Differential Revision: http://reviews.llvm.org/D11995
llvm-svn: 246125