Sanjay Patel
2e9675ff52 [InstCombine] use m_APInt to allow icmp eq (srem X, C1), C2 folds for splat constant vectors
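
A minimal sketch of what the m_APInt-based matching makes possible, assuming the usual llvm::PatternMatch API; the helper name matchSRemEqFold is illustrative and not part of the patch:

    #include "llvm/IR/Instructions.h"
    #include "llvm/IR/PatternMatch.h"
    using namespace llvm;
    using namespace llvm::PatternMatch;

    static bool matchSRemEqFold(Value *V) {
      ICmpInst::Predicate Pred;
      const APInt *C1, *C2;
      Value *X;
      // m_APInt matches scalar integer constants and splat vector constants
      // alike, so the same pattern now covers icmp eq (srem X, C1), C2 for
      // splat constant vectors as well.
      return match(V, m_ICmp(Pred, m_SRem(m_Value(X), m_APInt(C1)),
                             m_APInt(C2))) &&
             Pred == ICmpInst::ICMP_EQ;
    }
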
llvm-svn: 277638
2016-08-03 19:48:40 +00:00
George Burgess IV
82e355ce48 [MSSA] Add logic for special handling of atomics/volatiles.
This patch makes MemorySSA recognize atomic/volatile loads, and makes
MSSA treat said loads specially. This allows us to be a bit more
aggressive in some cases.

Administrative note: Revision was LGTM'ed by reames in person.
Additionally, this doesn't include the `invariant.load` recognition in
the differential revision, because I feel it's better to commit that
separately. Will commit soon.

Differential Revision: https://reviews.llvm.org/D16875

llvm-svn: 277637
2016-08-03 19:39:54 +00:00
Tobias Grosser
8757e387dd [InstCombine] Refactor optimization of zext(or(icmp, icmp)) to enable more aggressive cast-folding
Summary:
InstCombine unfolds expressions of the form `zext(or(icmp, icmp))` to `or(zext(icmp), zext(icmp))` so that in a later iteration of InstCombine the exposed `zext(icmp)` instructions can be optimized. We now perform this unfolding and the subsequent `zext(icmp)` optimization together. Since the unfolding no longer happens separately, we also re-enable the folding of `logic(cast(icmp), cast(icmp))` expressions to `cast(logic(icmp, icmp))`, which had been disabled due to its interference with the unfolding transformation.

Tested via `make check` and `lnt`.

Background
==========

For a better understanding of how it came to this change, we summarize its history below. In commit r275989 we already tried to enable the folding of `logic(cast(icmp), cast(icmp))` to `cast(logic(icmp, icmp))`, which had to be reverted in r276106 because it could lead to an endless loop in InstCombine (also see http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160718/374347.html). The root of the problem is that `visitZExt()` in InstCombineCasts.cpp also contains the reverse of the above folding transformation: it unfolds `zext(or(icmp, icmp))` to `or(zext(icmp), zext(icmp))` in order to expose `zext(icmp)` operations which would then possibly be eliminated by subsequent iterations of InstCombine. However, before these `zext(icmp)` operations could be eliminated, the folding from r275989 could kick in and cause InstCombine to switch back and forth endlessly between the folding and the unfolding transformation. This is why we now combine the `zext` unfolding and the elimination of the exposed `zext(icmp)` into a single step, which lets us keep the cast-folding in `logic(cast(icmp), cast(icmp))` without entering an endless loop again.
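
To make the combined unfold-and-optimize step concrete, here is a simplified sketch assuming a plain IRBuilder; foldZExtOfOrICmp and its structure are illustrative, not the actual visitZExt() code:

    #include "llvm/IR/IRBuilder.h"
    #include "llvm/IR/Instructions.h"
    #include "llvm/IR/PatternMatch.h"
    using namespace llvm;
    using namespace llvm::PatternMatch;

    static Value *foldZExtOfOrICmp(ZExtInst &Zext, IRBuilder<> &Builder) {
      Value *L, *R;
      if (!match(Zext.getOperand(0), m_Or(m_Value(L), m_Value(R))))
        return nullptr;
      auto *Cmp0 = dyn_cast<ICmpInst>(L);
      auto *Cmp1 = dyn_cast<ICmpInst>(R);
      if (!Cmp0 || !Cmp1)
        return nullptr;
      // Unfold zext(or(icmp, icmp)) to or(zext(icmp), zext(icmp)); the real
      // patch immediately tries transformZExtICmp() on each exposed piece so
      // the intermediate form that caused the fold/unfold ping-pong never
      // survives into the next InstCombine iteration.
      Value *Z0 = Builder.CreateZExt(Cmp0, Zext.getType());
      Value *Z1 = Builder.CreateZExt(Cmp1, Zext.getType());
      return Builder.CreateOr(Z0, Z1);
    }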

Details on the submitted changes
================================

- In `visitZExt()` we combine the unfolding and optimization of `zext` instructions.
- In `transformZExtICmp()` we have to use `Builder->CreateIntCast()` instead of `CastInst::CreateIntegerCast()` to make sure that the new `CastInst` is inserted into a `BasicBlock`. The new calls to `transformZExtICmp()` that we introduce in `visitZExt()` would otherwise trigger the corresponding assertions (in our case this happened, for example, with lnt for the MultiSource/Applications/sqlite3 and SingleSource/Regression/C++/EH/recursive-throw tests). The subsequent use of `replaceInstUsesWith()` is necessary to ensure that the new `CastInst` replaces the `ZExtInst` accordingly.
- In InstCombineAndOrXor.cpp we again allow the folding of casts on `icmp` instructions.
- The instruction order in the optimized IR for the zext-or-icmp.ll test case is different with the introduced changes.
- The test cases in zext.ll have been adapted from the reverted commits r275989 and r276105.

Reviewers: grosser, majnemer, spatel

Subscribers: eli.friedman, majnemer, llvm-commits

Differential Revision: https://reviews.llvm.org/D22864

Contributed-by: Matthias Reisinger <d412vv1n@gmail.com>
llvm-svn: 277635
2016-08-03 19:30:35 +00:00
Sanjay Patel
43aeb001c9 [InstCombine] use m_APInt to allow icmp (binop X, Y), C folds with constant splat vectors
This removes the restriction for the icmp constant, but as noted by the FIXME comments, 
we still need to change individual checks for binop operand constants.

llvm-svn: 277629
2016-08-03 18:59:03 +00:00
David Majnemer
fa8ef91748 [CloneFunction] Don't crash if the value map doesn't hold something
It is possible for the value map to not have an entry for some value
that has already been removed.

I don't have a testcase; this is fallout from a buildbot.

llvm-svn: 277614
2016-08-03 17:37:10 +00:00
Sanjay Patel
51a767c6b8 use local variables; NFC
llvm-svn: 277612
2016-08-03 17:23:08 +00:00
David Majnemer
fad0490869 [CloneFunction] Don't remove side effecting calls
We were able to figure out that the result of a call is some constant.
While propagating that fact, we added the constant to the value map.
This is problematic because it results in us losing the call site when
processing the value map.

This fixes PR28802.

llvm-svn: 277611
2016-08-03 17:12:47 +00:00
Renato Golin
f583097434 Revert "Teach CorrelatedValuePropagation to mark adds as no wrap"
This reverts commit r277592, trying to fix the AArch64 42VMA buildbot.

llvm-svn: 277607
2016-08-03 16:20:48 +00:00
Gil Rapaport
e7a8fab275 [Loop Vectorizer] Move store-predication into its own function, remove obsolete comment (NFC)
Differential Revision: https://reviews.llvm.org/D23013

llvm-svn: 277595
2016-08-03 13:23:43 +00:00
Artur Pilipenko
68cb947cc9 Teach CorrelatedValuePropagation to mark adds as no wrap
Use LVI to prove that adds do not wrap. The change is motivated by the bug https://llvm.org/bugs/show_bug.cgi?id=28620 and is the first step toward fixing that problem.
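
As a rough illustration of the underlying check, assuming LVI has already produced non-wrapping signed ranges [LLo, LHi] and [RLo, RHi] for the two add operands (this is a sketch, not CVP's actual code):

    #include "llvm/ADT/APInt.h"
    using namespace llvm;

    // If neither endpoint sum overflows, every sum of values drawn from the
    // two signed ranges is representable, so the add can be marked nsw.
    static bool rangeAddCannotWrapSigned(const APInt &LLo, const APInt &LHi,
                                         const APInt &RLo, const APInt &RHi) {
      bool OvLo = false, OvHi = false;
      (void)LLo.sadd_ov(RLo, OvLo); // APInt::sadd_ov reports signed overflow
      (void)LHi.sadd_ov(RHi, OvHi);
      return !OvLo && !OvHi;
    }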

Reviewed By: sanjoy

Differential Revision: http://reviews.llvm.org/D23059

llvm-svn: 277592
2016-08-03 13:11:39 +00:00
David Callahan
cc5cd4dc65 [ADCE] Refactor anticipating new functionality (NFC)
Summary:
This is the first refactoring before adding new functionality.
Add a class wrapper for the functions and container for
state associated with the transformation.

No functional change

Reviewers: majnemer, nadav, mehdi_amini

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D23065

llvm-svn: 277565
2016-08-03 04:28:39 +00:00
George Burgess IV
14633b5cd3 [MSSA] Fix a caching bug.
This fixes a bug where we'd sometimes cache overly-conservative results
with our walker. This bug was made more obvious by r277480, which makes
our cache far more spotty than it was. Test case is llvm-unit, because
we're likely going to use CachingWalker only for def optimization in the
future.

The bug stems from a place where the walker assumed that
`DefNode.Last` was a valid target to cache to when failing to optimize
phis. This is sometimes incorrect if we have a cache hit. The fix is to
use the thing we *can* assume is a valid target to cache to. :)

llvm-svn: 277559
2016-08-03 01:22:19 +00:00
Chandler Carruth
8562d3a5e4 [Inliner] clang-format various parts of the inliner prior to changes
here. NFC.

llvm-svn: 277557
2016-08-03 01:02:31 +00:00
Ivan Krasin
3aade11252 Add -lowertypetests-bitsets-level to control bitsets generation.
Summary:
Sometimes bitsets can get really large (>300k entries) and we might want to
drop a check because it would cost too much.

Add a flag to control how much penalty we are willing to pay
for bitsets.

Reviewers: kcc

Differential Revision: https://reviews.llvm.org/D23088

llvm-svn: 277556
2016-08-03 00:59:38 +00:00
Daniel Berlin
df10119e4e Support for lifetime begin/end markers in the MemorySSA use optimizer
Summary: Depends on D23072

Reviewers: george.burgess.iv

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D23076

llvm-svn: 277553
2016-08-03 00:01:46 +00:00
Sanjay Patel
ab50a93888 [InstCombine] replace dyn_casts with matches; NFCI
Clean-up before changing this to allow folds for vectors.

llvm-svn: 277538
2016-08-02 22:38:33 +00:00
Piotr Padlewski
47509f6185 Imported statistics types changes
Reviewers: tejohnson, eraman

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D22980

llvm-svn: 277534
2016-08-02 22:18:47 +00:00
Daniel Berlin
dff31deb1e Move to having a single real instructionClobbersQuery
Summary: We really want to move towards MemoryLocOrCall (or fix AA) everywhere, but for now, this lets us have a single instructionClobbersQuery.

Reviewers: george.burgess.iv

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D23072

llvm-svn: 277530
2016-08-02 21:57:52 +00:00
Michael Zolotukhin
b2738e41bf [LoopUnroll] Switch the default value of -unroll-runtime-epilog back to its original value.
As agreed in post-commit review of r265388, I'm switching the flag to
its original value until the 90% runtime performance regression on
SingleSource/Benchmarks/Stanford/Bubblesort is addressed.

llvm-svn: 277524
2016-08-02 21:24:14 +00:00
Wei Mi
dc7001afb2 [LoopVectorize] Change comment for isOutOfScope in collectLoopUniforms, NFC
Update the comment for isOutOfScope and add a testcase for a uniform value being
used out of scope.

Differential Revision: https://reviews.llvm.org/D23073

llvm-svn: 277515
2016-08-02 20:27:49 +00:00
Daniel Berlin
26fcea91f6 Fixes for post-commit review comments on r277480
llvm-svn: 277510
2016-08-02 20:02:21 +00:00
Sanjoy Das
83a72850c7 [IRCE] Rename variable; NFC
There is nothing "Original" about "OriginalLoopInfo".

llvm-svn: 277506
2016-08-02 19:32:01 +00:00
Sanjoy Das
f45e03e201 [IRCE] Preserve DomTree and LCSSA
This changes IRCE to "preserve" LCSSA and DomTree by recomputing them.
It still does not preserve LoopSimplify.

llvm-svn: 277505
2016-08-02 19:31:54 +00:00
Michael Zolotukhin
d9b6ad3c01 [LoopUnroll] Ensure we create prolog loops in simplified form.
llvm-svn: 277502
2016-08-02 19:19:31 +00:00
Daniel Berlin
de4be65313 MSVC 2013 does not implement C++11 unions properly, so remove the anonymous union for now,
and leave a FIXME.

llvm-svn: 277485
2016-08-02 16:59:51 +00:00
Daniel Berlin
c43aa5a5b6 Rewrite the use optimizer to be less memory intensive and 50% faster.
Fixes PR28670

Summary:
Rewrite the use optimizer to be less memory intensive and 50% faster.
Fixes PR28670

The new use optimizer works like a standard SSA renaming pass, storing
all possible versions a MemorySSA use could get in a stack, and just
tracking indexes into the stack.
This uses much less memory than caching N^2 alias query results.
It's also a lot faster.

The current version defers phi node walking to the normal walker.
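
A generic sketch of the renaming scheme, with made-up types rather than MemorySSA's actual classes: one walk over the dominator tree keeps a stack of the defs dominating the current block, and each use records an index into that stack instead of a cached alias-query result.

    #include <vector>

    struct Access {
      bool IsDef = false;
      int DefIndex = -1;                  // filled in for uses
    };

    struct Block {
      std::vector<Access *> Accesses;     // defs and uses in program order
      std::vector<Block *> DomChildren;   // children in the dominator tree
    };

    static void renameDomTree(Block *B, std::vector<Access *> &DefStack) {
      size_t SavedSize = DefStack.size();
      for (Access *A : B->Accesses) {
        if (A->IsDef)
          DefStack.push_back(A);
        else
          A->DefIndex = (int)DefStack.size() - 1; // nearest dominating def
      }
      for (Block *Child : B->DomChildren)
        renameDomTree(Child, DefStack);
      DefStack.resize(SavedSize);         // pop this subtree's defs on exit
    }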

Reviewers: george.burgess.iv

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D23032

llvm-svn: 277480
2016-08-02 16:24:03 +00:00
Matthew Simpson
18d8898317 [LV] Generate both scalar and vector integer induction variables
This patch enables the vectorizer to generate both scalar and vector versions
of an integer induction variable for a given loop. Previously, we only
generated a scalar induction variable if we knew all its users were going to be
scalar. Otherwise, we generated a vector induction variable. In the case of a
loop with both scalar and vector users of the induction variable, we would
generate the vector induction variable and extract scalar values from it for
the scalar users. With this patch, we now generate both versions of the
induction variable when there are both scalar and vector users and select which
version to use based on whether the user is scalar or vector.
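
An illustrative loop (not a test from the patch) where the induction variable has both kinds of users, so the vectorizer now keeps a scalar i for the address computation and a vector <i, i+1, ...> for the stored value:

    void store_iota(int *a, int n) {
      for (int i = 0; i < n; ++i)
        a[i] = i; // address: scalar user of i; stored value: vector user of i
    }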

Differential Revision: https://reviews.llvm.org/D22869

llvm-svn: 277474
2016-08-02 15:25:16 +00:00
Matthew Simpson
58f562887b [LV] Untangle the concepts of uniform and scalar
This patch refactors the logic in collectLoopUniforms and
collectValuesToIgnore, untangling the concepts of "uniform" and "scalar". It
adds isScalarAfterVectorization alongside isUniformAfterVectorization to
distinguish the two. Known scalar values include those that are uniform,
getelementptr instructions that won't be vectorized, and induction variables
and induction variable update instructions whose users are all known to be
scalar.

This patch includes the following functional changes:

- In collectLoopUniforms, we mark uniform the pointer operands of interleaved
  accesses. Although non-consecutive, these pointers are treated like
  consecutive pointers during vectorization.

- In collectValuesToIgnore, we insert a value into VecValuesToIgnore if it
  isScalarAfterVectorization rather than isUniformAfterVectorization. This
  differs from the previous functionality in that we now add getelementptr
  instructions that will not be vectorized into VecValuesToIgnore.

This patch also removes the ValuesNotWidened set used for induction variable
scalarization since, after the above changes, it is now equivalent to
isScalarAfterVectorization.

Differential Revision: https://reviews.llvm.org/D22867

llvm-svn: 277460
2016-08-02 14:29:41 +00:00
Benjamin Kramer
a0053cc0af [LoadStoreVectorizer] Don't use a linear walk for an existence check in a SmallPtrSet
No functionality change intended.
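
The change in spirit, with hypothetical names: SmallPtrSet already provides constant-time membership tests, so prefer count() over scanning the set with a linear walk.

    #include "llvm/ADT/SmallPtrSet.h"
    #include "llvm/IR/Instruction.h"
    using namespace llvm;

    static bool isInChain(const SmallPtrSet<Instruction *, 16> &Chain,
                          Instruction *I) {
      // Rather than std::find(Chain.begin(), Chain.end(), I) != Chain.end():
      return Chain.count(I);
    }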

llvm-svn: 277436
2016-08-02 09:35:17 +00:00
Junmo Park
db8f6eebee Minor code cleanups. NFC.
llvm-svn: 277415
2016-08-02 04:38:27 +00:00
Sean Silva
f801575fd0 CodeExtractor : Add ability to preserve profile data.
Added ability to estimate the entry count of the extracted function and
the branch probabilities of the exit branches.

Patch by River Riddle!

Differential Revision: https://reviews.llvm.org/D22744

llvm-svn: 277411
2016-08-02 02:15:45 +00:00
Tim Shen
b44909eccb [ADT] NFC: Generalize GraphTraits requirement of "NodeType *" in interfaces to "NodeRef", and migrate SCCIterator.h to use NodeRef
Summary: By generalizing the interface, users are able to inject a more flexible Node token into the algorithm, for example a pair of vector<Node>* and an index integer. Currently I have only migrated SCCIterator to use NodeRef, but more is coming. This is an NFC change.
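
A hedged sketch of what the NodeRef generalization enables; MyGraph, MyNode, and the pair token are made up for illustration, and a real specialization must also provide child_begin/child_end over the token:

    #include "llvm/ADT/GraphTraits.h"
    #include <utility>
    #include <vector>

    struct MyNode { std::vector<unsigned> Succs; };
    struct MyGraph { std::vector<MyNode> Nodes; };

    // The "node" handed to graph algorithms is now any cheap-to-copy token,
    // e.g. a (graph, index) pair, rather than a NodeType pointer.
    using MyNodeRef = std::pair<MyGraph *, unsigned>;

    namespace llvm {
    template <> struct GraphTraits<MyGraph *> {
      using NodeRef = MyNodeRef;
      static NodeRef getEntryNode(MyGraph *G) { return {G, 0}; }
      // ChildIteratorType, child_begin, and child_end are omitted here.
    };
    } // namespace llvm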

Reviewers: dblaikie, chandlerc

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D22937

llvm-svn: 277399
2016-08-01 22:32:20 +00:00
Derek Schuff
c64d7655b2 [WebAssembly] Support CFI for WebAssembly target
Summary: This patch implements CFI for WebAssembly. It modifies the
LowerTypeTest pass to pre-assign table indexes to functions that are
called indirectly, and lowers type checks to test against the
appropriate table indexes. It also modifies the WebAssembly backend to
support a special ".indidx" assembly directive that propagates the table
index assignments out to the linker.

Patch by Dominic Chen

Differential Revision: https://reviews.llvm.org/D21768

llvm-svn: 277398
2016-08-01 22:25:02 +00:00
Michael Kuperstein
c40618610f [PM] Port SpeculativeExecution to the new PM
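
The general shape a pass takes under the new pass manager (a sketch of the interface, not the verbatim declaration from the patch):

    #include "llvm/IR/PassManager.h"

    namespace llvm {
    class SpeculativeExecutionPass
        : public PassInfoMixin<SpeculativeExecutionPass> {
    public:
      // The new PM invokes run() directly and uses the returned
      // PreservedAnalyses to decide which analyses to invalidate.
      PreservedAnalyses run(Function &F, FunctionAnalysisManager &AM);
    };
    } // namespace llvm
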
Differential Revision: https://reviews.llvm.org/D23033

llvm-svn: 277393
2016-08-01 21:48:33 +00:00
Xinliang David Li
d119761bbe [Profile] IR profiling minor cleanup /nfc
Differential Revision: http://reviews.llvm.org/D22995

llvm-svn: 277379
2016-08-01 20:25:06 +00:00
Matthew Simpson
228f973189 [LV] Move isGatherOrScatterLegal into LoopVectorizationLegality (NFC)
llvm-svn: 277376
2016-08-01 20:11:25 +00:00
Matthew Simpson
1ce88ff6a7 [LV] Use getPointerOperand helper where appropriate (NFC)
llvm-svn: 277375
2016-08-01 20:08:09 +00:00
James Molloy
bade86cedc [SimplifyCFG] Fix nasty RAUW bug from r277325
Using RAUW was wrong here; if we have a switch transform such as:
  18 -> 6 then
  6 -> 0

If we use RAUW, then while performing the second transform the *transformed* 6
from the first will also be replaced, so we end up with:
  18 -> 0
  6 -> 0

Found by clang stage2 bootstrap; testcase added.
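
A generic illustration of the idea behind the fix, not the actual SimplifyCFG code: compute the final value for each original case up front and apply the mapping once, so a value produced by one rewrite is never rewritten again by a later one.

    #include <map>
    #include <vector>

    static void rewriteCasesOnce(std::vector<int> &Cases,
                                 const std::map<int, int> &NewValue) {
      for (int &C : Cases) {
        auto It = NewValue.find(C);
        if (It != NewValue.end())
          C = It->second; // each original case is rewritten exactly once
      }
    }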

llvm-svn: 277332
2016-08-01 09:34:48 +00:00
James Molloy
b2e436de42 [SimplifyCFG] Range reduce switches
If a switch is sparse and all the cases (once sorted) form an arithmetic progression, we can extract the common factor out of the switch and create a dense switch. For example:

    switch (i) {
    case 5: ...
    case 9: ...
    case 13: ...
    case 17: ...
    }

can become:

    if ( (i - 5) % 4 ) goto default;
    switch ((i - 5) / 4) {
    case 0: ...
    case 1: ...
    case 2: ...
    case 3: ...
    }

or even better:

   switch (ROTR(i - 5, 2)) {
   case 0: ...
   case 1: ...
   case 2: ...
   case 3: ...
   }

The division and remainder operations could be costly, so we only do this if the factor is a power of two, and we emit a right-rotate instead of a divide/remainder sequence. Dense switches can be lowered significantly better than sparse switches and can even be transformed into lookup tables.
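
Why the rotate is enough (illustrative, 32-bit, using the example above): after subtracting the common base, real cases are multiples of 4, so their two low bits are zero and ROTR by 2 yields the dense case index; any other value gets its nonzero low bits rotated into the high bits, becomes huge, and falls through to the default case.

    #include <cstdint>

    static uint32_t denseIndex(uint32_t i) {
      uint32_t v = i - 5;
      return (v >> 2) | (v << 30); // ROTR(v, 2)
    }
    // denseIndex(5) == 0, denseIndex(9) == 1, denseIndex(13) == 2,
    // denseIndex(17) == 3; denseIndex(6) == 0x40000000, i.e. default.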

llvm-svn: 277325
2016-08-01 07:45:11 +00:00
Sean Silva
423c7149dc Revert r277313 and r277314.
They seem to trigger an LSan failure:
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/15140/steps/check-llvm%20asan/logs/stdio

Revert "Add the tests for r277313"

This reverts commit r277314.

Revert "CodeExtractor : Add ability to preserve profile data."

This reverts commit r277313.

llvm-svn: 277317
2016-08-01 04:16:09 +00:00
Sean Silva
a0a802abe3 Fix - CodeExtractor : Inherit Target Dependent Attributes from the parent function.
When extracting a set of blocks make sure to inherit all of the target
dependent attributes to make sure that the function will be valid for
lowering. One example is the "target-features" attribute for x86: if the
extracted region has functionality that relies on a specific feature, it
will fail to be lowered.
This also allows for extracted functions to be valid for inlining, at
least back into the parent function, as the target attributes are tested
when inlining for compatibility.

Patch by River Riddle!

Differential Revision: https://reviews.llvm.org/D22713

llvm-svn: 277315
2016-08-01 03:15:32 +00:00
Sean Silva
6208924323 CodeExtractor : Add ability to preserve profile data.
Added ability to estimate the entry count of the extracted function and
the branch probabilities of the exit branches.

Patch by River Riddle!

Differential Revision: https://reviews.llvm.org/D22744

llvm-svn: 277313
2016-08-01 02:59:26 +00:00
Daniel Berlin
5130cc831a Fix the MemorySSA updating API to enable people to create memory accesses before removing old ones
llvm-svn: 277309
2016-07-31 21:08:20 +00:00
Adam Nemet
12937c361f [LoopUnroll] Include hotness of region in opt remark
LoopUnroll is a loop pass, so the analysis of OptimizationRemarkEmitter
is added to the common function analysis passes that loop passes
depend on.

The BFI and indirectly BPI used in this pass is computed lazily so no
overhead should be observed unless -pass-remarks-with-hotness is used.

This is how the patch affects the O3 pipeline:

         Dominator Tree Construction
         Natural Loop Information
         Canonicalize natural loops
         Loop-Closed SSA Form Pass
         Basic Alias Analysis (stateless AA impl)
         Function Alias Analysis Results
         Scalar Evolution Analysis
+        Lazy Branch Probability Analysis
+        Lazy Block Frequency Analysis
+        Optimization Remark Emitter
         Loop Pass Manager
           Rotate Loops
           Loop Invariant Code Motion
           Unswitch loops
         Simplify the CFG
         Dominator Tree Construction
         Basic Alias Analysis (stateless AA impl)
         Function Alias Analysis Results
         Combine redundant instructions
         Natural Loop Information
         Canonicalize natural loops
         Loop-Closed SSA Form Pass
         Scalar Evolution Analysis
+        Lazy Branch Probability Analysis
+        Lazy Block Frequency Analysis
+        Optimization Remark Emitter
         Loop Pass Manager
           Induction Variable Simplification
           Recognize loop idioms
           Delete dead loops
           Unroll loops
...

llvm-svn: 277203
2016-07-29 19:29:47 +00:00
Andrew Kaylor
b99d1cc7ed Recommitting r275284: add support to inline __builtin_mempcpy
Patch by Sunita Marathe

Third try, now following fixes to MSan to handle mempcpy in such a way that this commit won't break the MSan buildbots. (Thanks, Evegenii!)

llvm-svn: 277189
2016-07-29 18:23:18 +00:00
David Majnemer
130b9f99d6 [EarlyCSE] Correctly handle simplified, but live, instructions
Some instructions may have their uses replaced with a symbolic constant.
However, the instruction may still have side effects which preclude it
from being removed from the function.  EarlyCSE treated such an
instruction as if it were removed, resulting in PR28763.

llvm-svn: 277114
2016-07-29 05:39:21 +00:00
David Majnemer
d536f2328e [ConstantFolding] Teach the folder how to fold ConstantVector
A ConstantVector can have ConstantExpr operands and vice versa.
However, the folder had no ability to fold ConstantVectors which, in
some cases, was an optimization barrier.

Instead, rephrase the folder in terms of Constants instead of
ConstantExprs and teach callers how to deal with failure.

llvm-svn: 277099
2016-07-29 03:27:26 +00:00
Piotr Padlewski
84abc74f2c Added ThinLTO inlining statistics
Summary:
Copy-pasted from the doc of the ImportedFunctionsInliningStatistics class:
 \brief Calculate and dump ThinLTO specific inliner stats.
 The main statistics are:
 (1) Number of inlined imported functions,
 (2) Number of imported functions inlined into the importing module (indirect),
 (3) Number of non-imported functions inlined into the importing module
 (indirect).
 The difference between the first and the second is that the first stat counts
 all performed inlines of imported functions, while the second counts only the
 functions that have eventually been inlined into a function in the importing
 module (by a chain of inlines). Because LLVM uses a bottom-up inliner, it is
 possible to e.g. import functions `A` and `B` and then inline `B` into `A`,
 after which `A` might be too big to be inlined into some other function
 that calls it. The statistic is calculated by building a graph, where
 the nodes are functions and the edges are performed inlines, and then by
 marking the edges starting from non-imported functions.

 If `Verbose` is set to true, it also dumps statistics
 for each inlined function, sorted by the greatest inline count:
 - number of performed inlines
 - number of performed inlines into the importing module

Reviewers: eraman, tejohnson, mehdi_amini

Subscribers: mehdi_amini, llvm-commits

Differential Revision: https://reviews.llvm.org/D22491

llvm-svn: 277089
2016-07-29 00:27:16 +00:00
Evgeniy Stepanov
d240a889ad [sanitizer] Simplify and future-proof maybeMarkSanitizerLibraryCallNoBuiltin().
Sanitizers set the nobuiltin attribute on certain library functions to
avoid a situation where such a function is neither instrumented nor
intercepted.

At the moment the list of interesting functions is hardcoded. This
change replaces it with logic based on
TargetLibraryInfo::hasOptimizedCodegen and the presence of the readnone
function attribute (sanitizers are generally interested in the memory
behavior of library functions).

This is expected to be a no-op change: the new logic matches exactly
the same set of functions.

r276771 (currently reverted) added mempcpy() to the list, breaking
MSan tests. With this change, r276771 can be safely re-landed.

llvm-svn: 277086
2016-07-28 23:45:15 +00:00
Vitaly Buka
0ab23cf1c8 Do not remove empty lifetime.start/lifetime.end ranges
Summary:
Asan stack-use-after-scope check should poison alloca even if there is
no access between start and end.

This is possible for code like this:
int *p;
for (int i = 0; i < 3; i++) {
  int x;
  p = &x;
}

"Loop Invariant Code Motion" will move "p = &x;" out of the loop, making
start/end range empty.

PR27453

Reviewers: eugenis

Differential Revision: https://reviews.llvm.org/D22842

llvm-svn: 277072
2016-07-28 22:59:03 +00:00