This is fixing a bug where Loop Vectorization is widening a load but
with a lower alignment. Hoisting the load without propagating the alignment
will allow inst-combine to later deduce a higher alignment that what the pointer
actually is.
Differential Revision: https://reviews.llvm.org/D28408
llvm-svn: 291281
order to avoid jumpy line tables. Calls are left alone because they may be inlined.
Differential Revision: https://reviews.llvm.org/D28390
llvm-svn: 291258
This change separates how type identifiers are resolved from how intrinsic
calls are lowered. All information required to lower an intrinsic call
is stored in a new TypeIdLowering data structure. The idea is that this
data structure can either be initialized using the module itself during
regular LTO, or using the module summary in ThinLTO backends.
Differential Revision: https://reviews.llvm.org/D28341
llvm-svn: 291205
Summary:
Using the linker-supplied list of "preserved" symbols, we can compute
the list of "dead" symbols, i.e. the one that are not reachable from
a "preserved" symbol transitively on the reference graph.
Right now we are using this information to mark these functions as
non-eligible for import.
The impact is two folds:
- Reduction of compile time: we don't import these functions anywhere
or import the function these symbols are calling.
- The limited number of import/export leads to better internalization.
Patch originally by Mehdi Amini.
Reviewers: mehdi_amini, pcc
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D23488
llvm-svn: 291177
Promotion is always legal when a store within the loop is guaranteed to execute.
However, this is not a necessary condition - for promotion to be memory model
semantics-preserving, it is enough to have a store that dominates every exit
block. This is because if the store dominates every exit block, the fact the
exit block was executed implies the original store was executed as well.
Differential Revision: https://reviews.llvm.org/D28147
llvm-svn: 291171
Summary:
Preheader instruction's operands will always be invariant w.r.t. the loop which its the preheader
for.
Memory aliases are handled in canSinkOrHoistInst.
Reviewers: danielcdh, davidxl
Subscribers: mzolotukhin, llvm-commits
Differential Revision: https://reviews.llvm.org/D28270
llvm-svn: 291132
Summary:
This adds a new summary flag NotEligibleToImport that subsumes
several existing flags (NoRename, HasInlineAsmMaybeReferencingInternal
and IsNotViableToInline). It also subsumes the checking of references
on the summary that was being done during the thin link by
eligibleForImport() for each candidate. It is much more efficient to
do that checking once during the per-module summary build and record
it in the summary.
Reviewers: mehdi_amini
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D28169
llvm-svn: 291108
This code seems to be target dependent which may not be the same for all targets.
Passed the decision whether the given stride is complex or not to the target by sending stride information via SCEV to getAddressComputationCost instead of 'IsComplex'.
Specifically at X86 targets we dont see any significant address computation cost in case of the strided access in general.
Differential Revision: https://reviews.llvm.org/D27518
llvm-svn: 291106
Set up basic YAML I/O support for module summaries, plumb the summary into
the pass and add a few command line flags to test YAML I/O support. Bitcode
support to come separately, as will the code in LowerTypeTests that actually
uses the summary. Also add a couple of tests that pass by virtue of the pass
doing nothing with the summary (which happens to be the correct thing to do
for those tests).
Differential Revision: https://reviews.llvm.org/D28041
llvm-svn: 291069
performing partial redundancy elimination (PRE). Not doing so can cause jumpy line
tables and confusing (though correct) source attributions.
Differential Revision: https://reviews.llvm.org/D27857
llvm-svn: 291037
Summary:
This is a relatively simple scheme: we use the index emitted in the
bitcode to avoid loading all the global metadata. Instead we load
the index with their position in the bitcode so that we can load each
of them individually. Materializing the global metadata block in this
condition only triggers loading the named metadata, and the ones
referenced from there (transitively). When materializing a function,
metadata from the global block are loaded lazily as they are
referenced.
Two main current limitations are:
1) Global values other than functions are not materialized on demand,
so we need to eagerly load METADATA_GLOBAL_DECL_ATTACHMENT records
(and their transitive dependencies).
2) When we load a single metadata, we don't recurse on the operands,
instead we use a placeholder or a temporary metadata. Unfortunately
tepmorary nodes are very expensive. This is why we don't have it
always enabled and only for importing.
These two limitations can be lifted in a subsequent improvement if
needed.
With this change, the total link time of opt with ThinLTO and Debug
Info enabled is going down from 282s to 224s (~20%).
Reviewers: pcc, tejohnson, dexonsmith
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D28113
llvm-svn: 291027
This reapplies r289828 (reverted in r289833 as it broke the address sanitizer). The
debugloc is now only set when the instruction is not a call, as this causes the
verifier to assert (the inliner requires an inlinable callsite to have a debug loc
if the caller and callee have debug info).
Original commit message:
Simplify CFG will try to sink the last instruction in a series of basic blocks,
creating a "common" instruction in the successor block (sinkLastInstruction).
When it does this, the debug location of the single instruction should be the
merged debug locations of the commoned instructions.
Original review: https://reviews.llvm.org/D27590
llvm-svn: 290973
We can perform the following:
(add (zext (add nuw X, C1)), C2) -> (zext (add nuw X, C1+C2))
This is only possible if C2 is negative and C2 is greater than or equal to negative C1.
llvm-svn: 290927
I wrote this patch before seeing the comment in:
https://reviews.llvm.org/D27114
...that suggests we should actually be canonicalizing the other way.
So just in case we decide this is the right way, we might as well
have a cleaner implementation.
llvm-svn: 290912
Summary:
Regardless how the loop body weight is distributed, we should preserve
total loop body weight. i.e. we should have same weight reaching the body of the loop
or its duplicates in peeled and unpeeled case.
Reviewers: mkuper, davidxl, anemet
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D28179
llvm-svn: 290833
Apparently my suggestion of using ternary doesn't really work
as clang complains about incompatible types on LHS and RHS. Some
GCC versions happen to accept the code but clang behaviour is
correct here.
llvm-svn: 290822
Summary:
This avoids the very fragile code for null expressions. We could also use a denseset that tracks which things have null expressions instead, but that seems pretty fragile and premature optimization.
This resolves a number of infinite loop cases, test reductions coming.
Reviewers: davide
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D28193
llvm-svn: 290816
Summary: Previously, we tried to fix up the equivalences during symbolic evaluation. This does not work. Now, we change the equivalences during congruence finding, where it belongs. We also initialize the equivalence table to give a maximal answer.
Reviewers: davide
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D28192
llvm-svn: 290815
CVP doesn't care about the order of blocks visited, but by using a pre-order traversal over the graph we can a) not visit unreachable blocks and b) optimize as we go so that analysis of later blocks produce slightly more precise results.
I noticed this via inspection and don't have a concrete example which points to the issue.
llvm-svn: 290760
This is similar to the allocfn case - if an alloca is not captured, then it's
necessarily thread-local.
Differential Revision: https://reviews.llvm.org/D28170
llvm-svn: 290738
Summary:
The current loop complete unroll algorithm checks if unrolling complete will reduce the runtime by a certain percentage. If yes, it will apply a fixed boosting factor to the threshold (by discounting cost). The problem for this approach is that the threshold abruptly. This patch makes the boosting factor a function of runtime reduction percentage, capped by a fixed threshold. In this way, the threshold changes continuously.
The patch also simplified the code by reducing one parameter in UP.
The patch only affects code-gen of two speccpu2006 benchmark:
445.gobmk binary size decreases 0.08%, no performance change.
464.h264ref binary size increases 0.24%, no performance change.
Reviewers: mzolotukhin, chandlerc
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D26989
llvm-svn: 290737
"Changed" doesn't actually change within the loop, so there's
no reason to keep track of it - we always return false during
analysis and true after the transformation is made.
llvm-svn: 290735
We correctly canonicalized (add (sext x), (sext y)) to (sext (add x, y))
where possible. However, we didn't perform the same canonicalization
for zexts or for muls.
llvm-svn: 290733
This moves the exit block and insertion point computation to be eager,
instead of after seeing the first scalar we can promote.
The cost is relatively small (the computation happens anyway, see discussion
on D28147), and the code is easier to follow, and can bail out earlier
if there's a catchswitch present.
llvm-svn: 290729
We would check whether we have a prehader *or* dedicated exit blocks,
and go into the promotion loop. Then, for each alias set we'd check
if we have a preheader *and* dedicated exit blocks, and bail if not.
Instead, bail immediately if we don't have both.
llvm-svn: 290728