Summary:
An IR load can be invariant, dereferenceable, neither, or both. But
currently, MI's notion of invariance is IR-invariant &&
IR-dereferenceable.
This patch splits up the notions of invariance and dereferenceability at
the MI level. It's NFC, so adds some probably-unnecessary
"is-dereferenceable" checks, which we can remove later if desired.
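As a rough illustration of the split (not the actual MI data structures), the
two properties can be modeled as independent flags. The names LoadFlag and
old_mi_invariant below are hypothetical and exist only for this sketch:

```python
from enum import Flag, auto


class LoadFlag(Flag):
    """Hypothetical flags standing in for the two independent load properties."""
    NONE = 0
    INVARIANT = auto()
    DEREFERENCEABLE = auto()


def old_mi_invariant(flags):
    """The pre-patch MI notion of invariance: both properties must hold."""
    return LoadFlag.INVARIANT in flags and LoadFlag.DEREFERENCEABLE in flags


# All four combinations are representable once the notions are separate.
all_combinations = [
    LoadFlag.NONE,
    LoadFlag.INVARIANT,
    LoadFlag.DEREFERENCEABLE,
    LoadFlag.INVARIANT | LoadFlag.DEREFERENCEABLE,
]
```

Under this model, a load that is only invariant is not "old MI-invariant",
which is why the NFC patch has to keep checking both flags wherever the old
predicate was used.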
Reviewers: chandlerc, tstellarAMD
Subscribers: jholewinski, arsenm, nemanjai, llvm-commits
Differential Revision: https://reviews.llvm.org/D23371
llvm-svn: 281151
Summary:
I want to separate out the notions of invariance and dereferenceability
at the MI level, so that they correspond to the equivalent concepts at
the IR level. (Currently an MI load is MI-invariant iff it's
IR-invariant and IR-dereferenceable.)
First step is renaming this function.
Reviewers: chandlerc
Subscribers: MatzeB, jfb, llvm-commits
Differential Revision: https://reviews.llvm.org/D23370
llvm-svn: 281125
Move the target specific setup into the target specific lowering setup. As
pointed out by Anton, the initial change was moving this too high up the stack
resulting in a violation of the layering (the target generic code path set up
target specific bits). Sink this into the ARM specific setup. NFC.
llvm-svn: 281088
This writes the full sequence of type records described in
Yaml to the TPI stream of the PDB file.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D24316
llvm-svn: 281063
This can happen when the frontend knows the debug info will be emitted
somewhere else. Usually this happens for dynamic classes with out of
line constructors or key functions, but it can also happen when modules
are enabled.
llvm-svn: 281060
These instructions were only necessary when type information was stored in the
MachineInstr (because only generic MachineInstrs possessed a type). Now that
it's in MachineRegisterInfo, COPY and PHI work fine.
llvm-svn: 281037
We want each register to have a canonical type, which means the best place to
store this is in MachineRegisterInfo rather than on every MachineInstr that
happens to use or define that register.
Most changes following from this are pretty simple (you need an MRI anyway if
you're going to be doing any transformations, so just check the type there).
But legalization doesn't really want to check redundant operands (when, for
example, a G_ADD only ever has one type) so I've made use of MCInstrDesc's
operand type field to encode these constraints and limit legalization's work.
As an added bonus, more validation is possible, both in MachineVerifier and
MachineIRBuilder (coming soon).
llvm-svn: 281035
Mapping a YAML field to an object in code has always been
a stateless operation. You could still pass state by using the
`setContext` function of the YAMLIO object, but this represented
global state for the entire YAML input. In order to have
context-sensitive state, it is necessary to pass this state in
at the granularity of an individual mapping.
This patch adds support for this type of context-sensitive state.
You simply pass an additional argument of type T to the
`mapRequired` or `mapOptional` functions, and provided you have
specialized a `MappingContextTraits<U, T>` class with the
appropriate mapping function, you can pass this context into
the mapping function.
Reviewed By: chandlerc
Differential Revision: https://reviews.llvm.org/D24162
llvm-svn: 280977
And associated commits, as they broke the Thumb bots.
This reverts commit r280935.
This reverts commit r280891.
This reverts commit r280888.
llvm-svn: 280967
Add the ability to computeKnownBits and SimplifyDemandedBits to extract the known zero/one bits from BUILD_VECTOR, returning the known bits that are shared by every vector element.
This is an initial step towards determining the sign bits of a vector (PR29079).
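A minimal sketch of the idea for constant elements, assuming a plain integer
model of the lanes (known_bits_of_build_vector is a hypothetical helper, not
the SelectionDAG API): a bit is known only if every element agrees on it.

```python
def known_bits_of_build_vector(elements, bit_width):
    """Return (known_zero, known_one) masks shared by every constant element.

    Assumes `elements` is non-empty; each element is an unsigned integer lane.
    """
    mask = (1 << bit_width) - 1
    known_zero = mask
    known_one = mask
    for value in elements:
        value &= mask
        known_one &= value            # stays set only if set in every element
        known_zero &= ~value & mask   # stays set only if clear in every element
    return known_zero, known_one


# Bits 1 and 2 differ between elements, so they are unknown in both masks.
known_zero, known_one = known_bits_of_build_vector([0x10, 0x12, 0x16, 0x14], 8)
```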
Differential Revision: https://reviews.llvm.org/D24253
llvm-svn: 280927
CGP tail-duplicates rets into blocks that end with a call that feeds the ret.
This puts the call in tail position, potentially allowing the DAG builder to
lower it as a tail call. To avoid tail duplication in cases where we won't
form the tail call, CGP tries to predict whether this is going to be possible,
and avoids doing it when lowering as a tail call will definitely fail.
However, it was being too conservative by always throwing away calls to
functions with a signext/zeroext attribute on the return type.
Instead, we can use the same logic the builder uses to determine whether the
attributes work out.
Differential Revision: https://reviews.llvm.org/D24315
llvm-svn: 280894
This is a port of XRay to ARM 32-bit, without Thumb support yet. The XRay instrumentation support is moving up to AsmPrinter.
This is one of 3 commits to different repositories of XRay ARM port. The other 2 are:
1. https://reviews.llvm.org/D23932 (Clang test)
2. https://reviews.llvm.org/D23933 (compiler-rt)
Differential Revision: https://reviews.llvm.org/D23931
llvm-svn: 280888
The original commit was too aggressive about marking LibCalls as AAPCS. The
libcalls contain libc/libm/libunwind calls which are not AAPCS, but C.
llvm-svn: 280833
When branching to a block that immediately tail calls, it is possible to fold
the call directly into the branch if the call is direct and there is no stack
adjustment, saving one byte.
Example:
  define void @f(i32 %x, i32 %y) {
  entry:
    %p = icmp eq i32 %x, %y
    br i1 %p, label %bb1, label %bb2
  bb1:
    tail call void @foo()
    ret void
  bb2:
    tail call void @bar()
    ret void
  }
before:
  f:
    movl 4(%esp), %eax
    cmpl 8(%esp), %eax
    jne .LBB0_2
    jmp foo
  .LBB0_2:
    jmp bar
after:
  f:
    movl 4(%esp), %eax
    cmpl 8(%esp), %eax
    jne bar
  .LBB0_1:
    jmp foo
I don't expect any significant size savings from this (on a Clang bootstrap I
saw 288 bytes), but it does make the code a little tighter.
This patch only does 32-bit, but 64-bit would work similarly.
Differential Revision: https://reviews.llvm.org/D24108
llvm-svn: 280832
Summary:
Previously we were trying to represent this with the "contains" list of
the .cv_inline_linetable directive, which was not enough information.
Now we directly represent the chain of inlined call sites, so we know
what location to emit when we encounter a .cv_loc directive of an inner
inlined call site while emitting the line table of an outer function or
inlined call site. Fixes PR29146.
Also fixes PR29147, where we would crash when .cv_loc directives crossed
sections. Now we write down the section of the first .cv_loc directive,
and emit an error if any other .cv_loc directive for that function is in
a different section.
Also fixes issues with discontiguous inlined source locations, like in
this example:
  volatile int unlikely_cond = 0;
  extern void __declspec(noreturn) abort();
  __forceinline void f() {
    if (!unlikely_cond) abort();
  }
  int main() {
    unlikely_cond = 0;
    f();
    unlikely_cond = 0;
  }
Previously our tables gave bad location information for the 'abort'
call, and the debugger wouldn't show the inlined stack frame for 'f'.
It is important to emit good line tables for this code pattern, because
it comes up whenever an asan bug occurs in an inlined function. The
__asan_report* stubs are generally placed after the normal function
epilogue, leading to discontiguous regions of inlined code.
Reviewers: majnemer, amccarth
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D24014
llvm-svn: 280822
I might have called this "r246507, the sequel". It fixes the same issue, as the
issue has cropped up in a few more places. The underlying problem is that
isSetCCEquivalent can pick up select_cc nodes with a result type that is not
legal for a setcc node to have, and if we use that type to create new setcc
nodes, nothing fixes that (and so we've violated the contract that the
infrastructure has with the backend regarding setcc node types).
Fixes PR30276.
For convenience, here's the commit message from r246507, which explains the
problem in greater detail:
[DAGCombine] Fixup SETCC legality checking
SETCC is one of those special node types for which operation actions (legality,
etc.) are keyed off of an operand type, not the node's value type. This makes
sense because the value type of a legal SETCC node is determined by its
operands' value type (via the TLI function getSetCCResultType). When the
SDAGBuilder creates SETCC nodes, it either creates them with an MVT::i1 value
type, or directly with the value type provided by TLI.getSetCCResultType.
The first problem being fixed here is that DAGCombine had several places
querying TLI.isOperationLegal on SETCC, but providing the return of
getSetCCResultType, instead of the operand type directly. This does not mean
what the author thought, and "luckily", most in-tree targets have SETCC with
Custom lowering, instead of marking them Legal, so these checks return false
anyway.
The second problem being fixed here is that two of the DAGCombines could create
SETCC nodes with arbitrary (integer) value types; specifically, those that
would simplify:
(setcc a, b, op1) and|or (setcc a, b, op2) -> setcc a, b, op3
(which is possible for some combinations of (op1, op2))
If the operands of the and|or node are actual setcc nodes, then this is not an
issue (because the and|or must share the same type), but, the relevant code in
DAGCombiner::visitANDLike and DAGCombiner::visitORLike actually calls
DAGCombiner::isSetCCEquivalent on each operand, and that function will
recognise setcc-like select_cc nodes with other return types. And, thus, when
creating new SETCC nodes, we need to be careful to respect the value-type
constraint. This is even true before type legalization, because it is quite
possible for the SELECT_CC node to have a legal type that does not happen to
match the corresponding TLI.getSetCCResultType type.
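For reference, the and|or simplification itself (ignoring the value-type
question this commit is about) can be sketched by modeling each condition code
as the set of orderings for which it holds. The table and the helper
fold_setcc_pair are illustrative assumptions, not the DAGCombiner code:

```python
# Each predicate is modeled as the set of orderings of (a, b) for which it is true.
PREDICATES = {
    "lt": frozenset({"less"}),
    "eq": frozenset({"equal"}),
    "gt": frozenset({"greater"}),
    "le": frozenset({"less", "equal"}),
    "ge": frozenset({"greater", "equal"}),
    "ne": frozenset({"less", "greater"}),
}


def fold_setcc_pair(op1, op2, logic):
    """Return op3 such that (a op1 b) logic (a op2 b) == (a op3 b), or None.

    None means the combination is constant true/false and therefore not a
    single predicate in this simplified table.
    """
    if logic == "and":
        combined = PREDICATES[op1] & PREDICATES[op2]
    elif logic == "or":
        combined = PREDICATES[op1] | PREDICATES[op2]
    else:
        raise ValueError("logic must be 'and' or 'or'")
    for name, outcomes in PREDICATES.items():
        if outcomes == combined:
            return name
    return None
```

The sketch only covers which op3 exists; the fix described above is about the
value type given to the resulting node.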
To be explicit, there is nothing that later fixes the value types of SETCC
nodes (if the type is legal, but does not happen to match
TLI.getSetCCResultType). Creating SETCCs with an MVT::i1 value type seems to
work only because, either MVT::i1 is not legal, or it is what
TLI.getSetCCResultType returns if it is legal. Fixing that is a larger change,
however. For the time being, restrict the relevant transformations to produce
only SETCC nodes with a value type matching TLI.getSetCCResultType (or MVT::i1
prior to type legalization).
Fixes PR24636.
llvm-svn: 280767
If we are extracting a subvector that has just been inserted then we should just use the original inserted subvector.
This has come up in several x86 shuffle lowering cases where we are crossing 128-bit lanes.
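A small sketch of why the fold is sound, using Python lists as vectors. The
helpers insert_subvector and extract_subvector are simplified stand-ins for
the DAG nodes of the same name, and the matching index and length are
assumptions of the example:

```python
def insert_subvector(vec, sub, index):
    """Return a copy of `vec` with `sub` written at element offset `index`."""
    if index < 0 or index + len(sub) > len(vec):
        raise ValueError("subvector does not fit")
    out = list(vec)
    out[index:index + len(sub)] = sub
    return out


def extract_subvector(vec, index, length):
    """Return `length` elements of `vec` starting at element offset `index`."""
    if index < 0 or index + length > len(vec):
        raise ValueError("subvector out of range")
    return list(vec[index:index + length])


wide = list(range(8))
sub = [100, 101, 102, 103]
# Extracting exactly what was just inserted yields the inserted subvector,
# so the insert/extract pair can be replaced by `sub` itself.
round_trip = extract_subvector(insert_subvector(wide, sub, 4), 4, len(sub))
```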
Differential Revision: https://reviews.llvm.org/D24254
llvm-svn: 280715
Summary:
In addition to not including the register operand of the current
instruction, also don't include any aliasing registers. We can't consider
these as candidates because using them will clobber the corresponding
register operand of the current instruction.
This change doesn't include a test case and it would probably be difficult
to produce a stable one since the bug depends on the results of register
allocation.
Reviewers: MatzeB, qcolombet, hfinkel
Subscribers: hfinkel, llvm-commits
Differential Revision: https://reviews.llvm.org/D24130
llvm-svn: 280698
All of the builtins are designed to be invoked with ARM AAPCS CC even on ARM
AAPCS VFP CC hosts. Tweak the default initialisation to ARM AAPCS CC rather
than C CC for ARM/thumb targets.
The changes to the tests are necessary to ensure that the calling convention for
the lowered library calls is honoured. Furthermore, these adjustments cause
certain branch invocations to change to branch-and-link since the returned value
needs to be moved across registers (d0 -> r0, r1).
llvm-svn: 280683
CGP currently drops select's MD_prof profile data when
generating a conditional branch, which can lead to bad
code layout. The patch fixes the issue.
Differential Revision: http://reviews.llvm.org/D24169
llvm-svn: 280600
Delete the dead code for Write(ilist_iterator) in the IR Verifier,
inline report(ilist_iterator) at its call sites in the MachineVerifier,
and use simple_ilist<>::iterator in SymbolTableListTraits.
The only remaining reference to ilist_iterator outside of the ilist
implementation is from MachineInstrBundleIterator. I'll get rid of that
in a follow-up.
llvm-svn: 280565
Subregister definitions are considered uses for the purpose of tracking
liveness of the whole register. At the same time, when calculating live
interval subranges, subregister defs should not be treated as uses.
Differential Revision: https://reviews.llvm.org/D24190
llvm-svn: 280532
One side of a diamond may end with a predicate clobbering instruction.
That side of the diamond has to be if-converted second. Both sides can't
clobber the predicate or the ifconversion is invalid. This is checked
elsewhere, but add an assert as a safety check. NFC
llvm-svn: 280518
For the store of a wide value merged from a pair of values, especially int-fp pair,
sometimes it is more efficient to split it into separate narrow stores, which can
remove the bitwise instructions or sink them to colder places.
Currently the feature is enabled only on the x86 target, and only stores of int-fp
pairs are split. The scope may be extended in the future if performance data
supports it.
Differential Revision: https://reviews.llvm.org/D22840
llvm-svn: 280505
This fixes a regression introduced by revision 268094.
Revision 268094 added the following dag combine rule:
// trunc (shl x, K) -> shl (trunc x), K => K < vt.size / 2
That rule converts a truncate of a shift-by-constant into a shift of a truncated
value. We do this only if the shift count is less than half the size in bits of
the truncated value (K < vt.size / 2).
The problem is that the constraint on the shift count is incorrect, so the rule
doesn't work well in some cases involving vector types. The combine rule should
have been written instead like this:
// trunc (shl x, K) -> shl (trunc x), K => K < vt.getScalarSizeInBits()
Basically, if K is smaller than the "scalar size in bits" of the truncated value
then we know that by "sinking" the truncate into the operand of the shift we
would never accidentally make the shift undefined.
This patch fixes the check on the shift count, and adds test cases to make sure
that we don't regress the behavior.
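The corrected constraint can be checked per lane with a small model. This is a
sketch under the assumption that an out-of-range shift is "undefined", which
the hypothetical helper shl represents as None:

```python
def shl(value, amount, width):
    """Shift left within a `width`-bit lane; None models an undefined shift."""
    if amount >= width:
        return None
    return (value << amount) & ((1 << width) - 1)


def trunc(value, width):
    """Keep only the low `width` bits of `value`."""
    return value & ((1 << width) - 1)


def sink_is_safe(amount, narrow_scalar_bits):
    """The corrected check: K must be below the scalar size of the narrow type."""
    return amount < narrow_scalar_bits


def rule_holds(x, amount, wide_bits, narrow_bits):
    """Compare trunc(shl x, K) against shl(trunc x, K) for one lane value."""
    before = trunc(shl(x, amount, wide_bits), narrow_bits)
    after = shl(trunc(x, narrow_bits), amount, narrow_bits)
    return before == after
```

For a 16-bit to 8-bit truncate, a shift by 8 is still defined on the wide side
but becomes undefined once the truncate is sunk, which is the case the check
has to reject.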
Differential Revision: https://reviews.llvm.org/D24154
llvm-svn: 280482
If the entire blocks match, we would count the branch instructions
toward the number of duplicated instructions. This doesn't match what we
do elsewhere, and was causing a bug.
llvm-svn: 280448
When expanding a SETCC for which the low half is known to evaluate to false,
we can only throw it away for LT/GT comparisons, not LE/GE.
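One way to see the difference, as a sketch that assumes unsigned integers
split into two halves and assumes the expansion falls back to applying the
original predicate to the high halves (the helper names are hypothetical):

```python
import operator


def split(value, half_bits):
    """Return the (high, low) halves of an unsigned value."""
    return value >> half_bits, value & ((1 << half_bits) - 1)


def low_half_is_false(x, y, pred, half_bits):
    """True when the predicate applied to the low halves evaluates to false."""
    _, lo_x = split(x, half_bits)
    _, lo_y = split(y, half_bits)
    return not pred(lo_x, lo_y)


def hi_only_compare(x, y, pred, half_bits):
    """Apply the original predicate to the high halves only."""
    hi_x, _ = split(x, half_bits)
    hi_y, _ = split(y, half_bits)
    return pred(hi_x, hi_y)


# 0x15 vs 0x13 with 4-bit halves: high halves are equal, low halves are 5 and 3.
x, y = 0x15, 0x13
lt_matches = hi_only_compare(x, y, operator.lt, 4) == (x < y)
le_matches = hi_only_compare(x, y, operator.le, 4) == (x <= y)
```

With equal high halves, a strict comparison of the high halves is false, just
like the full comparison, while a non-strict one is true even though the full
comparison is false.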
This fixes PR29170.
Differential Revision: https://reviews.llvm.org/D24151
llvm-svn: 280424
Prior to this, we could generate a vector_shuffle from an IR shuffle when the
size of the result was exactly the sum of the sizes of the input vectors.
If the output vector was narrower - e.g. a <12 x i8> being formed by a shuffle
with two <8 x i8> inputs - we would lower the shuffle to a sequence of extracts
and inserts.
Instead, we can form a larger vector_shuffle, and then extract a subvector
of the right size - e.g. shuffle the two <8 x i8> inputs into a <16 x i8>
and then extract a <12 x i8>.
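The equivalence can be sketched with lists standing in for vectors. Padding
the mask with undef lanes (modeled here as -1 and None) is an assumption of
this example, as are the helper names:

```python
UNDEF = -1  # mask entry meaning "don't care"; the result lane is None


def shuffle(lhs, rhs, mask):
    """Select elements from the concatenation of lhs and rhs according to mask."""
    pool = list(lhs) + list(rhs)
    return [None if idx == UNDEF else pool[idx] for idx in mask]


def shuffle_via_wide(lhs, rhs, mask):
    """Form a full-width shuffle, then extract the leading subvector."""
    wide_len = len(lhs) + len(rhs)
    if len(mask) > wide_len:
        raise ValueError("mask is wider than the combined inputs")
    padded = list(mask) + [UNDEF] * (wide_len - len(mask))
    wide = shuffle(lhs, rhs, padded)   # e.g. a <16 x i8> result
    return wide[:len(mask)]            # e.g. extract the <12 x i8> prefix


lhs = list(range(8))
rhs = list(range(8, 16))
mask = [0, 8, 1, 9, 2, 10, 3, 11, 4, 12, 5, 13]
narrow = shuffle_via_wide(lhs, rhs, mask)
```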
This also includes a target-specific X86 combine that in the presence of
AVX2 combines:
  (vector_shuffle <mask> (concat_vectors t1, undef)
                         (concat_vectors t2, undef))
into:
  (vector_shuffle <mask> (concat_vectors t1, t2), undef)
in cases where this allows us to form VPERMD/VPERMQ.
(This is not a separate commit, as that pattern does not appear without
the DAGBuilder change.)
llvm-svn: 280418
They're another source of generic vregs, which are going to need a type on the
definition when we remove the register width from MachineRegisterInfo.
llvm-svn: 280412
Legalization tends to create anyext(trunc) patterns. This should always be
combined - into either a single trunc, a single ext, or nothing if the
types match exactly. But if we happen to combine the trunc first, we may pull
the trunc away from the anyext or make it implicit (e.g. the truncate(extract)
-> extract(bitcast) fold).
To prevent this, we can avoid doing the fold, similarly to how we already handle
fpround(fpextend).
Differential Revision: https://reviews.llvm.org/D23893
llvm-svn: 280386
LLVM has an @llvm.eh.dwarf.cfa intrinsic, used to lower the GCC-compatible
__builtin_dwarf_cfa() builtin. As pointed out in PR26761, this is currently
broken on PowerPC (and likely on ARM as well). Currently, @llvm.eh.dwarf.cfa is
lowered using:
ADD(FRAMEADDR, FRAME_TO_ARGS_OFFSET)
where FRAME_TO_ARGS_OFFSET defaults to the constant zero. On x86,
FRAME_TO_ARGS_OFFSET is lowered to 2*SlotSize. This setup, however, does not
work for PowerPC. Because of the way that the stack layout works, the canonical
frame address is not exactly (FRAMEADDR + FRAME_TO_ARGS_OFFSET) on PowerPC
(there is a lower save-area offset as well), so it is not just a matter of
implementing FRAME_TO_ARGS_OFFSET for PowerPC (unless we redefine its
semantics -- We can do that, since it is currently used only for
@llvm.eh.dwarf.cfa lowering, but it is better to directly lower the CFA construct
itself (since it can be easily represented as a fixed-offset FrameIndex)). Mips
currently does this, but by using a custom lowering for ADD that specifically
recognizes the (FRAMEADDR, FRAME_TO_ARGS_OFFSET) pattern.
This change introduces an ISD::EH_DWARF_CFA node, which by default expands using
the existing logic, but can be directly lowered by the target. Mips is updated
to use this method (which simplifies its implementation, and I suspect makes it
more robust), and PowerPC is updated to do the same.
Fixes PR26761.
Differential Revision: https://reviews.llvm.org/D24038
llvm-svn: 280350
As discussed in https://reviews.llvm.org/D22666, our current mechanism to
support -pg profiling, where we insert calls to mcount(), or some similar
function, is fundamentally broken. We insert these calls in the frontend, which
means they get duplicated when inlining, and so the accumulated execution
counts for the inlined-into functions are wrong.
Because we don't want the presence of these functions to affect optimization,
they should be inserted in the backend. Here's a pass which would do just that.
The knowledge of the name of the counting function lives in the frontend, so
we're passing it here as a function attribute. Clang will be updated to use
this mechanism.
Differential Revision: https://reviews.llvm.org/D22825
llvm-svn: 280347
Summary:
This change promotes the 'isTailCall(...)' member function to
TargetInstrInfo as a query interface for determining on a per-target
basis whether a given MachineInstr is a tail call instruction. We build
upon this in the XRay instrumentation pass to emit special sleds for
tail call optimisations, where we emit the correct kind of sled.
The tail call sleds look like a mix between the function entry and
function exit sleds. Form-wise, the sled comes before the "jmp"
instruction that implements the tail call similar to how we do it for
the function entry sled. Functionally, because we know this is a tail
call, it behaves much like an exit sled -- i.e. at runtime we may use
the exit trampolines instead of a different kind of trampoline.
A follow-up change to recognise these sleds will be done in compiler-rt,
so that we can start intercepting these initially as exits, but also
have the option to have different log entries to more accurately reflect
that this is actually a tail call.
Reviewers: echristo, rSerge, majnemer
Subscribers: mehdi_amini, dberris, llvm-commits
Differential Revision: https://reviews.llvm.org/D23986
llvm-svn: 280334