Make the corresponding change that was made for byval in
b7141207a483d39b99c2b4da4eb3bb591eca9e1a. Like byval, this requires a
bulk update of the test IR tests to include the type before this can
be mandatory.
Rather than calling hasFnAttribute and then calling getFnAttribute
if the attribute exists, its better to just call getFnAttribute and
then check if we got a valid attribute back.
There's a special case in hasAttribute for None when pImpl is null. If pImpl is not null we dispatch to pImpl->hasAttribute which will always return false for Attribute::None.
So if we just want to check for None its sufficient to just check that pImpl is null. Which can even be done inline.
This patch adds a helper for that case which I hope will speed up our getSubtargetImpl implementations.
Differential Revision: https://reviews.llvm.org/D86744
This allows tracking the in-memory type of a pointer argument to a
function for ABI purposes. This is essentially a stripped down version
of byval to remove some of the stack-copy implications in its
definition.
This includes the base IR changes, and some tests for places where it
should be treated similarly to byval. Codegen support will be in a
future patch.
My original attempt at solving some of these problems was to repurpose
byval with a different address space from the stack. However, it is
technically permitted for the callee to introduce a write to the
argument, although nothing does this in reality. There is also talk of
removing and replacing the byval attribute, so a new attribute would
need to take its place anyway.
This is intended avoid some optimization issues with the current
handling of aggregate arguments, as well as fixes inflexibilty in how
frontends can specify the kernel ABI. The most honest representation
of the amdgpu_kernel convention is to expose all kernel arguments as
loads from constant memory. Today, these are raw, SSA Argument values
and codegen is responsible for turning these into loads.
Background:
There currently isn't a satisfactory way to represent how arguments
for the amdgpu_kernel calling convention are passed. In reality,
arguments are passed in a single, flat, constant memory buffer
implicitly passed to the function. It is also illegal to call this
function in the IR, and this is only ever invoked by a driver of some
kind.
It does not make sense to have a stack passed parameter in this
context as is implied by byval. It is never valid to write to the
kernel arguments, as this would corrupt the inputs seen by other
dispatches of the kernel. These argumets are also not in the same
address space as the stack, so a copy is needed to an alloca. From a
source C-like language, the kernel parameters are invisible.
Semantically, a copy is always required from the constant argument
memory to a mutable variable.
The current clang calling convention lowering emits raw values,
including aggregates into the function argument list, since using
byval would not make sense. This has some unfortunate consequences for
the optimizer. In the aggregate case, we end up with an aggregate
store to alloca, which both SROA and instcombine turn into a store of
each aggregate field. The optimizer never pieces this back together to
see that this is really just a copy from constant memory, so we end up
stuck with expensive stack usage.
This also means the backend dictates the alignment of arguments, and
arbitrarily picks the LLVM IR ABI type alignment. By allowing an
explicit alignment, frontends can make better decisions. For example,
there's real no advantage to an aligment higher than 4, so a frontend
could choose to compact the argument layout. Similarly, there is a
high penalty to using an alignment lower than 4, so a frontend could
opt into more padding for small arguments.
Another design consideration is when it is appropriate to expose the
fact that these arguments are all really passed in adjacent
memory. Currently we have a late IR optimization pass in codegen to
rewrite the kernel argument values into explicit loads to enable
vectorization. In most programs, unrelated argument loads can be
merged together. However, exposing this property directly from the
frontend has some disadvantages. We still need a way to track the
original argument sizes and alignments to report to the driver. I find
using some side-channel, metadata mechanism to track this
unappealing. If the kernel arguments were exposed as a single buffer
to begin with, alias analysis would be unaware that the padding bits
betewen arguments are meaningless. Another family of problems is there
are still some gaps in replacing all of the available parameter
attributes with metadata equivalents once lowered to loads.
The immediate plan is to start using this new attribute to handle all
aggregate argumets for kernels. Long term, it makes sense to migrate
all kernel arguments, including scalars, to be passed indirectly in
the same manner.
Additional context is in D79744.
As requested by Andrew Kaylor, rewrite this code in a way that does
not warn on old MSVC versions.
Avoid the buggy constexpr warning by just not using constexpr and
removing the static_assert that depends on it.
The `noundef` attribute indicates an argument or return value which
may never have an undef value representation.
This patch allows LLVM to parse the attribute.
Differential Revision: https://reviews.llvm.org/D83412
I noticed that for some benchmarks we spend quite a bit of time
inside AttributeList::hasAttrSomewhere(), mainly when checking
for the "returned" attribute. Most of the time the attribute will
not be present, in which case this function has to walk through
the whole attribute list and check for the attribute at each index.
This patch adds a cache of all "available somewhere" attributes
inside AttributeListImpl. This makes the structure 12 bytes larger,
but I don't think that's problematic, as attribute lists are uniqued.
Compile-time in terms of instructions retired improves by 0.4% on
average, but >1% for sqlite.
Differential Revision: https://reviews.llvm.org/D81867
While LLVM does fold this to x+1, GCC does not. As this is hot
code, let's try to avoid that.
According to
https://developercommunity.visualstudio.com/content/problem/211134/unsigned-integer-overflows-in-constexpr-functionsa.html
this spurious warning in MSVC has been fixed in Visual Studio 2019
Version 16.4. Let's see if there are any build bots running old
MSVC versions with warnings treated as errors...
- This allow us to specify the (minimal) alignment on an intrinsic's
arguments and, more importantly, the return value.
Differential Revision: https://reviews.llvm.org/D80422
See https://reviews.llvm.org/D74651 for the preallocated IR constructs
and LangRef changes.
In X86TargetLowering::LowerCall(), if a call is preallocated, record
each argument's offset from the stack pointer and the total stack
adjustment. Associate the call Value with an integer index. Store the
info in X86MachineFunctionInfo with the integer index as the key.
This adds two new target independent ISDOpcodes and two new target
dependent Opcodes corresponding to @llvm.call.preallocated.{setup,arg}.
The setup ISelDAG node takes in a chain and outputs a chain and a
SrcValue of the preallocated call Value. It is lowered to a target
dependent node with the SrcValue replaced with the integer index key by
looking in X86MachineFunctionInfo. In
X86TargetLowering::EmitInstrWithCustomInserter() this is lowered to an
%esp adjustment, the exact amount determined by looking in
X86MachineFunctionInfo with the integer index key.
The arg ISelDAG node takes in a chain, a SrcValue of the preallocated
call Value, and the arg index int constant. It produces a chain and the
pointer fo the arg. It is lowered to a target dependent node with the
SrcValue replaced with the integer index key by looking in
X86MachineFunctionInfo. In
X86TargetLowering::EmitInstrWithCustomInserter() this is lowered to a
lea of the stack pointer plus an offset determined by looking in
X86MachineFunctionInfo with the integer index key.
Force any function containing a preallocated call to use the frame
pointer.
Does not yet handle a setup without a call, or a conditional call.
Does not yet handle musttail. That requires a LangRef change first.
Tried to look at all references to inalloca and see if they apply to
preallocated. I've made preallocated versions of tests testing inalloca
whenever possible and when they make sense (e.g. not alloca related,
inalloca edge cases).
Aside from the tests added here, I checked that this codegen produces
correct code for something like
```
struct A {
A();
A(A&&);
~A();
};
void bar() {
foo(foo(foo(foo(foo(A(), 4), 5), 6), 7), 8);
}
```
by replacing the inalloca version of the .ll file with the appropriate
preallocated code. Running the executable produces the same results as
using the current inalloca implementation.
Reverted due to unexpectedly passing tests, added REQUIRES: asserts for reland.
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77689
See https://reviews.llvm.org/D74651 for the preallocated IR constructs
and LangRef changes.
In X86TargetLowering::LowerCall(), if a call is preallocated, record
each argument's offset from the stack pointer and the total stack
adjustment. Associate the call Value with an integer index. Store the
info in X86MachineFunctionInfo with the integer index as the key.
This adds two new target independent ISDOpcodes and two new target
dependent Opcodes corresponding to @llvm.call.preallocated.{setup,arg}.
The setup ISelDAG node takes in a chain and outputs a chain and a
SrcValue of the preallocated call Value. It is lowered to a target
dependent node with the SrcValue replaced with the integer index key by
looking in X86MachineFunctionInfo. In
X86TargetLowering::EmitInstrWithCustomInserter() this is lowered to an
%esp adjustment, the exact amount determined by looking in
X86MachineFunctionInfo with the integer index key.
The arg ISelDAG node takes in a chain, a SrcValue of the preallocated
call Value, and the arg index int constant. It produces a chain and the
pointer fo the arg. It is lowered to a target dependent node with the
SrcValue replaced with the integer index key by looking in
X86MachineFunctionInfo. In
X86TargetLowering::EmitInstrWithCustomInserter() this is lowered to a
lea of the stack pointer plus an offset determined by looking in
X86MachineFunctionInfo with the integer index key.
Force any function containing a preallocated call to use the frame
pointer.
Does not yet handle a setup without a call, or a conditional call.
Does not yet handle musttail. That requires a LangRef change first.
Tried to look at all references to inalloca and see if they apply to
preallocated. I've made preallocated versions of tests testing inalloca
whenever possible and when they make sense (e.g. not alloca related,
inalloca edge cases).
Aside from the tests added here, I checked that this codegen produces
correct code for something like
```
struct A {
A();
A(A&&);
~A();
};
void bar() {
foo(foo(foo(foo(foo(A(), 4), 5), 6), 7), 8);
}
```
by replacing the inalloca version of the .ll file with the appropriate
preallocated code. Running the executable produces the same results as
using the current inalloca implementation.
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77689
The "null-pointer-is-valid" attribute needs to be checked by many
pointer-related combines. To make the check more efficient, convert
it from a string into an enum attribute.
In the future, this attribute may be replaced with data layout
properties.
Differential Revision: https://reviews.llvm.org/D78862
We want to add a way to avoid merging identical calls so as to keep the
separate debug-information for those calls. There is also an asan
usecase where having this attribute would be beneficial to avoid
alternative work-arounds.
Here is the link to the feature request:
https://bugs.llvm.org/show_bug.cgi?id=42783.
`nomerge` is different from `noline`. `noinline` prevents function from
inlining at callsites, but `nomerge` prevents multiple identical calls
from being merged into one.
This patch adds `nomerge` to disable the optimization in IR level. A
followup patch will be needed to let backend understands `nomerge` and
avoid tail merge at backend.
Reviewed By: asbirlea, rnk
Differential Revision: https://reviews.llvm.org/D78659
Add llvm.call.preallocated.{setup,arg} instrinsics.
Add "preallocated" operand bundle which takes a token produced by llvm.call.preallocated.setup.
Add "preallocated" parameter attribute, which is like byval but without the copy.
Verifier changes for these IR constructs.
See https://github.com/rnk/llvm-project/blob/call-setup-docs/llvm/docs/CallSetup.md
Subscribers: hiraditya, jdoerfert, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74651
This means AttrBuilder will always create a sorted set of attributes and
we can skip the sorting step. Sorting attributes is surprisingly
expensive, and I recently made it worse by making it use array_pod_sort.
Attributes are currently stored as a simple list. Enum attributes
additionally use a bitset to allow quickly determining whether an
attribute is set. String attributes on the other hand require a
full scan of the list. As functions tend to have a lot of string
attributes (at least when clang is used), this is a noticeable
performance issue.
This patch adds an additional name => attribute map to the
AttributeSetNode, which allows querying string attributes quickly.
This results in a 3% reduction in instructions retired on CTMark.
Changes to memory usage seem to be in the noise (attribute sets are
uniqued, and we don't tend to have more than a few dozen or hundred
unique attribute sets, so adding an extra map does not have a
noticeable cost.)
Differential Revision: https://reviews.llvm.org/D78859
It can be used to avoid passing the begin and end of a range.
This makes the code shorter and it is consistent with another
wrappers we already have.
Differential revision: https://reviews.llvm.org/D78016
Summary: Add verification that operand bundles on an llvm.assume are well formed to the verify pass.
Reviewers: jdoerfert
Reviewed By: jdoerfert
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75269
Lots of headers pass around MemoryBuffer objects, but very few open
them. Let those that do include FileSystem.h.
Saves ~250 includes of Chrono.h & FileSystem.h:
$ diff -u thedeps-before.txt thedeps-after.txt | grep '^[-+] ' | sort | uniq -c | sort -nr
254 - ../llvm/include/llvm/Support/FileSystem.h
253 - ../llvm/include/llvm/Support/Chrono.h
237 - ../llvm/include/llvm/Support/NativeFormatting.h
237 - ../llvm/include/llvm/Support/FormatProviders.h
192 - ../llvm/include/llvm/ADT/StringSwitch.h
190 - ../llvm/include/llvm/Support/FormatVariadicDetails.h
...
This requires duplicating the file_t typedef, which is unfortunate. I
sunk the choice of mapping mode down into the cpp file using variable
template specializations instead of class members in headers.
Fix attempt
this is part of the implementation of http://lists.llvm.org/pipermail/llvm-dev/2019-December/137632.html
this patch gives the basis of building an assume to preserve all information from an instruction and add support for building an assume that preserve the information from a call.
Summary:
this is part of the implementation of http://lists.llvm.org/pipermail/llvm-dev/2019-December/137632.html
this patch gives the basis of building an assume to preserve all information from an instruction and add support for building an assume that preserve the information from a call.
Reviewers: jdoerfert
Reviewed By: jdoerfert
Subscribers: mgrang, fhahn, mgorny, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72475
Summary:
this is part of the implementation of http://lists.llvm.org/pipermail/llvm-dev/2019-December/137632.html
this patch gives the basis of building an assume to preserve all information from an instruction and add support for building an assume that preserve the information from a call.
Reviewers: jdoerfert
Reviewed By: jdoerfert
Subscribers: mgrang, fhahn, mgorny, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72475
Summary:
this is part of the implementation of http://lists.llvm.org/pipermail/llvm-dev/2019-December/137632.html
this patch gives the basis of building an assume to preserve all information from an instruction and add support for building an assume that preserve the information from a call.
Reviewers: jdoerfert
Reviewed By: jdoerfert
Subscribers: mgrang, fhahn, mgorny, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72475
Summary:
this patch makes tablegen generate llvm attributes in a more generic and simpler (at least to me).
changes: make tablegen generate
...
ATTRIBUTE_ENUM(Alignment,align)
ATTRIBUTE_ENUM(AllocSize,allocsize)
...
which can be used to generate most of what was previously used and more.
Tablegen was also generating attributes from 2 identical files leading to identical output. so I removed one of them and made user use the other.
Reviewers: jdoerfert, thakis, aaron.ballman
Reviewed By: aaron.ballman
Subscribers: mgorny, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72455
Summary:
this is part of the implementation of http://lists.llvm.org/pipermail/llvm-dev/2019-December/137632.html
this patch gives the basis of building an assume to preserve all information from an instruction and add support for building an assume that preserve the information from a call.
Reviewers: jdoerfert
Reviewed By: jdoerfert
Subscribers: mgrang, fhahn, mgorny, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72475
Summary:
this patch makes tablegen generate llvm attributes in a more generic and simpler (at least to me).
changes: make tablegen generate
...
ATTRIBUTE_ENUM(Alignment,align)
ATTRIBUTE_ENUM(AllocSize,allocsize)
...
which can be used to generate most of what was previously used and more.
Tablegen was also generating attributes from 2 identical files leading to identical output. so I removed one of them and made user use the other.
Reviewers: jdoerfert, thakis, aaron.ballman
Reviewed By: aaron.ballman
Subscribers: mgorny, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72455
This is how it should've been and brings it more in line with
std::string_view. There should be no functional change here.
This is mostly mechanical from a custom clang-tidy check, with a lot of
manual fixups. It uncovers a lot of minor inefficiencies.
This doesn't actually modify StringRef yet, I'll do that in a follow-up.
Summary:
I initially encountered those assertions when trying to create
this IR `alignment` attribute from clang's `__attribute__((assume_aligned(imm)))`,
because until D72994 there is no sanity checking for the value of `imm`.
But even then, we have `llvm::Value::MaximumAlignment` constant (which is `536870912`),
which is enforced for clang attributes, and then there are some other magical constant
(`0x40000000` i.e. `1073741824` i.e. `2 * 536870912`) in
`Attribute::getWithAlignment()`/`AttrBuilder::addAlignmentAttr()`.
I strongly suspect that `0x40000000` is incorrect,
and that also should be `llvm::Value::MaximumAlignment`.
Reviewers: erichkeane, hfinkel, jdoerfert, gchatelet, courbet
Reviewed By: erichkeane
Subscribers: hiraditya, cfe-commits, llvm-commits
Tags: #llvm, #clang
Differential Revision: https://reviews.llvm.org/D72998
This avoids unneeded copies when using a range-based for loops.
This avoids new warnings due to D68912 adds -Wrange-loop-analysis to -Wall.
Differential Revision: https://reviews.llvm.org/D70870