Many uses of getIntPtrType() were using that type to calculate the
needed type for GEP offset arguments. However, some time ago,
DataLayout was extended to support pointers where the size of the
pointer is not equal to the size of the values used to index it.
Much code was already migrated to, for example, use getIndexSizeInBits
instead of getPointerSizeInBits, but some rewrites still used
getIntPtrType() to get the type for GEP offsets.
This commit changes uses of getIntPtrType() to getIndexType() where
they are involved in a GEP-related calculation.
In at least one case (bounds check insertion) this resolves a compiler
crash that the new test added here would previously trigger.
This commit does not impact
- C library-related rewriting (memcpy()), which operates under
the assumption that intptr_t == size_t. While all the mechanisms for
breaking this assumption now exist, doing so is outside the scope of
this commit.
- Code generation and below. Note that the use of getIntPtrType() in
CodeGenPrepare will be changed in a future commit.
- Usage of getIntPtrType() in any backend
Depends on D143435
Reviewed By: arichardson
Differential Revision: https://reviews.llvm.org/D143437
Use the memory() spelling in a few places that were using the
old syntax.
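For illustration (the function here is hypothetical), an old-style
attribute pair and its memory() equivalent:

; previously: declare void @f(ptr) readonly argmemonly
declare void @f(ptr) memory(argmem: read)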
The documented attributes for llvm.type.checked.load don't match
the actual attributes, I've raised this here:
https://reviews.llvm.org/D21121#inline-1406792
This carries a bitmask indicating forbidden floating-point value kinds
in the argument or return value. This will enable interprocedural
-ffinite-math-only optimizations. This is primarily to cover the
no-nans and no-infinities cases, but also covers the other floating
point classes for free. Textually, this provides a number of names
corresponding to bits in FPClassTest, e.g.
call nofpclass(nan inf) @must_be_finite()
call nofpclass(snan) @cannot_be_snan()
This is more expressive than the existing nnan and ninf fast math
flags. As an added bonus, you can represent fun things like nanf (a
function that only ever returns NaN):
declare nofpclass(inf zero sub norm) float @only_nans()
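The attribute can also appear on individual parameters; a minimal
sketch (the function name is hypothetical):

; %x is known on entry to be neither NaN nor infinity
define float @assume_finite(float nofpclass(nan inf) %x) {
  ret float %x
}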
Compared to nnan/ninf:
- Can be applied to individual call operands as well as the return value
- Can distinguish signaling and quiet nans
- Distinguishes the sign of infinities
- Can be safely propagated since it doesn't imply anything about
other operands.
- Does not apply to FP instructions; it's not a flag
This is one step closer to being able to retire "no-nans-fp-math" and
"no-infs-fp-math". The one remaining situation where we have no way to
represent no-nans/infs is for loads (if we wanted to solve this we
could introduce !nofpclass metadata, following along with
noundef/!noundef).
This is to help simplify the GPU builtin math library
distribution. Currently the library code has explicit finite-math-only
checks, read from global constants that the compiler driver needs to
set based on the compiler flags during linking. We end up having to
internalize the library into each translation unit in case different
linked modules have different math flags. By propagating known-not-nan
and known-not-infinity information, we can automatically prune the
edge case handling in most functions if the function is only reached
from fast math uses.
- The current implementation checks for 24-bit integers, but the
document effectively describes 23-bit ones by listing the range as
[1, 2^23).
- Minor error message correction.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D144685
The function `CGDebugInfo::EmitFunctionDecl` is supposed to create a
declaration -- never a _definition_ -- of a subprogram. This is made
evident by the fact that the SPFlags never have the "Declaration" bit
set by that function.
However, when `EmitFunctionDecl` calls `DIBuilder::createFunction`, it
still tries to fill the "Declaration" argument by passing it the result
of `getFunctionDeclaration(D)`. This will query an internal cache of
previously created declarations and, for most code paths, we return
nullptr; all is good.
However, as reported in [0], there are pathological cases in which we
attempt to recreate a declaration, so the cache query succeeds,
resulting in a subprogram declaration whose declaration field points to
another declaration. Through a series of RAUWs, the declaration field
ends up pointing to the SP itself. Self-referential MDNodes can't be
`unique`, which causes the verifier to fail (declarations must be
`unique`).
We can argue that the caller should check the cache first, but this is
not a correctness issue (declarations are `unique` anyway). The bug is
that `CGDebugInfo::EmitFunctionDecl` should always pass `nullptr` to the
declaration argument of `DIBuilder::createFunction`, expressing the fact
that declarations don't point to other declarations. AFAICT this is not
something for which any reasonable meaning exists.
This seems a lot like a copy-paste mistake that has survived for ~10
years, since other places in this file have the exact same call almost
token-by-token.
I've tested this by compiling LLVMSupport with and without the patch, O2
and O0, and comparing the dwarfdump of the lib. The dumps are identical
modulo the attributes decl_file/producer/comp_dir.
[0]: https://github.com/llvm/llvm-project/issues/59241
Differential Revision: https://reviews.llvm.org/D143921
This patch adds 2 new intrinsics:
; Interleave two vectors into a wider vector
<vscale x 4 x i64> @llvm.vector.interleave2.nxv2i64(<vscale x 2 x i64> %even, <vscale x 2 x i64> %odd)
; Deinterleave the odd and even lanes from a wider vector
{<vscale x 2 x i64>, <vscale x 2 x i64>} @llvm.vector.deinterleave2.nxv2i64(<vscale x 4 x i64> %vec)
The main motivator for adding these intrinsics is to support vectorization of
complex types using scalable vectors.
The intrinsics are kept simple by only supporting a stride of 2, which makes
them easy to lower and type-legalize. A stride of 2 is sufficient to handle
complex types which only have a real/imaginary component.
The format of the intrinsics matches how `shufflevector` is used in
LoopVectorize. For example:
using cf = std::complex<float>;
void foo(cf *dst, int N) {
  for (int i = 0; i < N; ++i)
    dst[i] += cf(1.f, 2.f);
}
For this loop, LoopVectorize:
(1) Loads a wide vector (e.g. <8 x float>)
(2) Extracts odd lanes using shufflevector (leading to <4 x float>)
(3) Extracts even lanes using shufflevector (leading to <4 x float>)
(4) Performs the addition
(5) Interleaves the two <4 x float> vectors into a single <8 x float> using
shufflevector
(6) Stores the wide vector.
In this example, we can replace the shufflevectors in (2) and (3)
one-to-one with the deinterleave intrinsic, and replace the
shufflevector in (5) with the interleave intrinsic.
The SelectionDAG nodes might be extended to support higher strides (3, 4, etc)
as well in the future.
Similar to what was done for vector.splice and vector.reverse, the intrinsic
is lowered to a shufflevector when the type is fixed width, so as to
benefit from existing code that was written to recognize/optimize
shufflevector patterns.
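For a fixed-width type, the deinterleave/interleave pair corresponds to
shuffles along these lines (a sketch, following steps (2), (3) and (5)
above):

define <8 x float> @roundtrip(<8 x float> %wide) {
  ; deinterleave: extract even and odd lanes
  %even = shufflevector <8 x float> %wide, <8 x float> poison, <4 x i32> <i32 0, i32 2, i32 4, i32 6>
  %odd = shufflevector <8 x float> %wide, <8 x float> poison, <4 x i32> <i32 1, i32 3, i32 5, i32 7>
  ; interleave the two halves back into a wide vector
  %res = shufflevector <4 x float> %even, <4 x float> %odd, <8 x i32> <i32 0, i32 4, i32 1, i32 5, i32 2, i32 6, i32 3, i32 7>
  ret <8 x float> %res
}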
Note that this approach does not prevent us from adding new intrinsics for other
strides, or adding a more generic shuffle intrinsic in the future. It just solves
the immediate problem of being able to vectorize loops with complex math.
Reviewed By: paulwalker-arm
Differential Revision: https://reviews.llvm.org/D141924
Make it explicit that SNaN is not handled differently than
QNaN in the LLVM default floating-point environment.
Note that an IEEE-754-compliant model disallows transforms
like "X * 1.0 -> X". That is because math operations are
expected to convert SNaN to QNaN (set the signaling bit).
But LLVM has had those kinds of transforms from the beginning:
https://alive2.llvm.org/ce/z/igb55y
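For example, this fold fires at the IR level:

define float @fold(float %x) {
  %r = fmul float %x, 1.000000e+00
  ret float %r
}
; folded to "ret float %x", even though a strictly IEEE-754-compliant
; model would require quieting an SNaN %x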
We should be IEEE-754-compliant under strict-FP (the logic is
implemented with a helper named canIgnoreSNaN()), but I don't
think there is any demand to do that with default optimization.
See issue #43070 for earlier draft/discussion about this change.
Differential Revision: https://reviews.llvm.org/D143074
The original change mistakenly excluded parameter registers from the
list of callee-saved registers. This reland fixes it: it only excludes
the return registers for preserve_all/preserve_most CCs.
Original description:
> Currently both calling conventions preserve registers that are used to
> store a return value. This causes the returned value to be lost:
>
> define i32 @bar() {
> %1 = call preserve_mostcc i32 @foo()
> ret i32 %1
> }
>
> define preserve_mostcc i32 @foo() {
> ret i32 2
> ; preserve_mostcc will restore %rax,
> ; whatever it was before the call.
> }
>
> This contradicts the current documentation (preserve_allcc "behaves
> identical to the `C` calling conventions on how arguments and return
> values are passed") and also breaks [[clang::preserve_most]].
>
> This change makes CSRs be preserved iff they are not used to store a
> return value (e.g. %rax for scalars, {%rax:%rdx} for __int128, %xmm0
> for double). For void functions no additional registers are
> preserved, i.e. the behaviour is backward compatible with existing
> code.
Differential Revision: https://reviews.llvm.org/D143425
This does not read canonicalized values, which matches the behavior of
the basic DAG expansion using integer operations. There is a buggy
expansion using FP operations (used when legal) which needs to be adjusted to
account for this. We need to be aware of the denormal mode to switch
between is.fpclass calls and fcmp.
There's no real spec for denormal handling anywhere, but I believe
this is the most harmonious way to deal with the question considering
the requirement to not quiet input signaling nans.
This matches the behavior of MSVC's _fpclass and AMDGPU's
v_cmp_class_f32. fpclassify currently does not use this, and has
inconsistent behavior for denormals under DAZ on different platforms
(i.e. clang and gcc return FP_ZERO for a denormal under DAZ, while
MSVC reports FP_SUBNORMAL).
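For reference, a minimal llvm.is.fpclass use testing for subnormals
(the mask constant follows the FPClassTest bit assignments; treat it as
illustrative):

declare i1 @llvm.is.fpclass.f32(float, i32 immarg)

define i1 @is_subnormal(float %x) {
  ; 144 = 0x90 = positive-subnormal | negative-subnormal
  %r = call i1 @llvm.is.fpclass.f32(float %x, i32 144)
  ret i1 %r
}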
This caused Chromium to crash, see comment on the code review.
> Currently both calling conventions preserve registers that are used to
> store a return value. This causes the returned value to be lost:
>
> define i32 @bar() {
> %1 = call preserve_mostcc i32 @foo()
> ret i32 %1
> }
>
> define preserve_mostcc i32 @foo() {
> ret i32 2
> ; preserve_mostcc will restore %rax,
> ; whatever it was before the call.
> }
>
> This contradicts the current documentation (preserve_allcc "behaves
> identical to the `C` calling conventions on how arguments and return
> values are passed") and also breaks [[clang::preserve_most]].
>
> This change makes CSRs be preserved iff they are not used to store a
> return value (e.g. %rax for scalars, {%rax:%rdx} for __int128, %xmm0
> for double). For void functions no additional registers are
> preserved, i.e. the behaviour is backward compatible with existing
> code.
>
> Differential Revision: https://reviews.llvm.org/D141020
This reverts commit 0276fa89d7a4dbe73105c9148f947716b3d8f17f.
These are essentially add/sub 1 with a clamping value.
AMDGPU has instructions for these. CUDA/HIP expose these as
atomicInc/atomicDec. Currently we use target intrinsics for these,
but those do no carry the ordering and syncscope. Add these to
atomicrmw so we can carry these and benefit from the regular
legalization processes.
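A sketch of the textual form (the clamping semantics in the comments
are my reading of the operations, mirroring atomicInc/atomicDec):

define i32 @inc_then_dec(ptr %p, i32 %bound) {
  ; stores (old u>= %bound) ? 0 : old + 1, yields old
  %a = atomicrmw uinc_wrap ptr %p, i32 %bound seq_cst
  ; stores (old == 0 || old u> %bound) ? %bound : old - 1, yields old
  %b = atomicrmw udec_wrap ptr %p, i32 %bound seq_cst
  ret i32 %b
}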
Currently both calling conventions preserve registers that are used to
store a return value. This causes the returned value to be lost:
define i32 @bar() {
%1 = call preserve_mostcc i32 @foo()
ret i32 %1
}
define preserve_mostcc i32 @foo() {
ret i32 2
; preserve_mostcc will restore %rax,
; whatever it was before the call.
}
This contradicts the current documentation (preserve_allcc "behaves
identical to the `C` calling conventions on how arguments and return
values are passed") and also breaks [[clang::preserve_most]].
This change makes CSRs be preserved iff they are not used to store a
return value (e.g. %rax for scalars, {%rax:%rdx} for __int128, %xmm0
for double). For void functions no additional registers are
preserved, i.e. the behaviour is backward compatible with existing
code.
Differential Revision: https://reviews.llvm.org/D141020
It is widely assumed that i8 is naturally aligned (i8:8),
and hence that i8s can be used to access arbitrary bytes.
As discussed in https://discourse.llvm.org/t/status-of-overaligned-i8,
this patch makes this assumption explicit, by documenting it in
the LangRef, and enforcing it when parsing a data layout string.
Historically, there have been data layouts that violate this requirement,
notably the old DXIL data layout that aligns i8 to 32 bits.
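Concretely, the data layout parser now requires an 8-bit ABI alignment
for i8; a sketch:

target datalayout = "i8:8"    ; accepted: i8 is naturally aligned
; target datalayout = "i8:32" ; now rejected when parsing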
A previous patch (df1a74a) enabled importing modules with invalid data layouts
using override callbacks.
Users who wish to continue importing modules with overaligned i8s (e.g. DXIL)
thus need to provide a data layout override callback that fixes the
data layout, at minimum by setting natural alignment for i8.
Any further adjustments to the module (e.g. adding padding bytes if necessary)
need to be done after module import. In the case of DXIL, this should not be
necessary, because i8 usage in DXIL is very limited and its alignment actually
does not matter, see
https://github.com/microsoft/DirectXShaderCompiler/blob/main/docs/DXIL.rst#primitive-types
Differential Revision: https://reviews.llvm.org/D142211
Make violation of !range, !nonnull and !align metadata return poison
instead of causing immediate undefined behavior. This makes the
behavior match that of the nonnull and align parameter and return
value attributes. The previous behavior can be restored by additionally
specifying the !noundef metadata, same as with parameters.
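A sketch of the difference (metadata node numbering is illustrative):

%v = load i32, ptr %p, !range !0              ; out-of-range result is poison
%w = load i32, ptr %p, !range !0, !noundef !1 ; out-of-range result is immediate UB

!0 = !{i32 0, i32 10} ; loaded value must be in [0, 10)
!1 = !{}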
Some benefits of this change are:
* This is needed to fix https://github.com/llvm/llvm-project/issues/59888.
Under current semantics, it is illegal to add !range annotations
based on known bits. Unless we want to drop that optimization
entirely, we need to change the !range semantics.
* This allows preserving range/nonnull/align metadata on
speculated loads. !noundef metadata needs to be dropped, but
the poison-generating metadata can be retained.
I don't think there are really disadvantages to the change (apart
from the need to review and adjust optimizations for the new
semantics), as the old behavior is still available via !noundef,
so it should be strictly more flexible.
Differential Revision: https://reviews.llvm.org/D141386
And link to the AssignmentTracking.md document which goes into more detail.
Reviewed By: jryans
Differential Revision: https://reviews.llvm.org/D141131
The patch also adds expandVPCTLZ and expandVPCTTZ to expand vp.ctlz/cttz
nodes, and adds cost modeling for vp.ctlz/cttz.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D140370
Add `nooutline` + update LangRef to say it exists.
This makes it possible to say "don't outline from this function ever."
We want to be able to toggle whether or not a function should be in the search
set regardless of default behaviour.
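A minimal sketch of the attribute in IR:

define void @dont_outline_from_me() nooutline {
  ret void
}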
Add testcases for the IR Outliner + Machine Outliner.
Also remove an unnecessary check for an empty function in the Machine Outliner.
Differential Revision: https://reviews.llvm.org/D140438
This is an alternative to D139627, suggested by Craig. Currently only the
X86 backend uses this attribute, so let's just emit it for X86 only.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D139701
Target-extension types represent types that need to be preserved through
optimization, but otherwise are not introspectable by target-independent
optimizations. This patch doesn't add any uses of these types by an existing
backend; it only provides basic infrastructure such that these types would work
correctly.
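A sketch of the textual form (the target type name is hypothetical):

declare target("example.handle") @make_handle()
declare void @use_handle(target("example.handle"))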
Reviewed By: nikic, barannikov88
Differential Revision: https://reviews.llvm.org/D135202
Address the inconsistency between FLT_ROUNDS_ and SET_ROUNDING SDAG
node. Rename FLT_ROUNDS_ to GET_ROUNDING and add llvm.get.rounding
intrinsic to replace flt.rounds.
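The new intrinsic's textual form; the result uses the same encoding as
C's FLT_ROUNDS:

declare i32 @llvm.get.rounding()

define i32 @current_rounding_mode() {
  %mode = call i32 @llvm.get.rounding()
  ret i32 %mode
}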
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D139507
This operand bundle on an assume informs alias analysis that the
arguments point to regions of memory that were allocated separately
(i.e. different heap allocations, different allocas, or different
globals).
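A minimal sketch of the bundle on an assume:

; %a and %b are asserted to come from distinct allocations
call void @llvm.assume(i1 true) ["separate_storage"(ptr %a, ptr %b)]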
As a safety measure, we leave the analysis flag-disabled by default.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D136514
The patch also adds expandVPCTPOP in TargetLowering to expand VP_CTPOP nodes.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D139920
The patch also adds the function expandVPBITREVERSE to expand ISD::VP_BITREVERSE nodes.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D139697
This allows the LLParser to also accept "A", "G", and "P" in `addrspace`
usages. "A" will be replaced by the alloca address space defined in the
globals, "G" by the default globals address space and "P" by the program
address space. This makes it easier to write tests that use different
address space and only only vary the RUN: lines. Currently, the only
alternative is to pre-process the sources with a tool such as `sed`
Importantly, these new string values are only accepted in .ll files and
not stored in the bitcode format, so it does not round-trip via llvm-as
and llvm-dis (see newly added test).
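A sketch (the data layout values are illustrative; with "A5-G1-P4" the
strings parse as address spaces 5, 1, and 4):

target datalayout = "A5-G1-P4"

define void @f(ptr addrspace("G") %global_arg) {
  %stack = alloca i32, addrspace("A")
  ret void
}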
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D138789
This reverts commit 7883e5b061bdbbe8bee5f479ebe911db5045b7e9.
The original commit was reverted because it didn't update test files after
D136263 landed. The recommit fixes those.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D139509
The patch made VectorLegalizer expand ISD::VP_FSHL and ISD::VP_FSHR to
achieve the codegen.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D138379
Since D129288, we no longer use BlockAddress constants as operands of
callbr.
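A sketch of the current spelling, with indirect destinations as label
constraints:

define void @f(i32 %x) {
entry:
  callbr void asm "", "r,!i"(i32 %x)
      to label %fallthrough [label %indirect]
fallthrough:
  ret void
indirect:
  ret void
}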
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D138080
nearbyint has the property that it executes without raising floating-point
exceptions. To avoid modifying fflags, the patch adds a new machine opcode,
PseudoVFROUND_NOEXCEPT_V, which expands to vfcvt.x.f.v and vfcvt.f.x.v
bracketed by a pair of frflags and fsflags.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D137685