# What
This PR renames the newly-introduced LLVM attribute
`sanitize_realtime_unsafe` to `sanitize_realtime_blocking`. Likewise,
sibling variables such as `SanitizeRealtimeUnsafe` are renamed to
`SanitizeRealtimeBlocking`. There are no other functional changes.
# Why?
- There are a number of problems that can cause a function to be
real-time "unsafe".
- We wish to communicate which problems rtsan detects and *why* they
are unsafe.
- A generic "unsafe" attribute is, in our opinion, too broad a net; it
may lead to future implementations needing extra contextual
information passed through them in order to communicate meaningful
reasons to users.
- We want to avoid this situation and keep the runtime library boundary
API/ABI as simple as possible.
- We believe that restricting the scope of attributes to names like
`sanitize_realtime_blocking` is an effective means of doing so.

We also feel that the symmetry between `[[clang::blocking]]` and
`sanitize_realtime_blocking` is easier to follow as a developer; see
the sketch below.
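For illustration, a minimal sketch of the renamed attribute as it
appears in IR after this change (function name hypothetical):

```llvm
; A function the programmer marks [[clang::blocking]] is lowered to an
; LLVM function carrying the renamed attribute:
define void @acquire_lock() #0 {
  ret void
}
attributes #0 = { sanitize_realtime_blocking }
```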
# Concerns
- I'm aware that the LLVM attribute `sanitize_realtime_unsafe` has been
part of the tree for a few weeks now (introduced here:
https://github.com/llvm/llvm-project/pull/106754). Given that it hasn't
been released in version 20 yet, am I correct in considering this to not
be a breaking change?
The two folding operations are causing a cycle for the following case
with scalable vector types:

```llvm
define <vscale x 2 x double> @test_fneg_select_abs(<vscale x 2 x i1> %cond, <vscale x 2 x double> %b) {
  %1 = select <vscale x 2 x i1> %cond, <vscale x 2 x double> zeroinitializer, <vscale x 2 x double> %b
  %2 = fneg fast <vscale x 2 x double> %1
  ret <vscale x 2 x double> %2
}
```
1) fold fneg: -(Cond ? C : Y) -> Cond ? -C : -Y
2) fold select: (Cond ? -X : -Y) -> -(Cond ? X : Y)
1) results in the following, since `<vscale x 2 x double>
zeroinitializer` passes the check for the immediate constant:

```llvm
%.neg = fneg fast <vscale x 2 x double> zeroinitializer
%b.neg = fneg fast <vscale x 2 x double> %b
%1 = select fast <vscale x 2 x i1> %cond, <vscale x 2 x double> %.neg, <vscale x 2 x double> %b.neg
```
and so we end up going back and forth between 1) and 2).
Attempt to fold scalable vector constants, so that we end up with a
splat instead:

```llvm
define <vscale x 2 x double> @test_fneg_select_abs(<vscale x 2 x i1> %cond, <vscale x 2 x double> %b) {
  %b.neg = fneg fast <vscale x 2 x double> %b
  %1 = select fast <vscale x 2 x i1> %cond, <vscale x 2 x double> shufflevector (<vscale x 2 x double> insertelement (<vscale x 2 x double> poison, double -0.000000e+00, i64 0), <vscale x 2 x double> poison, <vscale x 2 x i32> zeroinitializer), <vscale x 2 x double> %b.neg
  ret <vscale x 2 x double> %1
}
```
Instead of storing this as a separate array of non-integral pointers,
add it to the PointerSpec class. This will allow for future
simplifications such as splitting the non-integral property into
multiple distinct ones: relocatable (i.e. non-stable representation) and
non-integral representation (i.e. pointers with metadata).
Reviewed By: arsenm
Pull Request: https://github.com/llvm/llvm-project/pull/105734
Type::isScalableTy and StructType::containsScalableVectorType failed to
detect some cases of structs containing scalable vectors because
containsScalableVectorType did not call back into isScalableTy to check
the element types. Fix this, which requires sharing the same Visited set
in both functions. Also change the external API so that callers are
never required to pass in a Visited set, and normalize the naming to
isScalableTy.
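For illustration, both queries should report true for shapes like the
following (types hypothetical; not necessarily the exact case the
patch fixes):

```llvm
; Directly contains a scalable vector:
%inner = type { <vscale x 4 x i32> }
; Contains one only through a nested struct, which is where recursing
; through the element types matters:
%outer = type { i64, %inner }
```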
This is part of the effort to support enabling plugins on Windows by
adding better support for building LLVM as a DLL. The export macros used
here were added in #96630.

Since shared library symbols aren't deduplicated across multiple
libraries on Windows as they are on Linux, we have to manually and
explicitly import and export `Any::TypeId` template instantiations for
the uses of `llvm::Any` in the LLVM codebase to support LLVM Windows
shared library builds.

This change ensures that external code, including LLVM's own tests, can
use PassManager callbacks when LLVM is built as a DLL.

I also removed the only use of llvm::Any for LoopNest, which existed
only in debug code; there also doesn't seem to be any code creating
`Any<LoopNest>`.
With this change and appropriate linker changes
(https://r.android.com/3236256), AOSP boots with memtag-global
throughout the platform.
Without this change, we would sometimes generate PC-relative references
to tagged globals, which then do not have the proper tag.
In a variety of places we change the bitwidth of a parameter but don't
update the attributes.
The issue in this case is from the `range` attribute when inlining
`__memset_chk`. `optimizeMemSetChk` will replace an `i32` with an
`i8`, and if the `i32` had a `range` attr associated with it, this
causes an error.
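A minimal sketch of the problematic shape (names and range values
hypothetical):

```llvm
declare ptr @__memset_chk(ptr, i32, i64, i64)

define void @sketch(ptr %p, i32 %v, i64 %n) {
  ; the i32 fill value carries a range attribute at the call site...
  %ret = call ptr @__memset_chk(ptr %p, i32 range(i32 0, 256) %v, i64 %n, i64 -1)
  ; ...and when this is simplified to the i8-based memset form, a
  ; range(i32 ...) attribute kept on the now-i8 value is malformed
  ret void
}
```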
Fixes #112633
This fixes all the places that hit the new assertion added in
https://github.com/llvm/llvm-project/pull/106524 in tests. That is,
cases where the value passed to the APInt constructor is not an N-bit
signed/unsigned integer, where N is the bit width and signedness is
determined by the isSigned flag.
The fixes either set the correct value for isSigned, set the
implicitTrunc flag, or perform more calculations inside APInt.
Note that the assertion is currently still disabled by default, so this
patch is mostly NFC.
This change is part of this proposal:
https://discourse.llvm.org/t/rfc-all-the-math-intrinsics/78294
Based on example PR #96222 and fix PR #101268, with some differences due
to 2-arg intrinsic and intermediate refactor (RuntimeLibCalls.cpp).
- Add llvm.experimental.constrained.atan2 - Intrinsics.td,
ConstrainedOps.def, LangRef.rst
- Add to ISDOpcodes.h and TargetSelectionDAG.td, connect to intrinsic in
BasicTTIImpl.h, and LibFunc_ in SelectionDAGBuilder.cpp
- Update LegalizeDAG.cpp, LegalizeFloatTypes.cpp, LegalizeVectorOps.cpp,
and LegalizeVectorTypes.cpp
- Update isKnownNeverNaN in SelectionDAG.cpp
- Update SelectionDAGDumper.cpp
- Update libcalls - RuntimeLibcalls.def, RuntimeLibcalls.cpp
- TargetLoweringBase.cpp - Expand for vectors, promote f16
- X86ISelLowering.cpp - Expand f80, promote f32 to f64 for MSVC
Part 4 of implementing the atan2 HLSL function (#70096).
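For illustration, a sketch of the new intrinsic in its usual
constrained form (f64 overload shown; operand order assumed to follow
libm's atan2(y, x)):

```llvm
declare double @llvm.experimental.constrained.atan2.f64(double, double, metadata, metadata)

define double @use(double %y, double %x) strictfp {
  ; rounding-mode and exception-behavior arguments as in the other
  ; constrained math intrinsics
  %r = call double @llvm.experimental.constrained.atan2.f64(double %y, double %x, metadata !"round.dynamic", metadata !"fpexcept.strict") strictfp
  ret double %r
}
```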
- **[Inliner] Add tests for propagating more parameter attributes; NFC**
- **[Inliner] Propagate more attributes to params when inlining**
Add support for propagating:
- `dereferenceable`
- `dereferenceable_or_null`
- `align`
- `nonnull`
- `range`
These are only propagated if the argument at the to-be-inlined callsite
matches the exact parameter used in the to-be-inlined function; a
sketch follows.
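As a sketch of the idea (function names hypothetical, and only one
plausible shape of the rule described above):

```llvm
; The callsite being inlined passes %p with extra attributes:
;   call void @callee(ptr nonnull align 16 dereferenceable(10) %p)
define void @callee(ptr %p) {
  ; @callee forwards the exact same parameter, so after inlining the
  ; attributes can be propagated onto this inner call's argument:
  call void @use(ptr %p)
  ret void
}
declare void @use(ptr)
```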
MSVC has a set of qualifiers to allow using 32-bit signed/unsigned
pointers when building 64-bit targets. This is useful for WoW code
(i.e., the part of Windows that handles running 32-bit applications on a
64-bit OS). Currently this is supported on x64 using the 270, 271 and
272 address spaces, but does not work for AArch64 at all.
This change adds the same 270, 271 and 272 address spaces to AArch64 and
adjusts the data layout string accordingly. Clang will generate the
correct address space casts, but these will currently be ignored until
the AArch64 backend is updated to handle them.
Partially fixes #62536

This is a resurrected version of <https://reviews.llvm.org/D158857>
(originally created by @a_vorobev). I've cleaned it up a little, fixed
the rest of the tests, and added auto-upgrade support for the data
layout.
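For illustration, a sketch of the casts Clang emits (the common x64
convention maps 270 to 32-bit sign-extended, 271 to 32-bit
zero-extended, and 272 to 64-bit pointers; treat that mapping as an
assumption here):

```llvm
; Widening a 32-bit zero-extended pointer to a default 64-bit pointer
; on AArch64:
define ptr @widen(ptr addrspace(271) %p32) {
  %p = addrspacecast ptr addrspace(271) %p32 to ptr
  ret ptr %p
}
```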
This patch enables support for cloning in indirect callsites.
This is done by synthesizing callsite records for each virtual call
target from the profile metadata. In the thin link all the synthesized
records for a particular indirect callsite initially share the same
context node, but support is added to partition the callsites and
outgoing edges based on the callee function, creating a separate node
for each target.
In the LTO backend, when cloning is needed we first perform indirect
call promotion, then change the target of the new direct call to the
desired clone.
Note this is ThinLTO-specific, since for regular LTO indirect call
promotion should have already occurred.
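As a sketch of that backend sequence (names hypothetical):

```llvm
define void @caller(ptr %fptr) {
entry:
  ; indirect call promotion guards on the profiled target...
  %is.foo = icmp eq ptr %fptr, @foo
  br i1 %is.foo, label %direct, label %indirect
direct:
  ; ...and the promoted direct call is then retargeted at the clone
  call void @foo.memprof.1()
  br label %merge
indirect:
  call void %fptr()
  br label %merge
merge:
  ret void
}

declare void @foo()
declare void @foo.memprof.1()
```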
Rename `Intrinsic::getDeclaration` to `Intrinsic::getOrInsertDeclaration`
to reflect its correct behavior and to be consistent with
`Module::getOrInsertFunction`. This is also in preparation for adding a
new `Intrinsic::getDeclaration` that will have behavior similar to
`Module::getFunction` (i.e., just lookup, no creation).
Note: The current implementation doesn't return the optimal result for
`fcmp one/une x, +/-inf` since we don't handle this case in
https://github.com/llvm/llvm-project/pull/110891. Maybe we can make it
optimal after seeing some real-world cases.
This patch adds support for `ConstantFPRange::makeSatisfyingFCmpRegion`.
We only check the optimality for cases where the result can be
represented by a ConstantFPRange.
This patch also adds some tests for `ConstantFPRange::fcmp` because it
depends on `makeSatisfyingFCmpRegion`. Unfortunately we cannot
exhaustively test this function due to time limits, so I just picked
some interesting ranges instead.
This is intended to solve a problem with lowering atomics in
OpenMP and C++ that is common to AMDGPU and NVPTX.

In OpenCL and CUDA, it is undefined behavior for an atomic instruction
to modify an object in thread-private memory. In OpenMP, it is defined.
Correspondingly, the hardware does not handle this correctly. For
AMDGPU, 32-bit atomics work and 64-bit atomics are silently dropped. We
therefore need to codegen this by inserting a runtime address space
check, performing the private case without atomics, and falling back to
issuing the real atomic otherwise.

Handle this by introducing metadata intended to be applied to atomicrmw,
indicating the operation cannot access the forbidden address space. This
metadata allows us to avoid the extra check and branch.
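A sketch of the shape this takes (rendered here with the
`!noalias.addrspace` spelling this work is associated with upstream;
treat that name and the AMDGPU private address space number, 5, as
assumptions of the sketch):

```llvm
define i64 @no_private_rmw(ptr %p, i64 %v) {
  ; the pointer is known not to be in the excluded range, so the
  ; expansion can skip the private-memory check and branch
  %old = atomicrmw add ptr %p, i64 %v seq_cst, !noalias.addrspace !0
  ret i64 %old
}

!0 = !{i32 5, i32 6} ; excluded address-space range [5, 6)
```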
Adds hidden kernel arguments to the function signature and marks them
inreg if they should be preloaded into user SGPRs. The normal kernarg
preloading logic then takes over with some additional checks for the
correct implicitarg_ptr alignment.
Special care is needed so that metadata for the hidden arguments is not
added twice when generating the code object.
Affected intrinsics:
llvm.aarch64.sve.fcvt.bf16f32
llvm.aarch64.sve.fcvtnt.bf16f32
The named intrinsics took a predicate based on the smallest element type
when it should be based on the largest. The intrinsics have been
replaced by v2 equivalents and affected code ported to use them; see the
sketch below.
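For illustration (signatures paraphrased from the description; the
predicate element count is the point, so treat the exact shapes as
assumptions):

```llvm
; Old: predicate based on the smaller bf16 element type
declare <vscale x 8 x bfloat> @llvm.aarch64.sve.fcvt.bf16f32(<vscale x 8 x bfloat>, <vscale x 8 x i1>, <vscale x 4 x float>)
; New v2: predicate based on the larger f32 element type
declare <vscale x 8 x bfloat> @llvm.aarch64.sve.fcvt.bf16f32.v2(<vscale x 8 x bfloat>, <vscale x 4 x i1>, <vscale x 4 x float>)
```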
Patch includes changes to getSVEPredicateBitCast() that ensure the
generated code for the auto-upgraded old intrinsics is unchanged.
It turns out that we cannot rely on the presence of `-i64:64` as a
position reference when adding the `-i128:128` datalayout string, due to
some custom datalayout strings lacking it (e.g. ones used by bugpoint,
among other things).
Do not add the `-i128:128` string in that case.
This fixes the regression introduced in
https://github.com/llvm/llvm-project/pull/106951.
The prior implementation would fail if the number of attribute sets on
the two calls wasn't the same, which is unnecessary as long as we aren't
throwing away any must-preserve attrs.

Closes #110896
This extends FPMathOperator to allow calls that return literal structs
of homogeneous floating-point or vector-of-floating-point types.
The intended use case for this is to support FP intrinsics that return
multiple values (such as `llvm.sincos`).
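For illustration, a sketch using the `llvm.sincos` shape mentioned
above:

```llvm
declare { double, double } @llvm.sincos.f64(double)

define { double, double } @both(double %x) {
  ; fast-math flags are now valid here: the call returns a literal
  ; struct of homogeneous floating-point type
  %res = call fast { double, double } @llvm.sincos.f64(double %x)
  ret { double, double } %res
}
```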
Some (many) attributes can safely be dropped to enable sinking. For
example, removing `nonnull` on a return/param can't affect correctness.

Closes #109472
Add support for taking the intersection of two AttributeLists s.t. the
result list contains attributes that are valid in the context of both
inputs. I.e. if we have `nonnull align(32) noundef` intersected with
`nonnull align(16) dereferenceable(10)`, the result is `nonnull
align(16)`.

Further, it handles attributes that are not droppable. For example,
dropping `byval` can change the nature of a callsite/function, so it's
impossible to compute a correct intersection if it's dropped from the
result. I.e. `nonnull byval(i64)` intersected with `nonnull` is
invalid.
The motivation for the infrastructure is to enable sinking/hoisting
callsites with differing attributes.
This replaces some of the most frequent offenders: uses of a DenseMap
that cause a malloc, where the typical element count is small enough to
fit in an initial stack allocation.
Most of these are fairly obvious, one to highlight is the collectOffset
method of GEP instructions: if there's a GEP, of course it's going to have
at least one offset, but every time we've called collectOffset we end up
calling malloc as well for the DenseMap in the MapVector.
Remove the following intrinsics, which can be trivially replaced with an
`addrspacecast`:
* llvm.nvvm.ptr.gen.to.global
* llvm.nvvm.ptr.gen.to.shared
* llvm.nvvm.ptr.gen.to.constant
* llvm.nvvm.ptr.gen.to.local
* llvm.nvvm.ptr.global.to.gen
* llvm.nvvm.ptr.shared.to.gen
* llvm.nvvm.ptr.constant.to.gen
* llvm.nvvm.ptr.local.to.gen
Also, clean up the NVPTX lowering of `addrspacecast`, making it more
concise.
This was reverted to avoid conflicts while reverting #107655. Re-landing
unchanged.
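For illustration, each removed intrinsic corresponds to a plain cast
(sketch; overload suffixes elided):

```llvm
; Before: %g = call ptr addrspace(1) @llvm.nvvm.ptr.gen.to.global(ptr %p)
; After, the equivalent addrspacecast (1 = NVPTX global):
define ptr addrspace(1) @to_global(ptr %p) {
  %g = addrspacecast ptr %p to ptr addrspace(1)
  ret ptr addrspace(1) %g
}
```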
This change deprecates the following intrinsics, which can be trivially
converted to LLVM funnel-shift intrinsics:
- @llvm.nvvm.rotate.b32
- @llvm.nvvm.rotate.right.b64
- @llvm.nvvm.rotate.b64
This fixes a bug in the previous version (#107655) which flipped the
order of the operands to the PTX funnel shift instruction. In LLVM IR
the high bits are the first arg and the low bits are the second arg,
while in PTX this is reversed.
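As a sketch of the conversion for the 32-bit rotate (assuming it
upgrades to a left funnel shift with both value operands equal to the
input):

```llvm
; Deprecated form:
;   %r = call i32 @llvm.nvvm.rotate.b32(i32 %x, i32 %n)
; Funnel-shift equivalent:
define i32 @rot32(i32 %x, i32 %n) {
  %r = call i32 @llvm.fshl.i32(i32 %x, i32 %x, i32 %n)
  ret i32 %r
}
declare i32 @llvm.fshl.i32(i32, i32, i32)
```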
When searching for an intrinsic name in a target-specific slice of the
intrinsic name table, skip over the target prefix. Currently, the first
loop iteration in `lookupLLVMIntrinsicByName` does nothing for such
cases (i.e., `Low` and `High` stay unchanged and it does not shrink the
search window), so we can skip this useless first iteration by skipping
over the target prefix.