136 Commits

Author SHA1 Message Date
Matt Arsenault
778cf5431c IR: Add atomicrmw uinc_wrap and udec_wrap
These are essentially add/sub 1 with a clamping value.

AMDGPU has instructions for these. CUDA/HIP expose these as
atomicInc/atomicDec. Currently we use target intrinsics for these,
but those do no carry the ordering and syncscope. Add these to
atomicrmw so we can carry these and benefit from the regular
legalization processes.
2023-01-24 17:55:11 -04:00
Nadeem, Usman
c9821abfc0 [WoA] Use fences for sequentially consistent stores/writes
LLVM currently uses LDAR/STLR and variants for acquire/release
as well as seq_cst operations. This is fine as long as all code uses
this convention.

Normally LDAR/STLR act as one way barriers but when used in
combination they provide a sequentially consistent model. i.e.
when an LDAR appears after an STLR in program order the STLR
acts as a two way fence and the store will be observed before
the load.

The problem is that normal loads (unlike ldar), when they appear
after the STLR can be observed before STLR (if my understanding
is correct). Possibly providing weaker than expected guarantees if
they are used for ordered atomic operations.

Unfortunately in Microsoft Visual Studio STL seq_cst ld/st are
implemented using normal load/stores and explicit fences:
dmb ish + str + dmb ish
ldr + dmb ish

This patch uses fences for MSVC target whenever we write to the
memory in a sequentially consistent way so that we don't rely on
the assumptions that just using LDAR/STLR will give us sequentially
consistent ordering.

Differential Revision: https://reviews.llvm.org/D141748

Change-Id: I48f3500ff8ec89677c9f089ce58181bd139bc68a
2023-01-23 16:09:11 -08:00
serge-sans-paille
38818b60c5
Move from llvm::makeArrayRef to ArrayRef deduction guides - llvm/ part
Use deduction guides instead of helper functions.

The only non-automatic changes have been:

1. ArrayRef(some_uint8_pointer, 0) needs to be changed into ArrayRef(some_uint8_pointer, (size_t)0) to avoid an ambiguous call with ArrayRef((uint8_t*), (uint8_t*))
2. CVSymbol sym(makeArrayRef(symStorage)); needed to be rewritten as CVSymbol sym{ArrayRef(symStorage)}; otherwise the compiler is confused and thinks we have a (bad) function prototype. There was a few similar situation across the codebase.
3. ADL doesn't seem to work the same for deduction-guides and functions, so at some point the llvm namespace must be explicitly stated.
4. The "reference mode" of makeArrayRef(ArrayRef<T> &) that acts as no-op is not supported (a constructor cannot achieve that).

Per reviewers' comment, some useless makeArrayRef have been removed in the process.

This is a follow-up to https://reviews.llvm.org/D140896 that introduced
the deduction guides.

Differential Revision: https://reviews.llvm.org/D140955
2023-01-05 14:11:08 +01:00
Manuel Brito
f408635b26 [CodeGen] Use poison instead of undef as placeholder in AtomicExpandPass [NFC]
Differential Revision: https://reviews.llvm.org/D138483
2022-11-24 08:42:28 +00:00
Nuno Lopes
b50e1bd605 Revert "[CodeGen] Use poison instead of undef as placeholder in AtomicExpandPass [NFC]"
This reverts commit f50423c1a4422900aa1240fed643f5920451a88d.
2022-11-22 12:41:22 +00:00
Manuel Brito
f50423c1a4 [CodeGen] Use poison instead of undef as placeholder in AtomicExpandPass [NFC]
Differential Revision: https://reviews.llvm.org/D138483
2022-11-22 11:40:25 +00:00
Phoebe Wang
d558255650 [X86] Use lock add/sub for cases that we only care about the EFLAGS
This fixes #36373, #36905 and partial of #58685.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D137711
2022-11-18 21:43:47 +08:00
Matt Arsenault
3cfa03856f AtomicExpand: Support cmpxchg expansion for small FP types
Handles f16 atomics for AMDGPU.
2022-11-10 22:16:11 -08:00
Shilei Tian
1186e9d59f [LLVM][AMDGPU] Specialize 32-bit atomic fadd instruction for generic address space
The 32-bit floating-point atomic add instructions on AMDGPUs does not support a
"flat" or "generic" address space. So, if the address space cannot be determined
statically, the AMDGPU backend will fall back to a CAS loop (which does support
"flat" addressing). Instead, this patch emits runtime address-space checks to
allow native FP atomic add instructions for global and LDS memory (and non-atomic
FP add instructions for private/scratch memory).

In order to do that, this patch introduces a new interface function
`emitExpandAtomicRMW`. It is expected to be called when a common atomic expand
doesn't work for a specific target, such as the case we discussed here.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D129690
2022-11-04 14:11:05 -04:00
Matt Arsenault
b60a9ccd02 AtomicExpand: Use InstSimplifyFolder
Automatically cleanup operations if we know the atomic has higher
alignment.
2022-10-31 23:31:42 -07:00
Matt Arsenault
07f12170a2 AtomicExpand: Don't create unused instructions for some atomicrmw
This wasn't used by every atomicrmw expansion.
2022-10-31 18:34:36 -07:00
Matt Arsenault
d0750ec475 AtomicExpand: Avoid some operations if the atomic is overaligned
Let some of the pointer bithacking fold away if we know the LSB are 0.
2022-10-13 23:31:00 -07:00
Matt Arsenault
a61c3455c0 AtomicExpand: Use llvm.ptrmask instead of ptrtoint
This removes the ptrtoint from the load's pointer operand, although we
can't entirely eliminate these to get the LSB shift. In a future
patch, this will avoid ptrtoint in the case where the atomic is
overaligned to the word size.
2022-09-28 12:51:30 -04:00
Matt Arsenault
b9a371f6d1 AtomicExpand: Use correct pointer size for integer
This was using the default address space.
2022-09-20 16:51:05 -04:00
Marco Elver
f0d6709e4a [AtomicExpandPass] Always copy pcsections Metadata to expanded atomics
When expanding IR atomics to target-specific atomics, copy all
!pcsections Metadata to expanded atomics automatically.

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D130885
2022-09-07 11:36:01 +02:00
Kai Luo
ad2f7fd286 [AtomicExpand] Make floating point conversion happens before fence insertion
IIUC, the conversion part is not part of atomic operations and fences should be put around converted atomic operations.
This also fixes atomic load of floating point values which requires fence on PowerPC.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D127609
2022-08-31 09:54:58 +08:00
Alexander Shaposhnikov
1e636f2676 [IRBuilder] Add assert for AtomicRMW ordering
Add assert for AtomicRMW: Ordering != AtomicOrdering::Unordered
(https://github.com/llvm/llvm-project/blob/main/llvm/lib/IR/Verifier.cpp#L3944)
and adjust expandAtomicStore accordingly.

Test plan:
1/ ninja check-llvm check-clang check-lld
2/ Bootstrapped LLVM/Clang pass tests

Differential revision: https://reviews.llvm.org/D130457
2022-07-25 22:51:25 +00:00
Kazu Hirata
9e6d1f4b5d [CodeGen] Qualify auto variables in for loops (NFC) 2022-07-17 01:33:28 -07:00
Shilei Tian
1023ddaf77 [LLVM] Add the support for fmax and fmin in atomicrmw instruction
This patch adds the support for `fmax` and `fmin` operations in `atomicrmw`
instruction. For now (at least in this patch), the instruction will be expanded
to CAS loop. There are already a couple of targets supporting the feature. I'll
create another patch(es) to enable them accordingly.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D127041
2022-07-06 10:57:53 -04:00
Takafumi Arakaki
18e6b8234a Allow pointer types for atomicrmw xchg
This adds support for pointer types for `atomic xchg` and let us write
instructions such as `atomicrmw xchg i64** %0, i64* %1 seq_cst`. This
is similar to the patch for allowing atomicrmw xchg on floating point
types: https://reviews.llvm.org/D52416.

Differential Revision: https://reviews.llvm.org/D124728
2022-05-25 16:20:26 +00:00
Shilei Tian
ff60a0a364 [LLVM] Add a check if should cast atomic operations to integer type
Currently for atomic load, store, and rmw instructions, as long as the
operand is floating-point value, they are casted to integer. Nowadays many
targets can actually support part of atomic operations with floating-point
operands. For example, NVPTX supports atomic load and store of floating-point
values. This patch adds a series interface functions `shouldCastAtomicXXXInIR`,
and the default implementations are same as what we currently do. Later for
targets can have their specialization.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D125652
2022-05-20 17:23:53 -04:00
Matt Arsenault
9fdd25848a Transforms: Fix code duplication between LowerAtomic and AtomicExpand 2022-04-08 19:06:36 -04:00
Matt Arsenault
7f14a1d46b AtomicExpand: Add NotAtomic lowering strategy
Currently LowerAtomics exists as a separate pass which blindly
replaces all atomics. Add a new lowering strategy option to eliminate
the atomics which the target can control on a per-instruction level.
2022-04-06 22:34:35 -04:00
Matt Arsenault
c4ea925f50 AtomicExpand: Change return type for shouldExpandAtomicStoreInIR
Use the same enum as the other atomic instructions for consistency, in
preparation for addition of another strategy.

Introduce a new "Expand" option, since the store expansion does not
use cmpxchg. Alternatively, the existing CmpXChg strategy could be
renamed to Expand.
2022-04-06 22:34:04 -04:00
Eli Friedman
5cd9fa551e Fix computation of MadeChange bit in AtomicExpandPass.
Fixes llvm-clang-x86_64-expensive-checks-debian failure with 2f497ec3.

expandAtomicStore always modifies the function, so make sure we set
MadeChange unconditionally. Not sure how nobody else has stumbled over
this before.
2022-03-18 13:47:11 -07:00
Kai Luo
31906a6090 [AtomicExpand][PowerPC] Fix all-one mask value
When generating a all-one mask value whose bitwidth is larger than 64, signed extension should be used rather then zero extension.

Reviewed By: jsji

Differential Revision: https://reviews.llvm.org/D120865
2022-03-18 13:35:54 +08:00
Marco Elver
b09439e20b [AtomicExpandPass][NFC] Reformat with clang-format
NFCI.
2022-03-17 16:58:16 +01:00
serge-sans-paille
989f1c72e0 Cleanup codegen includes
This is a (fixed) recommit of https://reviews.llvm.org/D121169

after:  1061034926
before: 1063332844

Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup
Differential Revision: https://reviews.llvm.org/D121681
2022-03-16 08:43:00 +01:00
Nico Weber
a278250b0f Revert "Cleanup codegen includes"
This reverts commit 7f230feeeac8a67b335f52bd2e900a05c6098f20.
Breaks CodeGenCUDA/link-device-bitcode.cu in check-clang,
and many LLVM tests, see comments on https://reviews.llvm.org/D121169
2022-03-10 07:59:22 -05:00
serge-sans-paille
7f230feeea Cleanup codegen includes
after:  1061034926
before: 1063332844

Differential Revision: https://reviews.llvm.org/D121169
2022-03-10 10:00:30 +01:00
Phoebe Wang
e03d216c28 [X86] Use bit test instructions to optimize some logic atomic operations
This is to match GCC's optimizations: https://gcc.godbolt.org/z/3odh9e7WE

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D120199
2022-03-01 09:57:08 +08:00
Kazu Hirata
feb40a3a47 [llvm] Use range-based for loops with instructions (NFC) 2021-11-14 19:40:48 -08:00
Anshil Gandhi
508b06699a [Remarks] [AMDGPU] Emit optimization remarks for atomics generating hardware instructions
Produce remarks when atomic instructions are expanded into hardware instructions
in SIISelLowering.cpp. Currently, these remarks are only emitted for atomic fadd
instructions.

Differential Revision: https://reviews.llvm.org/D108150
2021-08-19 20:51:19 -06:00
Arthur Eubanks
de0ae9e89e [NFC] Cleanup more AttributeList::addAttribute() 2021-08-17 21:05:41 -07:00
Anshil Gandhi
f22ba51873 [Remarks] Emit optimization remarks for atomics generating CAS loop
Implements ORE in AtomicExpand pass to report atomics generating a
compare and swap loop.

Differential Revision: https://reviews.llvm.org/D106891
2021-08-16 14:56:01 -06:00
Dávid Bolvanský
49de6070a2 Revert "[Remarks] Emit optimization remarks for atomics generating CAS loop"
This reverts commit 435785214f73ff0c92e97f2ade6356e3ba3bf661. Still same compile time issues for -O0 -g, eg. +1.3% for sqlite3.
2021-08-15 11:44:13 +02:00
Anshil Gandhi
435785214f [Remarks] Emit optimization remarks for atomics generating CAS loop
Implements ORE in AtomicExpand pass to report atomics generating
a compare and swap loop.

Differential Revision: https://reviews.llvm.org/D106891
2021-08-14 23:37:23 -06:00
Anshil Gandhi
29e11a1aa3 Revert "[Remarks] Emit optimization remarks for atomics generating CAS loop"
This reverts commit c4e5425aa579d21530ef1766d7144b38a347f247.
2021-08-13 23:58:04 -06:00
Anshil Gandhi
c4e5425aa5 [Remarks] Emit optimization remarks for atomics generating CAS loop
Implements ORE in AtomicExpandPass to report atomics generating a compare
and swap loop.

Differential Revision: https://reviews.llvm.org/D106891
2021-08-13 22:44:08 -06:00
Kai Luo
b9c3941cd6 [PowerPC] Generate inlined quadword lock free atomic operations via AtomicExpand
This patch uses AtomicExpandPass to implement quadword lock free atomic operations. It adopts the method introduced in https://reviews.llvm.org/D47882, which expand atomic operations post RA to avoid spilling that might prevent LL/SC progress.

Reviewed By: jsji

Differential Revision: https://reviews.llvm.org/D103614
2021-07-15 01:12:09 +00:00
Krzysztof Parzyszek
df88c26f0d [OpaquePtr] Add type parameter to emitLoadLinked
Differential Revision: https://reviews.llvm.org/D105353
2021-07-02 13:07:40 -05:00
Eli Friedman
44cdf771fe [AtomicExpand] Merge cmpxchg success and failure ordering when appropriate.
If we're not emitting separate fences for the success/failure cases, we
need to pass the merged ordering to the target so it can emit the
correct instructions.

For the PowerPC testcase, we end up with extra fences, but that seems
like an improvement over missing fences.  If someone wants to improve
that, the PowerPC backed could be taught to emit the fences after isel,
instead of depending on fences emitted by AtomicExpand.

Fixes https://bugs.llvm.org/show_bug.cgi?id=33332 .

Differential Revision: https://reviews.llvm.org/D103342
2021-06-03 11:34:35 -07:00
Arthur Eubanks
372237487e [OpaquePtr] Remove some uses of PointerType::getElementType() 2021-05-31 16:11:25 -07:00
LemonBoy
b577ec4956 [AtomicExpandPass][AArch64] Promote xchg with floating-point types to integer ones
Follow the same strategy used for atomic loads/stores by converting the operands to equally-sized integer types.
This change prevents the atomic expansion pass from generating illegal LL/SC pairs when targeting AArch64: `expand-atomicrmw-xchg-fp.ll` would previously instantiate intrinsics such as `llvm.aarch64.ldaxr.p0f32` that cannot be lowered.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D103232
2021-05-29 08:57:27 +02:00
Stanislav Mekhanoshin
30b3aab329 Copy syncscope when expanding atomicrmw into cmpxchg loop
Fixes: SWDEV-280070

Differential Revision: https://reviews.llvm.org/D99902
2021-04-05 17:29:38 -07:00
James Y Knight
740e69b6fd Fix assert to use getTypeStoreSize instead of getPrimitiveSizeInBits,
per comment on D97223.
2021-02-26 11:08:00 -05:00
James Y Knight
24539f1ef2 Add Alignment argument to IRBuilder CreateAtomicRMW and CreateAtomicCmpXchg.
And then push those change throughout LLVM.

Keep the old signature in Clang's CGBuilder for now -- that will be
updated in a follow-on patch (D97224).

The MLIR LLVM-IR dialect is not updated to support the new alignment
attribute, but preserves its existing behavior.

Differential Revision: https://reviews.llvm.org/D97223
2021-02-25 18:29:42 -05:00
Alex Richardson
5bc438efcf [AtomicExpand] Avoid creating an unnamed libcall
I recently modified this pass to better support CHERI-RISC-V and while
doing so I noticed that this pass was calling M->getOrInsertFunction()
with the result of TLI->getLibcallName(RTLibType). However, AMDGPU fills
the libcalls array with nullptr, so this creates an anonymous function
instead. This patch changes expandAtomicOpToLibcall to return false in
case the libcall does not exist and changes the assert() in the callees to
a report_fatal_error() instead.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D88800
2020-11-02 17:52:37 +00:00
Brendon Cahoon
7b114446c3 Align store conditional address
In cases where the alignment of the datatype is smaller than
expected by the instruction, the address is aligned. The aligned
address is used for the load, but wasn't used for the store
conditional, which resulted in a run-time alignment exception.
2020-07-30 10:42:00 -05:00
serge-sans-paille
a60c31fd62 Fix return status of AtomicExpandPass
Correctly reflect change in the return status.

Differential Revision: https://reviews.llvm.org/D83457
2020-07-09 10:27:48 +02:00