252 Commits

Author SHA1 Message Date
agozillon
b2c9a58b8f
[Flang][OpenMP][MLIR] Check for presence of Box type before emitting store in MapInfoFinalization pass (#135477)
Currently we don't check for the presence of descriptor/BoxTypes before
emitting stores which lower to memcpys, the issue with this is that
users can have optional arguments, where they don't provide an input,
making the argument effectively null. This can still be mapped and this
causes issues at the moment as we'll emit a memcpy for function
arguments to store to a local variable for certain edge cases, when we
perform this memcpy on a null input, we cause a segfault at runtime.

The fix to this is to simply create a branch around the store that
checks if the data we're copying from is actually present. If it is, we
proceed with the store, if it isn't we skip it.
2025-04-14 17:15:56 +02:00
Joseph Huber
2f41fa387d
[AMDGPU] Fix code object version not being set to 'none' (#135036)
Summary:
Previously, we removed the special handling for the code object version
global. I erroneously thought that this meant we cold get rid of this
weird `-Xclang` option. However, this also emits an LLVM IR module flag,
which will then cause linking issues.
2025-04-10 11:31:21 -05:00
Zequan Wu
78b21ddba7 Revert "Reland "Symbolize line zero as if no source info is available (#124846)" (#133798)"
This reverts commit 348374028970c956f2e49ab7553b495d7408ccd9 because #128619 doesn't handle the case when we have an empty frame from `getInliningInfoForAddress` because line num is 0 which makes it non-differentiable from missing debug info. So, we end up using the base filename from symtab again. Reverting for now until that issus is solved.
2025-04-09 18:09:31 -07:00
Joel E. Denny
5709506de0
[offload] Fix finding amdgpu/nvptx-arch to generate tests (#135072)
PR #134713, which landed as 79cb6f05da37, causes this on my test
systems:

```
-- Building AMDGPU plugin for dlopened libhsa
-- Not generating AMDGPU tests, no supported devices detected. Use 'LIBOMPTARGET_FORCE_AMDGPU_TESTS' to override.
-- Building CUDA plugin for dlopened libcuda
-- Not generating NVIDIA tests, no supported devices detected. Use 'LIBOMPTARGET_FORCE_NVIDIA_TESTS' to override.
```

The problem is it cannot locate amdgpu-arch and nvptx-arch. This patch
enables it to.

I suspect there is more cleanup to do here. amdgpu-arch and nvptx-arch
do not appear to exist as cmake targets anymore, but there is still
cmake code here that looks for those targets.
2025-04-09 15:54:29 -04:00
Joel E. Denny
ad9f6d3cee
[PGO][Offload] Use %profdata in PGO tests (#135015)
So that the wrong llvm-profdata is not picked up from PATH.
2025-04-09 10:40:46 -04:00
Jan Leyonberg
fbc8335311
[MLIR][OpenMP] Add codegen for teams reductions (#133310)
This patch adds the lowering of teams reductions from the omp dialect to
LLVM-IR. Some minor cleanup was done in clang to remove an unused
parameter.
2025-04-07 12:47:16 -04:00
Sergio Afonso
66fca0674d
[OpenMP] Fix num_iters in __kmpc_*_loop DeviceRTL functions (#133435)
This patch removes the addition of 1 to the number of iterations when
calling the following DeviceRTL functions:
- `__kmpc_distribute_for_static_loop*`
- `__kmpc_distribute_static_loop*`
- `__kmpc_for_static_loop*`

Calls to these functions are currently only produced by the OMPIRBuilder
from flang, which already passes the correct number of iterations to
these functions. By adding 1 to the received `num_iters` variable,
worksharing can produce incorrect results. This impacts flang OpenMP
offloading of `do`, `distribute` and `distribute parallel do`
constructs.

Expecting the application to pass `tripcount - 1` as the argument seems
unexpected as well, so rather than updating flang I think it makes more
sense to update the runtime.
2025-04-01 10:29:08 +01:00
Zequan Wu
3483740289
Reland "Symbolize line zero as if no source info is available (#124846)" (#133798)
This land commits 23aca2f88dd5d2447e69496c89c3ed42a56f9c31 and
1b15a89a23c631a8e2d096dad4afe456970572c0.
https://github.com/llvm/llvm-project/pull/128619 makes symbolizer to
always use debug info when available so we can reland this chagnge.
2025-03-31 19:13:46 -04:00
Alex
021a1f6974
[OFFLOAD] Stricter enforcement of user offload disable (#133470)
If user specifies offload is disabled (e.g.,
OMP_TARGET_OFFLOAD=disable), disable library almost completely. This
reduces resources spent to a minimum and ensures all APIs behave as if
the only available device is the host device.

Currently some of the APIs behave as if there were devices avaible for
offload even when under OMP_TARGET_OFFLOAD=disable.

---------

Co-authored-by: Joseph Huber <huberjn@outlook.com>
2025-03-28 17:28:14 -05:00
Ethan Luis McDonough
0c81105373
[PGO][Offload] Disable PGO on NVPTX (#133522) 2025-03-28 16:32:32 -05:00
Joseph Huber
772173f548
[Clang][AMDGPU] Remove special handling for COV4 libraries (#132870)
Summary:
When we were first porting to COV5, this lead to some ABI issues due to
a change in how we looked up the work group size. Bitcode libraries
relied on the builtins to emit code, but this was changed between
versions. This prevented the bitcode libraries, like OpenMP or libc,
from being used for both COV4 and COV5. The solution was to have this
'none' functionality which effectively emitted code that branched off of
a global to resolve to either version.

This isn't a great solution because it forced every TU to have this
variable in it. The patch in
https://github.com/llvm/llvm-project/pull/131033 removed support for
COV4 from OpenMP, which was the only consumer of this functionality.
Other users like HIP and OpenCL did not use this because they linked the
ROCm Device Library directly which has its own handling (The name was
borrowed from it after all).

So, now that we don't need to worry about backward compatibility with
COV4, we can remove this special handling. Users can still emit COV4
code, this simply removes the special handling used to make the OpenMP
device runtime bitcode version agnostic.
2025-03-28 07:35:16 -05:00
macurtis-amd
21a8c63cdc
[offload] Remove bad assert in StaticLoopChunker::Distribute (#132705)
When building with asserts enabled, this can actually cause strange
miscompilations because an incorrect llvm.assume is generated at the
point of the assertion.
2025-03-28 04:53:00 -05:00
Joseph Huber
75f810e025
[Offload] Guard HSA implicit arguments if they aren't created (#133073)
Summary:
We conditionally allocate the implicit arguments, so they possibly are
null. The flang compiler seems to hit this case, even though it
shouldn't when it's supposed to conform to the HSA code object. For now
guard this to fix the regression and cover a case in the future where
someone rolls a fully custom implementatation.

Fixes: https://github.com/llvm/llvm-project/issues/132982
2025-03-26 08:54:33 -05:00
Joseph Huber
25bf4e262c
[Offload] Remove handling for COV4 binaries from offload/ (#131033)
Summary:
We moved from cov4 to cov5 a long time ago, and it guards simplifying
some front end code, so we should be able to move up with this.
2025-03-24 18:58:20 -05:00
Ethan Luis McDonough
c50d39f073
[PGO][Offload] Allow PGO flags to be used on GPU targets (#94268)
This pull request is the third part of an ongoing effort to extends PGO
instrumentation to GPU device code and depends on
https://github.com/llvm/llvm-project/pull/93365. This PR makes the
following changes:

- Allows PGO flags to be supplied to GPU targets
- Pulls version global from device
- Modifies `__llvm_write_custom_profile` and `lprofWriteDataImpl` to
allow the PGO version to be overridden
2025-03-19 19:01:38 -05:00
Joseph Huber
cb493d2bab
[OpenMP] Replace utilities with 'gpuintrin.h' definitions (#131644)
Summary:
Port more instructions. AMD version is at
https://gist.github.com/jhuber6/235d7ee95f747c75f9a3cfd8eedac6aa
2025-03-19 10:47:21 -05:00
Jon Chesterfield
deb0f3c09b
[openmp][nfc] Use builtin align in the devicertl (#131918)
Noticed while extracting the smartstack as a test case
2025-03-18 21:31:49 +00:00
Jon Chesterfield
395bdebebd Revert "[openmp][nfc] Refactor shared/lds smartstack for spirv (#131905)"
This reverts commit c02b935a9be888bbdf9f8cb0bf980bd411ae5893.
Failed a check-offload test under CI
2025-03-18 20:43:05 +00:00
Joseph Huber
206f78dfec
[OpenMP] Use 'gpuintrin.h' definitions for simple block identifiers (#131631)
Summary:
This patch ports the runtime to use `gpuintrin.h` instead of calling the
builtins for most things. The `lanemask_gt` stuff was left for now with
a fallback.

AMD version for Ron
https://gist.github.com/jhuber6/42014d635b9a8158727640876bf47226.
2025-03-18 15:38:46 -05:00
Jon Chesterfield
c02b935a9b
[openmp][nfc] Refactor shared/lds smartstack for spirv (#131905)
Spirv doesn't have implicit conversions between address spaces (at least
at present, we might need to change that) and address space qualified
*this pointers are not handled well by clang. This commit changes the
single instance of the smartstack to be explicitly a singleton, for
fractionally simpler IR generation (no this pointer) and to sidestep the
work in progress spirv64-- openmp target not being able to compile the
original version.
2025-03-18 20:33:24 +00:00
Joseph Huber
8437b7f558
[libc] Make RPC server handling header only (#131205)
Summary:
This patch moves the RPC server handling to be a header only utility
stored in the `shared/` directory. This is intended to be shared within
LLVM for the loaders and `offload/` handling.

Generally, this makes it easier to share code without weird
cross-project binaries being plucked out of the build system. It also
allows us to soon move the loader interface out of the `libc` project so
that we don't need to bootstrap those and can build them in LLVM.
2025-03-13 19:23:21 -05:00
Michael Kruse
d3255474be Reapply "[Offload][AMDGPU] LLVM_ENABLE_RUNTIMES=flang-rt for amdgpu-offload-*" (#130274)
Enable the LLVM_ENABLE_RUNTIMES=flang-rt build of the Fortran runtime
for the amdgpu-offload-* buildbots. This pre-population cmake cache
files is referred to by the llvm-zorg annotated builder factory
[script](872f477610/zorg/buildbot/builders/annotated/amdgpu-offload-cmake.py (L26)).

The corresponding change in llvm-zorg is
llvm/llvm-zorg#402

This reverts commit e296fb8ff6255b97db9ff6cd941acc730164b38f.

The worker of amdgpu-offload-rhel-8-cmake-build-only has been updated
with a newer version of Ninja that supports Fortran.
2025-03-13 13:21:36 +01:00
Krzysztof Parzyszek
f4fc2d731c
[flang][OpenMP] Map ByRef if size/alignment exceed that of a pointer (#130832)
Improve the check for whether a type can be passed by copy. Currently,
passing by copy is done via the OMP_MAP_LITERAL mapping, which can only
transfer as much data as can be contained in a pointer representation.
2025-03-12 19:41:11 -05:00
Nikita Popov
f137c3d592
[TargetRegistry] Accept Triple in createTargetMachine() (NFC) (#130940)
This avoids doing a Triple -> std::string -> Triple round trip in lots
of places, now that the Module stores a Triple.
2025-03-12 17:35:09 +01:00
Krzysztof Parzyszek
d67947162f
[flang][OpenMP] Implement HAS_DEVICE_ADDR clause (#128568)
The HAS_DEVICE_ADDR indicates that the object(s) listed exists at an
address that is a valid device address. Specifically,
`has_device_addr(x)` means that (in C/C++ terms) `&x` is a device
address.

When entering a target region, `x` does not need to be allocated on the
device, or have its contents copied over (in the absence of additional
mapping clauses). Passing its address verbatim to the region for use is
sufficient, and is the intended goal of the clause.

Some Fortran objects use descriptors in their in-memory representation.
If `x` had a descriptor, both the descriptor and the contents of `x`
would be located in the device memory. However, the descriptors are
managed by the compiler, and can be regenerated at various points as
needed. The address of the effective descriptor may change, hence it's
not safe to pass the address of the descriptor to the target region.
Instead, the descriptor itself is always copied, but for objects like
`x`, no further mapping takes place (as this keeps the storage pointer
in the descriptor unchanged).

---------

Co-authored-by: Sergio Afonso <safonsof@amd.com>
2025-03-10 08:11:01 -05:00
agozillon
f1178815d2
[Flang][OpenMP][MLIR] Implement close, present and ompx_hold modifiers for Flang maps (#129586)
This PR adds an initial implementation for the map modifiers close,
present and ompx_hold, primarily just required adding the appropriate
map type flags to the map type bits. In the case of ompx_hold it
required adding the map type to the OpenMP dialect. Close has a bit of a
problem when utilised with the ALWAYS map type on descriptors, so it is
likely we'll have to make sure close and always are not applied to the
descriptor simultaneously in the future when we apply always to the
descriptors to facilitate movement of descriptor information to device
for consistency, however, we may find an alternative to this with
further investigation. For the moment, it is a TODO/Note to keep track
of it.
2025-03-07 22:22:30 +01:00
Michael Kruse
e296fb8ff6
Revert "[Offload][AMDGPU] LLVM_ENABLE_RUNTIMES=flang-rt for amdgpu-offload-*" (#130274)
Reverts llvm/llvm-project#129692

The builder amdgpu-offload-rhel-8-cmake-build-only fails because its
version of Ninja is too old. At least Ninja 1.10 is required for its
support for dependencies between Fortran modules.

https://lab.llvm.org/buildbot/#/builders/204/builds/2696
2025-03-07 12:30:18 +01:00
Michael Kruse
68578b38cf
[Offload][AMDGPU] LLVM_ENABLE_RUNTIMES=flang-rt for amdgpu-offload-* (#129692)
Enable the LLVM_ENABLE_RUNTIMES=flang-rt build of the Fortran runtime
for the amdgpu-offload-* buildbots. This pre-population cmake cache
files is referred to by the llvm-zorg annotated builder factory
[script](872f477610/zorg/buildbot/builders/annotated/amdgpu-offload-cmake.py (L26)).

The corresponding change in llvm-zorg is
https://github.com/llvm/llvm-zorg/pull/402
2025-03-07 11:56:00 +01:00
Nikita Popov
4f469ae046 [offload] Fix build after Module::getTargetTriple() change
Adjust for #129868.
2025-03-06 11:04:00 +01:00
Nikita Popov
979c275097
[IR] Store Triple in Module (NFC) (#129868)
The module currently stores the target triple as a string. This means
that any code that wants to actually use the triple first has to
instantiate a Triple, which is somewhat expensive. The change in #121652
caused a moderate compile-time regression due to this. While it would be
easy enough to work around, I think that architecturally, it makes more
sense to store the parsed Triple in the module, so that it can always be
directly queried.

For this change, I've opted not to add any magic conversions between
std::string and Triple for backwards-compatibilty purses, and instead
write out needed Triple()s or str()s explicitly. This is because I think
a decent number of them should be changed to work on Triple as well, to
avoid unnecessary conversions back and forth.

The only interesting part in this patch is that the default triple is
Triple("") instead of Triple() to preserve existing behavior. The former
defaults to using the ELF object format instead of unknown object
format. We should fix that as well.
2025-03-06 10:27:47 +01:00
Alex
b8a66f50b4
[OFFLOAD] Update ffi_cif structure to match libffi (#128756)
The ffi_cif structure defined in the wrapper header is smaller than the
actual structure in libffi which results in other structures being
overwritten when libffi is called, and finally in a segfault.

The patch updates the structure to the correct layout as specified in
ffi.h
2025-03-04 11:40:12 -06:00
Jan Patrick Lehr
fe18796142
[Offload][AMDGPU] Enable SPIRV target in build conf (#129323)
Enable the SPIRV backend on the CMake-cache file buildbots.
2025-03-01 21:56:28 +01:00
Jan Patrick Lehr
1824bb47c2
[Offload][OpenMP] Fix check-prefix (#128599) 2025-02-25 00:32:27 +01:00
Zequan Wu
1b15a89a23 Revert "[Offload] Fix assumptions on symbols after #124846 (#126238)"
The dependency commit was reverted at 23aca2f88d. Reverting this as well.
2025-02-24 13:30:54 -08:00
Jan Patrick Lehr
17ccaf4fa8
[NFC][Offload] Fix typo to output architecture (#128527) 2025-02-24 16:54:21 +01:00
Fabian Ritter
a2f9ae1421
[AMDGPU] Replace gfx940 and gfx941 with gfx942 in offload and libclc (#125826)
gfx940 and gfx941 are no longer supported. This is one of a series of
PRs to remove them from the code base.

For SWDEV-512631 and SWDEV-512633
2025-02-19 09:56:04 +01:00
Akash Banerjee
785a5b4676
[MLIR][OpenMP] Add LLVM translation support for OpenMP UserDefinedMappers (#124746)
This patch adds OpenMPToLLVMIRTranslation support for the OpenMP Declare
Mapper directive.

Since both MLIR and Clang now support custom mappers, I've changed the
respective function params to no longer be optional as well.

Depends on #121005
2025-02-18 17:55:48 +00:00
Krzysztof Parzyszek
7b89c41e41
[offload] Remove redundant checks in MappingInfoTy::lookupMapping (#127638)
Also add some clarifying comments.
2025-02-18 11:01:36 -06:00
Joseph Huber
1435c8ed95 Reapply "[LinkerWrapper] Clean up options after proper forwarding" (#126495)
Summary:
The test failed because it no longer passed Rpass by default without
LTO. I think that's desirable as it matches the standard behavior.
This reverts commit 6fd99de31864a5ef84ae8613b3a9034e05293461.
2025-02-14 09:56:46 -06:00
Ethan Luis McDonough
52ee06d273
[PGO][Offload] Fix pgo1.c (#126864)
pgo1.c had outdated test checks
2025-02-12 00:54:31 -06:00
Ethan Luis McDonough
9e5c136d5a
[PGO][Offload] Profile profraw generation for GPU instrumentation #76587 (#93365)
This pull request is the second part of an ongoing effort to extends PGO
instrumentation to GPU device code and depends on #76587. This PR makes
the following changes:

- Introduces `__llvm_write_custom_profile` to PGO compiler-rt library.
This is an external function that can be used to write profiles with
custom data to target-specific files.
- Adds `__llvm_write_custom_profile` as weak symbol to libomptarget so
that it can write the collected data to a profraw file.
- Adds `PGODump` debug flag and only displays dump when the
aforementioned flag is set
2025-02-11 23:30:54 -06:00
Joseph Huber
baf7a3c1e5
[Offload] Properly guard modifications to the RPC device array (#126790)
Summary:
If the user deallocates an RPC device this can sometimes fail if the RPC
server is still running. This will happen if the modification happens
while the server is still checking it. This patch adds a mutex to guard
modifications to it.
2025-02-11 14:57:31 -06:00
Joseph Huber
a854c266b9
[Offload][NFC] Rename src/ -> libomptarget/ (#126573)
Summary:
The name `src` is confusing when combined with the plugins and the newly
added `liboffload`.
2025-02-10 13:22:10 -06:00
Joseph Huber
feb30f25c0
[Offload] Fix the offload cache file triggering libc++ / libstdc++ mixing (#126313)
Summary:
We originally wanted `-stdlib=libc++` by default so that it could use
offloading support in libc++, however this causes issues with out the
Offloading proejct itself is built. Is the user builds the LLVM libs
with libstdc++ then uses this cache it will enable this option by
default for the ensuing build of the offloading libraries with the newly
build clang. This will cause a lot of linker failured because the C++
library doesn't match.

Long term I think the proper solution to this is to make better use of
clang configuration files, but I don't know a good way to do that by
default. For now just make it build right.
2025-02-10 13:20:35 -06:00
Joseph Huber
ed9107f2d7
[OpenMP] Replace use of target address space with <gpuintrin.h> local (#126119)
Summary:
This definition is more portable since it defines the correct value for
the target. I got rid of the helper mostly because I think it's easy
enough to use now that it's a type and being explicit about what's
`undef` or `poison` is good.
2025-02-09 10:25:25 -06:00
Jan Patrick Lehr
191d7d64e5
[Offload] Fix assumptions on symbols after #124846 (#126238)
In #124846 the symbolizer was changed to ignore 0-column entries, which
lead to a slightly different representation in the stack traces. This
patch addresses these differences.

Not sure if the difference in kernel_trap.c is also a result of this
change or not.
Can be tracked separate from this, after the bots are back to green.
2025-02-07 13:25:11 +01:00
David Blaikie
14d6e1ebf5 Update test for symbolizer fix 2025-02-06 19:18:20 +00:00
Joseph Huber
5812d0bf8e
[Offload] Make only a single thread handle the RPC server thread (#126067)
Summary:
This patch just changes the interface to make starting the thread
multiple times permissable since it will only be done the first time.
Note that this does not refcount it or anything, so it's onto the user
to make sure that they don't shut down the thread before everyone is
done using it. That is the case today because the shutDown portion is
run by a single thread in the destructor phase.

Another question is if we should make this thread truly global state,
because currently it will be private to each plugin instance, so if you
have an AMD and NVIDIA image there will be two, similarly if you have
those inside of a shared library.
2025-02-06 11:38:14 -06:00
Joseph Huber
f1e917d07b
[Offload] Unify offloading entries into a single section (#125731)
Summary:
This patch unifies the existing offloading entires into a single section
called `llvm_offload_entires`. This lets us use a more unified
offloading infrastructure so that all targets share the same handling.
The effect is that people in the runtimes now need to check if the kind
is what they expect, but the expectation is that you can combine
multiple potential providers into a compile job. Doesn't fully work
yet because of other runtime issues, but some day. Mostly this helps the
future of liboffload where we want to handle different languages than
OpenMP.
2025-02-06 08:24:01 -06:00
Joseph Huber
7a8779422d
[Offload] Stop the RPC server faiilng with more than one GPU (#125982)
Summary:
Pretty dumb mistake of me, forgot that this is run per-device and
per-plugin, which fell through the cracks with my testing because I have
two GPUs that use different plugins.
2025-02-05 20:51:28 -06:00