227 Commits

Author SHA1 Message Date
agozillon
f1178815d2
[Flang][OpenMP][MLIR] Implement close, present and ompx_hold modifiers for Flang maps (#129586)
This PR adds an initial implementation for the map modifiers close,
present and ompx_hold, primarily just required adding the appropriate
map type flags to the map type bits. In the case of ompx_hold it
required adding the map type to the OpenMP dialect. Close has a bit of a
problem when utilised with the ALWAYS map type on descriptors, so it is
likely we'll have to make sure close and always are not applied to the
descriptor simultaneously in the future when we apply always to the
descriptors to facilitate movement of descriptor information to device
for consistency, however, we may find an alternative to this with
further investigation. For the moment, it is a TODO/Note to keep track
of it.
2025-03-07 22:22:30 +01:00
Michael Kruse
e296fb8ff6
Revert "[Offload][AMDGPU] LLVM_ENABLE_RUNTIMES=flang-rt for amdgpu-offload-*" (#130274)
Reverts llvm/llvm-project#129692

The builder amdgpu-offload-rhel-8-cmake-build-only fails because its
version of Ninja is too old. At least Ninja 1.10 is required for its
support for dependencies between Fortran modules.

https://lab.llvm.org/buildbot/#/builders/204/builds/2696
2025-03-07 12:30:18 +01:00
Michael Kruse
68578b38cf
[Offload][AMDGPU] LLVM_ENABLE_RUNTIMES=flang-rt for amdgpu-offload-* (#129692)
Enable the LLVM_ENABLE_RUNTIMES=flang-rt build of the Fortran runtime
for the amdgpu-offload-* buildbots. This pre-population cmake cache
files is referred to by the llvm-zorg annotated builder factory
[script](872f477610/zorg/buildbot/builders/annotated/amdgpu-offload-cmake.py (L26)).

The corresponding change in llvm-zorg is
https://github.com/llvm/llvm-zorg/pull/402
2025-03-07 11:56:00 +01:00
Nikita Popov
4f469ae046 [offload] Fix build after Module::getTargetTriple() change
Adjust for #129868.
2025-03-06 11:04:00 +01:00
Nikita Popov
979c275097
[IR] Store Triple in Module (NFC) (#129868)
The module currently stores the target triple as a string. This means
that any code that wants to actually use the triple first has to
instantiate a Triple, which is somewhat expensive. The change in #121652
caused a moderate compile-time regression due to this. While it would be
easy enough to work around, I think that architecturally, it makes more
sense to store the parsed Triple in the module, so that it can always be
directly queried.

For this change, I've opted not to add any magic conversions between
std::string and Triple for backwards-compatibilty purses, and instead
write out needed Triple()s or str()s explicitly. This is because I think
a decent number of them should be changed to work on Triple as well, to
avoid unnecessary conversions back and forth.

The only interesting part in this patch is that the default triple is
Triple("") instead of Triple() to preserve existing behavior. The former
defaults to using the ELF object format instead of unknown object
format. We should fix that as well.
2025-03-06 10:27:47 +01:00
Alex
b8a66f50b4
[OFFLOAD] Update ffi_cif structure to match libffi (#128756)
The ffi_cif structure defined in the wrapper header is smaller than the
actual structure in libffi which results in other structures being
overwritten when libffi is called, and finally in a segfault.

The patch updates the structure to the correct layout as specified in
ffi.h
2025-03-04 11:40:12 -06:00
Jan Patrick Lehr
fe18796142
[Offload][AMDGPU] Enable SPIRV target in build conf (#129323)
Enable the SPIRV backend on the CMake-cache file buildbots.
2025-03-01 21:56:28 +01:00
Jan Patrick Lehr
1824bb47c2
[Offload][OpenMP] Fix check-prefix (#128599) 2025-02-25 00:32:27 +01:00
Zequan Wu
1b15a89a23 Revert "[Offload] Fix assumptions on symbols after #124846 (#126238)"
The dependency commit was reverted at 23aca2f88d. Reverting this as well.
2025-02-24 13:30:54 -08:00
Jan Patrick Lehr
17ccaf4fa8
[NFC][Offload] Fix typo to output architecture (#128527) 2025-02-24 16:54:21 +01:00
Fabian Ritter
a2f9ae1421
[AMDGPU] Replace gfx940 and gfx941 with gfx942 in offload and libclc (#125826)
gfx940 and gfx941 are no longer supported. This is one of a series of
PRs to remove them from the code base.

For SWDEV-512631 and SWDEV-512633
2025-02-19 09:56:04 +01:00
Akash Banerjee
785a5b4676
[MLIR][OpenMP] Add LLVM translation support for OpenMP UserDefinedMappers (#124746)
This patch adds OpenMPToLLVMIRTranslation support for the OpenMP Declare
Mapper directive.

Since both MLIR and Clang now support custom mappers, I've changed the
respective function params to no longer be optional as well.

Depends on #121005
2025-02-18 17:55:48 +00:00
Krzysztof Parzyszek
7b89c41e41
[offload] Remove redundant checks in MappingInfoTy::lookupMapping (#127638)
Also add some clarifying comments.
2025-02-18 11:01:36 -06:00
Joseph Huber
1435c8ed95 Reapply "[LinkerWrapper] Clean up options after proper forwarding" (#126495)
Summary:
The test failed because it no longer passed Rpass by default without
LTO. I think that's desirable as it matches the standard behavior.
This reverts commit 6fd99de31864a5ef84ae8613b3a9034e05293461.
2025-02-14 09:56:46 -06:00
Ethan Luis McDonough
52ee06d273
[PGO][Offload] Fix pgo1.c (#126864)
pgo1.c had outdated test checks
2025-02-12 00:54:31 -06:00
Ethan Luis McDonough
9e5c136d5a
[PGO][Offload] Profile profraw generation for GPU instrumentation #76587 (#93365)
This pull request is the second part of an ongoing effort to extends PGO
instrumentation to GPU device code and depends on #76587. This PR makes
the following changes:

- Introduces `__llvm_write_custom_profile` to PGO compiler-rt library.
This is an external function that can be used to write profiles with
custom data to target-specific files.
- Adds `__llvm_write_custom_profile` as weak symbol to libomptarget so
that it can write the collected data to a profraw file.
- Adds `PGODump` debug flag and only displays dump when the
aforementioned flag is set
2025-02-11 23:30:54 -06:00
Joseph Huber
baf7a3c1e5
[Offload] Properly guard modifications to the RPC device array (#126790)
Summary:
If the user deallocates an RPC device this can sometimes fail if the RPC
server is still running. This will happen if the modification happens
while the server is still checking it. This patch adds a mutex to guard
modifications to it.
2025-02-11 14:57:31 -06:00
Joseph Huber
a854c266b9
[Offload][NFC] Rename src/ -> libomptarget/ (#126573)
Summary:
The name `src` is confusing when combined with the plugins and the newly
added `liboffload`.
2025-02-10 13:22:10 -06:00
Joseph Huber
feb30f25c0
[Offload] Fix the offload cache file triggering libc++ / libstdc++ mixing (#126313)
Summary:
We originally wanted `-stdlib=libc++` by default so that it could use
offloading support in libc++, however this causes issues with out the
Offloading proejct itself is built. Is the user builds the LLVM libs
with libstdc++ then uses this cache it will enable this option by
default for the ensuing build of the offloading libraries with the newly
build clang. This will cause a lot of linker failured because the C++
library doesn't match.

Long term I think the proper solution to this is to make better use of
clang configuration files, but I don't know a good way to do that by
default. For now just make it build right.
2025-02-10 13:20:35 -06:00
Joseph Huber
ed9107f2d7
[OpenMP] Replace use of target address space with <gpuintrin.h> local (#126119)
Summary:
This definition is more portable since it defines the correct value for
the target. I got rid of the helper mostly because I think it's easy
enough to use now that it's a type and being explicit about what's
`undef` or `poison` is good.
2025-02-09 10:25:25 -06:00
Jan Patrick Lehr
191d7d64e5
[Offload] Fix assumptions on symbols after #124846 (#126238)
In #124846 the symbolizer was changed to ignore 0-column entries, which
lead to a slightly different representation in the stack traces. This
patch addresses these differences.

Not sure if the difference in kernel_trap.c is also a result of this
change or not.
Can be tracked separate from this, after the bots are back to green.
2025-02-07 13:25:11 +01:00
David Blaikie
14d6e1ebf5 Update test for symbolizer fix 2025-02-06 19:18:20 +00:00
Joseph Huber
5812d0bf8e
[Offload] Make only a single thread handle the RPC server thread (#126067)
Summary:
This patch just changes the interface to make starting the thread
multiple times permissable since it will only be done the first time.
Note that this does not refcount it or anything, so it's onto the user
to make sure that they don't shut down the thread before everyone is
done using it. That is the case today because the shutDown portion is
run by a single thread in the destructor phase.

Another question is if we should make this thread truly global state,
because currently it will be private to each plugin instance, so if you
have an AMD and NVIDIA image there will be two, similarly if you have
those inside of a shared library.
2025-02-06 11:38:14 -06:00
Joseph Huber
f1e917d07b
[Offload] Unify offloading entries into a single section (#125731)
Summary:
This patch unifies the existing offloading entires into a single section
called `llvm_offload_entires`. This lets us use a more unified
offloading infrastructure so that all targets share the same handling.
The effect is that people in the runtimes now need to check if the kind
is what they expect, but the expectation is that you can combine
multiple potential providers into a compile job. Doesn't fully work
yet because of other runtime issues, but some day. Mostly this helps the
future of liboffload where we want to handle different languages than
OpenMP.
2025-02-06 08:24:01 -06:00
Joseph Huber
7a8779422d
[Offload] Stop the RPC server faiilng with more than one GPU (#125982)
Summary:
Pretty dumb mistake of me, forgot that this is run per-device and
per-plugin, which fell through the cracks with my testing because I have
two GPUs that use different plugins.
2025-02-05 20:51:28 -06:00
Joseph Huber
bb7ab2557c
[OpenMP] Port the OpenMP device runtime to direct C++ compilation (#123673)
Summary:
This removes the use of OpenMP offloading to build the device runtime.
The main benefit here is that we no longer need to rely on offloading
semantics to build a device only runtime. Things like variants are now
no longer needed and can just be simple if-defs. In the future, I will
remove most of the special handling here and fold it into calls to the
`<gpuintrin.h>` functions instead. Additionally I will rework the
compilation to make this a separate runtime.

The current plan is to have this, but make including OpenMP and
offloading either automatically add it, or print a warning if it's
missing. This will allow us to use a normal CMake workflow and delete
all the weird 'lets pull the clang binary out of the build' business.
```
-DRUNTIMES_amdgcn-amd-amdhsa_LLVM_ENABLE_RUNTIMES=offload
-DLLVM_RUNTIME_TARGETS=amdgcn-amd-amdhsa
```

After that, linking the OpenMP device runtime will be `-Xoffload-linker
-lomp`. I.e. no more fat binary business.

Only look at the most recent commit since this includes the two
dependencies
(fix to AMDGPUEmitPrintfBinding and the PointerToMember bug).
2025-02-05 08:18:52 -06:00
Joseph Huber
a284a6ed17 [OpenMP] Guard OpenMP specific entry handling 2025-02-03 16:16:18 -06:00
Michał Górny
689ef5fda0
[offload] [test] Use test compiler ID rather than host (#124408)
Use the test compiler ID to verify whether tests can be run rather than
the host compiler. This makes it possible to run tests (with Clang)
while the library itself was built with GCC.
2025-02-02 15:55:39 +00:00
Michał Górny
359a913170
[offload] gnu::format with variadic template functions is Clang-only (#124406)
Use `gnu::format` attribute only when compiling with Clang, as using it
against variadic template functions is a Clang extension and is not
supported by GCC.

See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77958

Fixes #119069
2025-02-02 15:55:22 +00:00
Christian Clauss
1f56bb3137
[Offload][NFC] Fix typos discovered by codespell (#125119)
https://github.com/codespell-project/codespell

% `codespell
--ignore-words-list=archtype,hsa,identty,inout,iself,nd,te,ths,vertexes
--write-changes`
2025-01-31 09:35:29 -06:00
agozillon
2428b6ec40
[Flang][MLIR][OpenMP] Fix Target Data if (present(...)) causing LLVM-IR branching error (#123771)
Currently if we generate code for the below target data map that uses an
optional mapping:

       !$omp target data if(present(a)) map(alloc:a)
            do i = 1, 10
                a(i) = i
            end do
       !$omp end target data

We yield an LLVM-IR error as the branch for the else path is not
generated. This occurs because we enter the NoDupPriv path of the call
back function when generating the else branch, however, the emitBranch
function needs to be set to a block for it to functionally generate and
link in a follow up branch. The NoDupPriv path currently doesn't do
this, while it's not supposed to generate anything (as far as I am
aware) we still need to at least set the builders placement back so that
it emits the appropriate follow up branch. This avoids the missing
terminator LLVM-IR verification error by correctly generating the follow
up branch.
2025-01-30 17:33:36 +01:00
agozillon
e0054e984c
[MLIR][OpenMP] Emit nullary check for mapped pointer members and appropriate size select based on results (#124604)
This PR aims to fix a mapping error when trying to map nullary elements
of a record type (primary example is allocatables/pointer types in
Fortran at the moment). This should be legal to map, just not write to
without pointing to anything within the target region. A common Fortran
OpenMP idiom/example where this is useful can be found in the added
Fortran offload example.

The runtime error arises when we try to map the pointer member utilising
a prescribed constant size that we receive from the lowered type,
resulting in mapping of data that will be non-existent when there is no
allocated data. The fix in this case is to emit a runtime check to see
if the data has been allocated, if it hasn't been we select a size of 0,
if it has we emit the usual type size.
2025-01-29 17:51:33 +01:00
Jan Patrick Lehr
d412fe531d
[Offload] Enable mlir and flang in bot build (#124915)
This enables more projects in the CMake cache to add them to the
buildbot coverage in the AMDGPU buildbots.
2025-01-29 14:13:59 +01:00
Joseph Huber
13dcc95dcd
[Offload] Rework offloading entry type to be more generic (#124018)
Summary:
The previous offloading entry type did not fit the current use-cases
very well. This widens it and adds a version to prevent further
annoyances. It also includes the kind to better sort who's using it.

The first 64-bytes are reserved as zero so the OpenMP runtime can detect
the old format for binary compatibilitry.
2025-01-28 07:26:13 -06:00
Joseph Huber
760a786d15
[Clang] Prevent mlink-builtin-bitcode from internalizing the RPC client (#118661)
Summary:
Currently, we only use `-mlink-builtin-bitcode` for non-LTO NVIDIA
compiliations. This has the problem that it will internalize the RPC
client symbol which needs to be visible to the host. To counteract that,
I put `retain` on it, but this also prevents optimizations on the global
itself, so the passes we have that remove the symbol don't work on
OpenMP anymore. This patch does the dumbest solution, adding a special
string check for it in clang. Not the best solution, the runner up would
be to have a clang attribute for `externally_initialized` because those
can't be internalized, but that might have some unfortunate
side-effects. Alternatively we could make NVIDIA compilations do LTO all
the time, but that would affect some users and it's harder than I
thought.
2025-01-27 19:30:59 -06:00
Joseph Huber
38b3f45a81 [Offload] Fix offload-info interface
Summary:
The offload info tool doesn't initialize things properly, just check
this first instead.
2025-01-27 10:36:09 -06:00
Joseph Huber
f07505849c [Offload] Fix server thread from being shut down if unused 2025-01-27 08:29:41 -06:00
Joseph Huber
e7592d83e0 [Offload][NFC] Make sure the thread is not running already 2025-01-27 08:06:29 -06:00
Joseph Huber
bd8a818128 [Offload] Add cuLaunchHostFunc to dynamic cuda
Summary:
This was missing, causing non-directly linked builds to fail.
2025-01-24 11:41:20 -06:00
Joseph Huber
134401deea
[Offload] Move RPC server handling to a dedicated thread (#112988)
Summary:
Handling the RPC server requires running through list of jobs that the
device has requested to be done. Currently this is handled by the thread
that does the waiting for the kernel to finish. However, this is not
sound on NVIDIA architectures and only works for async launches in the
OpenMP model that uses helper threads.

However, we also don't want to have this thread doing work
unnnecessarily. For this reason we track the execution of kernels and
cause the thread to sleep via a condition variable (usually backed by
some kind of futex or other intelligent sleeping mechanism) so that the
thread will be idle while no kernels are running.
2025-01-24 11:36:45 -06:00
hidekisaito
ed512710a5
[Offload] Make MemoryManager threshold ENV var size_t type. (#124063) 2025-01-23 11:46:56 -06:00
Joseph Huber
6518b121f0
[Offload][NFC] Factor out and rename the __tgt_offload_entry struct (#123785)
Summary:
This patch is an NFC renaming to make using the offloading entry type
more portable between other targets. Right now this is just moving its
definition to LLVM so others can use it. Future work will rework the
struct layout.
2025-01-21 12:05:24 -06:00
Joseph Huber
f233a54ae8
[OpenMP] Remove usage of pointer-to-member in lookup (#123671)
Summary:
This is buggy and is currently being tracked in
https://github.com/llvm/llvm-project/issues/123241. For now, replace it
with a macro so that we can use address spaces directly.
2025-01-21 07:50:40 -06:00
Joseph Huber
3274bf6b42
[OpenMP] Make each atomic helper take an atomic scope argument (#122786)
Summary:
Right now we just default to device for each type, and mix an ad-hoc
scope with the one used by the compiler's builtins. Unify this can make
each version take the scope optionally.

For @ronlieb, this will remove the need for `add_system` in the fork as
well as the extra `cas` with system scope, just pass `system`.
2025-01-20 21:58:27 -06:00
Joseph Huber
2d9f406943
[OpenMP] Adjust 'printf' handling in the OpenMP runtime (#123670)
Summary:
We used to avoid a lot of this stuff because we didn't properly handle
variadics in device code. That's been solved for now, so we can just
make an internal printf handler that forwards to the external `vprintf`
function. This is either provided by NVIDIA's SDK or by the GPU libc
implementation.

The main reason for doing this is because it prevents the stupid AMDGPU
printf pass from mangling our beautiful printfs!
2025-01-20 21:56:46 -06:00
Joseph Huber
723a3e746a [OpenMP] Fix mispelled attribute and warning
Summary:
This is spelled `ompx_aligned_barrier` when used directly, but wasn't
included in the list of known assumptions. Fix that so now th test
works.
2025-01-20 08:40:19 -06:00
Joseph Huber
58af82b462
[OpenMP] Remove 'omp assumes' scopes now that we have no inline ASM (#123611)
Summary:
We used this globally scoped `ext_no_call_asm` as a sort of hack around
the compiler that allowed the attributor to optimize out inline assembly
calls to PTX instructions. Quite some time ago I got rid of every inline
assembly call and replaced it with a builitin, so this can just be
deleted.

Furthermore, I use the `[[omp::assume]]` attribute directly for the
aligned barrier usage. This prints an unknown assumption warning (even
though it isn't) so I'm just silencing that for now until I fix it
later.

---------

Co-authored-by: Michael Kruse <github@meinersbur.de>
2025-01-20 08:11:06 -06:00
Jan Patrick Lehr
4b3c17850b
[Offload] Enable shared-libs; compiler-rt as default RTLIB (#123568)
This is the next step to move the CMake cache file builder closer to the
build configuration we care about downstream.
2025-01-20 10:23:41 +01:00
Jan Patrick Lehr
61f94ebc9e
[NFC][Offload] Structure/Readability of CMake cache (#123328)
Preparing to add more config options and want to group them all from
most-common to project / component specific.
2025-01-17 13:01:25 +01:00
Joseph Huber
1c00d0d776
[OpenMP] Remove hack around missing atomic load (#122781)
Summary:
We used to do a fetch add of zero to approximate a load. This is because
the NVPTX backend didn't handle this properly. It's not an issue anymore
so simply use the proper atomic builtin.
2025-01-16 15:17:15 -06:00