73 Commits

Author SHA1 Message Date
Joseph Huber
3c50cbfda4
[DeviceRTL] Make defined 'libc' functions weak in OpenMP (#97356)
Summary:
These functions provide special-case implementations internal to the
OpenMP device runtime. This can potentially conflict with the symbols
pulled in from the actual GPU `libc`. This patch makes these weak, so in
the case that the GPU libc functions exist they will be overridden. This
should not impact performance in the average case because the old
`-mlink-builtin-bitcode` version does internalization, deleting weak,
and the new LTO path will resolve to the strong reference and then
internalize it.
2024-07-02 13:23:53 -05:00
dhruvachak
946f5d111d
[OpenMP] [OMPT] Callback registration should not depend on the device init callback. (#96371)
Even if the device init callback is not registered, a tool should be
allowed to register other callbacks.
2024-07-01 10:07:05 -07:00
Gheorghe-Teodor Bercea
1a478a69bc
[OpenMP][offload] Fix dynamic schedule tracking (#97065)
This patch fixes the dynamic schedule tracking.
2024-07-01 10:23:11 -04:00
Ethan Luis McDonough
8823448807
[Offload] Refactor offload test requirements (#95196)
Many tests in the `offload` project have requirements defined by which
targets are not supported rather than which platforms are supported.
This patch aims to streamline the requirement definitions by adding four
new feature tags: `host`, `gpu`, `amdgpu`, and `nvidiagpu`.
2024-06-29 00:56:18 -05:00
Ethan Luis McDonough
2c8b912f63
Revert "[PGO][OpenMP] Instrumentation for GPU devices (#76587)"
This reverts commit 5fd2af38e461445c583d7ffc2fe23858966eee76. It caused build issues and broke the buildbot.
2024-06-28 12:30:45 -05:00
Ethan Luis McDonough
5fd2af38e4
[PGO][OpenMP] Instrumentation for GPU devices (#76587)
This pull request is the first part of an ongoing effort to extends PGO
instrumentation to GPU device code. This PR makes the following changes:

- Adds blank registration functions to device RTL
- Gives PGO globals protected visibility when targeting a supported GPU
- Handles any addrspace casts for PGO calls
- Implements PGO global extraction in GPU plugins (currently only dumps
info)

These changes can be tested by supplying `-fprofile-instrument=clang`
while targeting a GPU.
2024-06-28 10:42:19 -05:00
agozillon
aec735cf47
[Flang][OpenMP][MLIR] Fix common block mapping for regular and declare target link (#91829)
This PR attempts to fix common block mapping for regular mapping of
these types as well as when they have been marked as "declare target
link". This PR should allow correct mapping of both the members of a
common block and the full common block via its block symbol.

The main changes were some adjustments to the Fortran OpenMP lowering to
HLFIR/FIR, the lowering of the LLVM+OpenMP dialect to LLVM-IR and
adjustments to the way the we handle target kernel map argument
rebinding inside of the OMPIRBuilder.

For the Fortran OpenMP lowering were two changes, one to prevent the
implicit capture of common block members when the common block symbol
itself has been marked and the other creates intermediate member access
inside of the target region to be used in-place of those external to the
target region, this prevents external usages breaking the
IsolatedFromAbove pact.

In the latter case, there was an adjustment to the size calculation for
types to better handle cases where we pass an array as the type of a map
(as opposed to the bounds and the type of the element), which occurs in
the case of common blocks. There is also some adjustment to how
handleDeclareTargetMapVar handles renaming of declare target symbols in
the module to the reference pointer, now it will only apply to those
within the kernel that is currently being generated and we also perform
a modification to replace constants with instructions as necessary as we
cannot replace these with our reference pointer (non-constant and
constants do not mix nicely).

In the case of the OpenMPIRBuilder some changes were made to defer
global symbol rebinding to kernel arguments until all other arguments
have been rebound. This makes sure we do not replace uses that may refer
to the global (e.g. a GEP) but are themselves actually a separate
argument that needs bound.

Currently "declare target to" still needs some work, but this may be the
case for all types in conjunction with "declare target to" at the
moment.
2024-06-25 20:54:04 +02:00
Joseph Huber
3de162fee4
[Offload] Fix using old deprecated CUDA root variable (#96307)
Summary:
This variable isn't being set properly since we moved to the new way to
find the CUDA directory. That means this variable was just unset the
whole time. This patch adds it in by calculating it using the binary
directory so it can be passed to `--cuda-path`.
2024-06-21 10:08:05 -05:00
Wu Yingcong
89841137fb
[offload][cmake] always define pythonize_bool macro (#96028)
I use the following cmake config to build offload and openmp
```
cmake -G "Unix Makefiles" -DCMAKE_BUILD_TYPE=Release -DLLVM_TARGETS_TO_BUILD="X86" -DLLVM_ENABLE_PROJECTS="clang;openmp"  -DLLVM_ENABLE_RUNTIMES="offload"  -DLLVM_LIT_ARGS="-vv -a" -DLLVM_ENABLE_ASSERTIONS=ON  ../llvm
```
and got the following error:
```
CMake Error at /tmp/build-llvm/llvm/offload/CMakeLists.txt:321 (pythonize_bool):
  Unknown CMake command "pythonize_bool". 
```

After some search I find out that the "correct" way to build this is
putting openmp and offload to the ENABLE_RUNTIMES like
```
cmake -G "Unix Makefiles" -DCMAKE_BUILD_TYPE=Release -DLLVM_TARGETS_TO_BUILD="X86" -DLLVM_ENABLE_PROJECTS="clang"  -DLLVM_ENABLE_RUNTIMES="openmp;offload"  -DLLVM_LIT_ARGS="-vv -a" -DLLVM_ENABLE_ASSERTIONS=ON  ../llvm
```
.


But since we don't forbid to config them using openmp as PROJECT and
offload as RUNTIME, then we probably support it. The fix is to always
define the pythonize_bool macro. For cmake, it is okay to redefine a
macro, it does not cause a warning or else.
2024-06-20 07:00:19 -05:00
Joseph Huber
8043356380
[Offload] Change HSA header search order (#95769)
Summary:
The HSA headers existed previously in `include/hsa.h` and were moved to
`include/hsa/hsa.h` in a later ROCm version. The include headers here
were originally designed to favor a newer one. However, this
unintentionally prevented the dyanmic HSA's `hsa.h` from being used if
both were present. This patch changes the order so it will be found
first.

Related to https://github.com/llvm/llvm-project/pull/95484.
2024-06-17 14:52:50 -05:00
Tim Gymnich
597d2f7662
[OpenMP] Add Environment Variable to disable Reuse of Blocks for High Loop Trip Counts (#89239)
Sometimes it might be beneficial to spawn more thread blocks instead of
reusing existing for multiple loop iterations.

**Alternatives considered:**

Make `DefaultNumBlocks` settable via an environment variable.

---------

Co-authored-by: Joseph Huber <huberjn@outlook.com>
2024-06-14 07:35:23 -07:00
agozillon
0aeaa2d93d
[OMPIRBuilder][OpenMP][LLVM] Modify and use ReplaceConstant utility in convertTarget (#94541)
This PR seeks to expand/replace the Constant -> Instruction conversion
that needs to occur inside of the OpenMP Target kernel generation to
allow kernel argument replacement of uses within the kernel (cannot
replace constant uses within constant expressions with non-constants).
It does so by making use of the new-ish utility
convertUsersOfConstantsToInstructions which is a much more expansive
version of what the smaller "version" of the function I wrote does,
effectively expanding uses of the input argument that are constant
expressions into instructions so that we can replace with the
appropriate kernel argument.

Also alters convertUsersOfConstantsToInstructions to optionally
restrict the replacement to a function and optionally leave
dead constants alone, the latter is necessary when lowering from 
MLIR as we cannot be sure we can remove the constants at this 
stage, even if rewritten to instructions the ModuleTranslation may 
maintain links to the original constants and utilise them in further 
lowering steps (as when we're lowering the kernel, the module is 
still in the process of being lowered). This can result in unusual 
ICEs later. These dead constants can be tidied up later (and 
appear to be in subsequent lowering from checking with 
emit-llvm).
2024-06-13 15:57:15 +02:00
Johannes Doerfert
54b5c76d3b
[Offload] Use flat array for cuLaunchKernel (#95116)
We already used a flat array of kernel launch parameters for the AMD GPU
launch but now we also use this scheme for the NVIDIA GPU launch. The
only remaining/required use of the indirection is the host plugin (due
ot ffi). This allows to us simplify the use for non-OpenMP kernel
launch.
2024-06-13 09:43:47 +03:00
Johannes Doerfert
f2120cda7d
[Offload][AMDGPU] Impose more restrictions for implicit kernel arguments (#95211)
COV3 is not supported anymore, thus we can just use ArgsSize we read
from the kernel to determine how many argument bytes we need and if
implicit kernel arguments are used.
2024-06-12 16:42:20 +03:00
Johannes Doerfert
2eb60e2de8
[Offload][NFCI] Initialize the KernelArgsTy to default values (#95117)
Co-authored-by: Joseph Huber <huberjn@outlook.com>
2024-06-11 17:05:04 +03:00
estewart08
89c92b0bcf
[OpenMP][Offload] - Ensure OPENMP_STANDALONE_BUILD is defined (#94801)
Without a value set conditional checks like
if(NOT ${OPENMP_STANDALONE_BUILD})
will not be able to evaluate to true.
Fixes issue introduced from PR #93463, which did not allow the OMPT
variable to be propogated up to offload during a runtimes build.
2024-06-07 15:37:42 -05:00
Joseph Huber
9e209a4a37
[Offload] Use the kernel argument size directly in AMDGPU offloading (#94667)
Summary:
The old COV3 implementation of HSA used to omit the implicit arguments
from the kernel argument size. For COV4 and COV5 this is no longer the
case so we can simply use the size reported from the symbol information.

See
https://github.com/ROCm/ROCR-Runtime/issues/117#issuecomment-812758161
2024-06-06 15:19:55 -05:00
Joseph Huber
2cc1644299 [Offload] Fix missing abs function for test
Summary:
We don't have the abs function to link against, just use the builtin.
2024-06-06 14:37:06 -05:00
Shilei Tian
1ca0055f45
[AMDGPU] Add a new target gfx1152 (#94534) 2024-06-06 12:16:11 -04:00
Joseph Huber
435aa7663d
[Libomptarget] Rework device initialization and image registration (#93844)
Summary:
Currently, we register images into a linear table according to the
logical OpenMP device identifier. We then initialize all of these images
as one block. This logic requires that images are compatible with *all*
devices instead of just the one that it can run on. This prevents us
from running on systems with heterogeneous devices (i.e. image 1 runs on
device 0 image 0 runs on device 1).

This patch reworks the logic by instead making the compatibility check a
per-device query. We then scan every device to see if it's compatible
and do it as they come.
2024-06-06 08:10:56 -05:00
Shilei Tian
4cd115ca34
[OpenMP][OMPX] Fix ompx_ballot_sync test (#94140)
The current test is not really correct because the mask is set to
0xffffffff
even if it is on an AMDGPU whose wavefront size is 64. Besides,
`__AMDGCN_WAVEFRONT_SIZE` is not set on host compilation so the
verification
happens to work.
2024-06-03 11:18:26 -04:00
Shilei Tian
b448efb8ea
Reapply "[OpenMP][OMPX] Add shfl_down_sync (#93311)" (#94139) 2024-06-03 11:17:36 -04:00
Joseph Huber
e19565c5c4
[Offload][AMDGPU] Only allow memory pool access to valid agents (#93969)
Summary:
The logic since the next-gen plugins was added was that every single
agent would get access to a memory pool we allocated. This is necessary
for things like fine-grained memory and to faciliate d2d copied.
However, there are cases where an agent cannot legally access a memory
pool. We have a debug check for this, but it would always be triggered
in these situations because both uses of the function simply passed
every agent. This patch changes the behavior by only enabling memory
pool access for agents that can access the memory pool.
2024-05-31 13:34:40 -05:00
Krzysztof Parzyszek
adc4e45f2e [Offload] Update test to use target parallel for reduction
Re-enable test disabled in 1bf1f93d with a fix.
2024-05-30 09:17:17 -05:00
Krzysztof Parzyszek
1bf1f93d94 [Offload] Temporarily disable failing test after eb88e7c1
The `target reduction` combination is no longer accepted.
Disable the test to avoid build failures, until a better fix is ready.
2024-05-30 08:52:29 -05:00
Joseph Huber
f284af4863 [Offload][Fix] Fix lazy initialization with multiple images
Summary:
There was a bug here where we would initialize the plugin multiple times
when there were multiple images. Fix it by putting the `is_initliaized`
check later.
2024-05-28 10:51:53 -05:00
Shilei Tian
cf9eeb67e5 Revert "Reapply "[OpenMP][OMPX] Add shfl_down_sync (#93311)""
This reverts commit 7b4865582299294455bc816358fd88a9c6e5e0be.
2024-05-26 01:04:39 -04:00
Shilei Tian
7b48655822 Reapply "[OpenMP][OMPX] Add shfl_down_sync (#93311)"
This reverts commit 9b31cc71d66064dfaf2afabf4a835211321bb4a0.
2024-05-26 00:57:50 -04:00
Michael Kruse
e14f5f225a
Revise IDE folder structure (#89755)
Update the folder titles for targets in the monorepository that have not
seen taken care of for some time. These are the folders that targets are
organized in Visual Studio and XCode
(`set_property(TARGET <target> PROPERTY FOLDER "<title>")`)
when using the respective CMake's IDE generator.

 * Ensure that every target is in a folder
 * Use a folder hierarchy with each LLVM subproject as a top-level folder
 * Use consistent folder names between subprojects
 * When using target-creating functions from AddLLVM.cmake, automatically
deduce the folder. This reduces the number of
`set_property`/`set_target_property`, but are still necessary when
`add_custom_target`, `add_executable`, `add_library`, etc. are used. A
LLVM_SUBPROJECT_TITLE definition is used for that in each subproject's
root CMakeLists.txt.
2024-05-25 17:41:21 +02:00
Joseph Huber
9b31cc71d6 Revert "[OpenMP][OMPX] Add shfl_down_sync (#93311)"
This reverts commit 098c6dfa8157681699a71fce9e3d94515e66311f.
This reverts commit 8c718a3a91df4ab68dc3f1ca3887ea730c9aed84.
This reverts commit 4fb02de9d490d0773441aa30124bb4d1272230d3.
2024-05-24 19:07:53 -05:00
Shilei Tian
4fb02de9d4
[OpenMP][OMPX] Add shfl_down_sync (#93311) 2024-05-24 14:00:43 -04:00
Shilei Tian
7eeec8e6d1
[OpenMP][OMPX] Add ballot_sync (#91297)
This patch adds the support for `ballot_sync` in ompx.
2024-05-24 09:54:54 -04:00
Joseph Huber
21f3a6091f
[Offload] Only initialize a plugin if it is needed (#92765)
Summary:
Initializing the plugins requires initializing the runtime like CUDA or
HSA. This has a considerable overhead on most platforms, so we should
only actually initialize a plugin if it is needed by any image that is
loaded.
2024-05-23 09:36:47 -05:00
Joseph Huber
300e5b9114
[Offload] Fix enabling plugins on unsupported platforms (#93186)
Summary:
Certain plugins can only be built on specific platforms. Previously this
didn't cause issues becaues each one was handled independently. However,
now that we link these all directly they need to be in a CMake list.
Furthermore we use this list to generate a config file. For this reason
these checks are moved to where we normalize the support.

Fixes: https://github.com/llvm/llvm-project/issues/93183
2024-05-23 08:06:41 -05:00
Jan Patrick Lehr
27f53b266c
[Offload] Disable flaky test on host-offloading (#93174)
While we investigate the issue, we disable the test on host-offloading
so the buildbots are back to more useful state.
Issue is tracked: https://github.com/llvm/llvm-project/issues/93173
2024-05-23 13:21:40 +02:00
Anchu Rajendran S
1a2a0c0dc9
Fixing the location attribute added to mapInfoOp (#90764)
Named location attribute added to `tgt_offload_entry` shall be used by
runtime calls like `ompx_dump_mapping_tables` to print the information
of variables that are mapped to the device. `ompx_dump_mapping_tables`
was printing the wrong location information and this change fixes it.

A sample execution of example before the change:
```
omptarget device 0 info: OpenMP Host-Device pointer mappings after block at libomptarget:0:0:

omptarget device 0 info: Host Ptr           Target Ptr         Size (B) DynRefCount HoldRefCount Declaration

omptarget device 0 info: 0x0000000000206df0 0x00007f02cdc00000 20000000 1           0            <program-file-loc> at unknown:18:35
```

The change replaces unknown to the mapped symbol and location to the
declaration location.
2024-05-23 13:46:35 +05:30
Joseph Huber
c618ae1734
[Offload] Rework handling for loading vendor runtimes (#93073)
Summary:
We previously had multiple options for this, this patch replaces them
with `LIBOMPTARGET_DLOPEN_PLUGINS=` to be a list of plugins to
dynamically use. It defaults to everything right now. This ignores the
`host` plugin because the `libffi` dependency is going to be removed
soon hopefully in https://github.com/llvm/llvm-project/pull/91264.
2024-05-22 13:04:52 -05:00
Joseph Huber
dbfedc6b27
[Offload] Use newer CUDA API functions when dynamically loaded (#93057)
Summary:
CUDA does its versioning by putting a redirection in the header so the
API functions remain the same while the symbol changes. These weren't
being used for some functions that required it in the dynamic cuda
version.

These functions have newer verisons that should be used. These are
fairly old as far as I'm aware so we should be able to sweep backward
compatibility under the rug.
2024-05-22 10:59:56 -05:00
Ye Luo
831d143519
[Offload] libomptarget force dlopen vendor libraries by default. (#92788)
Since #87009, libomptarget directly links all the plugins statically.
All the dependencies of plugins got exposed to libomptarget. The CUDA
plugin depends on libcuda and the amdgpu plugin depends on libhsa if not
forced using dlopen. On a cluster with different compute node
architectures, libomptarget can be built and run on different nodes. In
the build stage, if cmake founds libcuda and
`LIBOMPTARGET_FORCE_DLOPEN_LIBCUDA=OFF`, libomptarget links libcuda.so
directly and the result libomptarget may not run a node without a NVIDIA
driver for example a CPU or AMD GPU only machine with a complaint that
libcuda.so not found.

The solution is setting `LIBOMPTARGET_FORCE_DLOPEN_LIBCUDA` and
`LIBOMPTARGET_FORCE_DLOPEN_LIBHSA` `ON`. Preferably this should be
default to maximize the usability of libomptarget. If cmake detects
NVIDIA or AMD software on an OS imaging building node, the resulted
libomptarget may not be able to function on the user side due to the
requirement the existence of vendor runtime libraries.
2024-05-22 09:40:43 -05:00
Joseph Huber
3df7cb9ab9 [Offload] Remove unused version script for plugins
Summary:
The plugins are no longer linked to a share library, making this unused
and useless.
2024-05-20 10:06:11 -05:00
Joseph Huber
770d928303
[Offload][NFC] Remove 'libomptarget' message helpers (#92581)
Summary:
This isn't `libomptarget` anymore, and these messages were always
unnecessary because no other project uses these prefixed messages. The
effect of this is that no longer will the logs have `LIBOMPTARGET --` in
front of everything. We have a message stating when we start building
the offload project so it'll still be trivial to find.
2024-05-17 13:24:32 -05:00
Joseph Huber
16bb7e89a9
[Offload][NFC] Remove all trailing whitespace from offload/ (#92578)
Summary:
This patch cleans up the training whitespace in a bunch of tests and
CMake files. Most just in preparation for other cleanups.
2024-05-17 13:15:04 -05:00
Joseph Huber
c4017cda00
[Offload][NFC] Remove header license in CMake files (#92544)
Summary:
No other project has these in the CMake itself, and they're wildly
inconsistent even within the project. These don't really add anything so
I think they should be removed.
2024-05-17 09:05:03 -05:00
Joseph Huber
6d2219acab
[Libomptarget] Pass '-Werror=global-constructors' to the libomptarget build (#88531)
Summary:
A runtime library should not have global constructors. Everything is now
expected to go through the init methods. This patch ensures that global
constructors will not accidentally be introduced.
2024-05-16 16:06:20 -05:00
Joseph Huber
f42f57b52d
[Libomptarget] Rework Record & Replay to be a plugin member (#88928) (#89097)
Summary:
Previously, the R&R support was global state initialized by a global
constructor. This is bad because it prevents us from adequately
constraining the lifetime of the library. Additionally, we want to
minimize the amount of global state floating around.

This patch moves the R&R support into a plugin member like everything
else. This means there will be multiple copies of the R&R implementation
floating around, but this was already the case given the fact that we
currently handle everything with dynamic libraries.
2024-05-16 14:58:46 -05:00
Joseph Huber
033fa81480 [Offload][NFC] Remove unused files following static plugins
Summary:
Forgot to remove these when I landed the initial patch, they are no
longer used.
2024-05-16 11:48:34 -05:00
Joseph Huber
3abd3d6e59
[Libomptarget] Remove requires information from plugin (#80345)
Summary:
Currently this is only used for the zero-copy handling. However, this
can easily be moved into `libomptarget` so that we do not need to bother
setting the requires flags in the plugin. The advantage here is that we
no longer need to do this for every device redundently. Additionally,
these requires flags are specifically OpenMP related, so they should
live in `libomptarget`.
2024-05-16 11:13:50 -05:00
Joseph Huber
81d20d861e [Offload][NFC] Fix warning messages in runtime
Summary:
These are lots of random warnings due to inconsistent initialization or
signedness.
2024-05-15 15:30:38 -05:00
Joseph Huber
332de4b267
[Offload] Correctly reject building on unsupported architectures (#92276)
Summary:
Previously we had this `LIBOMPTARGET_ENABLED` variable which controlled
including `libomptarget`. This is now redundant since it's controlled by
`LLVM_ENABLE_RUNTIMES`. However, this had the extra effect of not
building it when given unsupported targets. THis was lost during the
move to `offload`. This patch moves this logic back and makes the
`offload` target just quit without doing anything if used on an
unsupported architecture.

https://github.com/llvm/llvm-project/issues/91881
https://github.com/llvm/llvm-project/issues/91819

---------

Co-authored-by: Sylvestre Ledru <sylvestre@debian.org>
2024-05-15 11:38:41 -05:00
jyu2-git
8e00703be9
[Clang][OpenMP] Fix runtime problem when explicit map both pointer and pointee (#92210)
ponter int *p for following map, test currently crash.

  map(p, p[:100]) or map(p, p[1])

Currly IR looks like
// &p, &p, sizeof(int), TARGET_PARAM | TO | FROM
// &p, p[0], 100sizeof(float) TO | FROM

Worrking IR is
// map(p, p[0:100]) to map(p[0:100])
// &p, &p[0], 100*sizeof(float), TARGET_PARAM | TO | FROM | PTR_AND_OBJ

The change is add new argument AreBothBasePtrAndPteeMapped in
generateInfoForComponentList

Use that to skip map for map(p), when processing map(p[:100]) generate
map with right flag.
2024-05-15 08:20:25 -07:00