2642 Commits

Author SHA1 Message Date
Joseph Huber
6dd84983d0 [Libomptarget] Improve next-gen AMDGPU plugin error messages
The next-gen plugin properly prints errors. This patch improves the
error messages by including the Node-ID of the GPU that failed as well
as a textual representation of the enumeration values.

Reviewed By: kevinsala

Differential Revision: https://reviews.llvm.org/D143192
2023-02-02 12:55:53 -06:00
Joseph Huber
48560e264c [Libomptarget] Fix the NVPTX Libomptarget test
Summary:
This was broken; we weren't adding these for the NVPTX tests.
2023-02-02 09:46:10 -06:00
Joseph Huber
1bde4ccae6 [Libomptarget] Fix building AMDGPU tests
Summary:
Accidentally deleted this.
2023-01-30 17:56:48 -06:00
Shilei Tian
516ae48170 [OpenMP][NVPTX] Guard the target name macro definition 2023-01-30 14:02:22 -05:00
Joseph Huber
292eca41d9 [Libomptarget] Fix tests after previous patch
Summary:
The previous patch didn't remove these tests correctly.
2023-01-30 07:18:51 -06:00
Joseph Huber
9b1d0ee10c [Libomptarget] Remove unused test targets in libomptarget
Summary:
These don't need to be set.
2023-01-30 06:34:15 -06:00
Shilei Tian
ad95b0e977 [OpenMP][NVPTX] Added __tgt_rtl_launch_kernel in old CUDA plugin
Fix #60248.

Reviewed By: jhuber6

Differential Revision: https://reviews.llvm.org/D142819
2023-01-28 18:56:07 -05:00
Shilei Tian
544f8c7f39 [OpenMP] Fix stack overflow for test bug54082.c
When `N` is 1024, `int result[N][N]` requires a large stack allocation (4 MiB with a 4-byte `int`) that Windows cannot support with its default stack size.
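
The arithmetic behind the overflow is 1024 * 1024 * 4 bytes = 4 MiB, while the default stack reserved for a Windows executable's main thread is 1 MiB. Below is a minimal sketch of one common workaround: move the matrix to the heap. The helper names are hypothetical, and this is not necessarily how the patch changed the test.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t N = 1024;

// Hypothetical helper: allocate the N x N result matrix on the heap
// (std::vector) instead of as a 4 MiB local array on the stack.
std::vector<int> makeResult(std::size_t n) {
  return std::vector<int>(n * n, 0);
}

// Row-major accessor standing in for result[i][j].
int &at(std::vector<int> &m, std::size_t n, std::size_t i, std::size_t j) {
  assert(i < n && j < n);
  return m[i * n + j];
}
```

A `static int result[N][N];` would also keep the array off the stack. The vector keeps the storage local to the caller.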

Fix #60326.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D142684
2023-01-26 23:45:11 -05:00
Joachim Protze
488d17154b Re-apply "[OpenMP][Archer] Use dlsym rather than weak symbols for TSan annotations"
Explicitly link libdl this time.

Differential Revision: https://reviews.llvm.org/D142378
2023-01-26 15:32:23 +01:00
Joseph Huber
21b1d55c04 [Libomptarget] Add correct relative path for the next-gen plugin
Summary:
I forgot that this file "borrowed" the source from the other file tree.
Fix that.
2023-01-25 14:05:53 -06:00
Joseph Huber
84d0243d21 [Libomptarget] Clean up CUDA plugin CMake files
Clean up this file after changing it in D142568.

Depends on D142568

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D142573
2023-01-25 13:58:02 -06:00
Joseph Huber
c568622046 [Libomptarget] Remove find_package(CUDA) as it has been deprecated
Since D137724 and the LLVM 17 release we have updated to CMake version
3.20. This means that `find_package(CUDA)` is officially deprecated and
can be replaced with `find_package(CUDAToolkit)` instead. This patch
does this and also cleans up a bit of the CMake.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D142568
2023-01-25 13:58:01 -06:00
Tom Stellard
603c286334 Bump the trunk major version to 17 2023-01-24 22:57:27 -08:00
Shilei Tian
5ba8ecb6cc [Clang][OpenMP] Find the type omp_allocator_handle_t from identifier table
In Clang, in order to determine the type of `omp_allocator_handle_t`, Clang
checks the types of the predefined allocators. The first one it checks is
`omp_null_allocator`. If the language is C and the system is 64-bit, what Clang
gets is an `int` instead of an enum of size 8, because of how we define
`omp_allocator_handle_t` in `omp.h`. If the allocator is captured by a region,
let's say a parallel region, the allocator will be privatized. Because Clang deems
`omp_allocator_handle_t` to be an `int`, it will first cast the value returned by
the runtime library (for `libomp` it is a `void *`) to `int`, and then, in the
outlined function, cast it back to `omp_allocator_handle_t`. These two casts
discard the upper 32 bits of the pointer value returned from `libomp`,
and when the privatized pointer is fed to another runtime function,
`__kmpc_allocate()`, it causes a segmentation fault. That is the root cause of PR54082.
It is unclear why `-fno-pic` hides this bug.
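
The truncation can be reproduced in isolation. The sketch below models `omp_allocator_handle_t` as a plain 64-bit integer. This is an assumption: in `omp.h` it is an enum. The sketch applies the same two conversions. Small constants, such as the predefined allocators, survive the round trip. A typical user-space pointer value returned by `libomp` loses its upper 32 bits.

```cpp
#include <cassert>
#include <cstdint>

// Model of the two conversions: the 64-bit handle returned by the runtime is
// converted to a 32-bit int, then converted back to a 64-bit handle in the
// outlined function. A 64-bit handle is assumed here.
std::uint64_t roundTripThroughInt(std::uint64_t handle) {
  // Keep only the low 32 bits, as a conversion to a 32-bit int does.
  std::int32_t asInt = static_cast<std::int32_t>(
      static_cast<std::uint32_t>(handle & 0xFFFFFFFFu));
  // Widening back sign-extends the int; the original upper 32 bits are lost.
  return static_cast<std::uint64_t>(static_cast<std::int64_t>(asInt));
}
```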

In this patch, we detect `omp_allocator_handle_t` using roughly the same method
as `omp_event_handle_t`, by looking it up in the identifier table.

Fix #54082.

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D142297
2023-01-24 22:49:05 -05:00
Shilei Tian
dafebd5b5a [OpenMP] Create a temp file in /tmp if /dev/shm is not accessible
When `libomp` is initialized, it creates a temp file in `/dev/shm` to store
the registration flag. Some systems, like Android, don't have `/dev/shm`. On those
systems, this feature is disabled by the macro `KMP_USE_SHM`. Most Linux
distributions do have it. However, some customized distributions, such as the
one reported in https://github.com/llvm/llvm-project/issues/53955, don't support
it either, which causes a core dump. With this patch, in that case we try to
create a temporary file in `/tmp` instead, and if that also fails, we error out.
Note that this patch does not honor a temporary directory set via `TMPDIR`;
only `/tmp` is tried.
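
Below is a minimal sketch of the fallback order described above: try `/dev/shm`, then `/tmp`, else fail. It uses plain POSIX `access()` and `open()`. The helper names and the file name are hypothetical, and `libomp`'s actual registration code differs, so treat this only as an illustration of the control flow.

```cpp
#include <cassert>
#include <fcntl.h>
#include <initializer_list>
#include <string>
#include <unistd.h>

// Hypothetical helper: first writable directory among the candidates,
// or an empty string when neither is usable.
std::string pickRegistrationDir() {
  for (const char *dir : {"/dev/shm", "/tmp"}) {
    if (access(dir, W_OK) == 0)
      return dir;
  }
  return {};
}

// Create the registration file in the chosen directory. Returns an open file
// descriptor, or -1 so that the caller can report the error and stop.
int createRegistrationFile(const std::string &name) {
  assert(!name.empty());
  std::string dir = pickRegistrationDir();
  if (dir.empty())
    return -1;
  std::string path = dir + "/" + name;
  return open(path.c_str(), O_CREAT | O_EXCL | O_RDWR, 0600);
}
```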

Fix #53955.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D142175
2023-01-24 21:45:38 -05:00
Kevin Sala
2a539ee17d [OpenMP][libomptarget] Implement memory lock/unlock API in NextGen plugins
This patch implements the memory lock/unlock API, introduced in patch https://reviews.llvm.org/D139208,
in the NextGen plugins. Locked buffers feature reference counting, and certain overlaps are allowed. Given
an already locked buffer A, other buffers that are fully contained inside A can be locked again, even if
they are smaller than A. In this case, the reference count of locked buffer A is incremented. However,
extending an existing locked buffer is not allowed. The original buffer is actually unlocked once all its
users have released the locked buffer and sub-buffers (i.e., the reference counter becomes zero).
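
The overlap rules can be captured in a small interval table. The sketch below is a hypothetical model; the class and method names are illustrative and are not the NextGen plugin's API. It keys locked buffers by start address and increments the reference count for fully contained sub-buffers. It rejects requests that would extend an existing buffer, and it removes an entry only when its count drops to zero.

```cpp
#include <cassert>
#include <cstdint>
#include <iterator>
#include <map>

// Hypothetical model of the lock bookkeeping; not the plugin's data structure.
class LockedBufferTable {
  struct Entry {
    std::uintptr_t Size;
    int RefCount;
  };
  using Map = std::map<std::uintptr_t, Entry>; // keyed by start address
  Map Table;

  // Locked buffer that fully contains [Ptr, Ptr + Size), or end().
  Map::iterator findContaining(std::uintptr_t Ptr, std::uintptr_t Size) {
    auto It = Table.upper_bound(Ptr); // first buffer starting after Ptr
    if (It == Table.begin())
      return Table.end();
    --It; // last buffer starting at or before Ptr
    if (Ptr + Size <= It->first + It->second.Size)
      return It;
    return Table.end();
  }

public:
  // Returns false for a request that partially overlaps an existing buffer,
  // since extending a locked buffer is not allowed.
  bool lock(std::uintptr_t Ptr, std::uintptr_t Size) {
    auto It = findContaining(Ptr, Size);
    if (It != Table.end()) {
      ++It->second.RefCount; // sub-buffer: share the existing lock
      return true;
    }
    auto Next = Table.lower_bound(Ptr);
    if (Next != Table.end() && Next->first < Ptr + Size)
      return false; // overlaps a buffer starting inside the request
    if (Next != Table.begin()) {
      auto Prev = std::prev(Next);
      if (Prev->first + Prev->second.Size > Ptr)
        return false; // overlaps the tail of an earlier buffer
    }
    Table.emplace(Ptr, Entry{Size, 1});
    return true;
  }

  // Returns the remaining reference count, 0 once the buffer is really
  // unlocked, or -1 if Ptr is not inside any locked buffer.
  int unlock(std::uintptr_t Ptr) {
    auto It = findContaining(Ptr, 1);
    if (It == Table.end())
      return -1;
    int Remaining = --It->second.RefCount;
    if (Remaining == 0)
      Table.erase(It);
    return Remaining;
  }
};
```

Using an ordered map makes the containment check a single `upper_bound` lookup. This assumes, as the rules above imply, that locked buffers never partially overlap one another.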

Differential Revision: https://reviews.llvm.org/D141227
2023-01-25 00:11:38 +01:00
Joseph Huber
5d1dc9fa04 [OpenMP] Do not link the bitcode OpenMP runtime when targeting AMDGPU.
The AMDGPU target can only emit LLVM-IR, so we can always rely on LTO to
link the static version of the runtime optimally. Using the static
library has a few advantages. Namely, it avoids several known bugs
and allows us to optimize out more functions. This is legal since the
changes in D142486 and D142484.

Depends on D142486 D142484

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D142491
2023-01-24 17:01:37 -06:00
Giorgis Georgakoudis
4b88bf5c70 [OpenMP][docs] Update for record-and-replay
Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D142492
2023-01-24 14:36:37 -08:00
Shilei Tian
7e89420116 [OpenMP] Disable tests that are not supported by GCC if it is used for testing
GCC doesn't support `-fopenmp-version`, causing test failures if the compiler used
for testing is GCC.

GCC's OpenMP 5.2 support is still very limited. Disable tests requiring 5.2
features for GCC as well.

We might want to take a look at all `libomp` tests and mark those that
don't support GCC yet.

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D142173
2023-01-24 17:00:15 -05:00
Johannes Doerfert
62bc222875 [OpenMP][NFC] Augment release notes 2023-01-24 13:23:15 -08:00
Kevin Sala
9dea83d4af [OpenMP][Doc] Update release notes with NextGen plugins 2023-01-24 22:15:49 +01:00
Guilherme Valarini
7cf63ee80c [OpenMP][Docs] Add non-blocking target nowait environment variables 2023-01-24 16:30:34 -03:00
Shilei Tian
31c95e5a4d [OpenMP][Doc] Update release note for 16 release 2023-01-24 14:04:28 -05:00
Joseph Huber
c9c5a076b3 [OpenMP][Docs] Add some release notes for OpenMP 2023-01-24 12:35:58 -06:00
Slava Zakharin
8743e1e369 Revert "[OpenMP][Archer] Use dlsym rather than weak symbols for TSan annotations"
OpenMP buildbots are failing:
https://lab.llvm.org/buildbot/#/builders/193/builds/25434
https://lab.llvm.org/buildbot/#/builders/193/builds/25420

This reverts commit 7fbf12210007a66f7b62beadc0e5a52561cc0ab3.
2023-01-24 10:17:35 -08:00
Joachim Protze
7fbf122100 [OpenMP][Archer] Use dlsym rather than weak symbols for TSan annotations
This patch fixes issues reported for Ubuntu and possibly other platforms:
https://github.com/llvm/llvm-project/issues/45290

The latest comment on this issue points out that using dlsym rather than
the weak symbol approach to call TSan annotation functions fixes the issue
for Ubuntu.
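
Below is a minimal sketch of the dlsym approach. It assumes the TSan annotation entry point is named `AnnotateHappensBefore` and has the signature shown; treat both as assumptions here. The symbol is resolved once at run time. When the program is not running under TSan, the lookup returns null and the annotation becomes a no-op. This avoids relying on a weak symbol resolving to null.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE // RTLD_DEFAULT is a GNU extension on glibc
#endif
#include <cassert>
#include <dlfcn.h>

// Assumed signature of TSan's happens-before annotation.
using AnnotateFn = void (*)(const char *File, int Line,
                            const volatile void *Addr);

// Resolve an annotation function at run time; null when TSan is not loaded.
AnnotateFn resolveAnnotation(const char *Name) {
  return reinterpret_cast<AnnotateFn>(dlsym(RTLD_DEFAULT, Name));
}

// Forward to TSan only when the symbol was found; otherwise do nothing.
void happensBefore(const volatile void *Addr) {
  static const AnnotateFn Fn = resolveAnnotation("AnnotateHappensBefore");
  if (Fn)
    Fn(__FILE__, __LINE__, Addr);
}
```

On glibc older than 2.34, `dlsym` lives in `libdl`, so such code must be linked with `-ldl`. The re-applied version of this patch explicitly links libdl.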

Differential Revision: https://reviews.llvm.org/D142378
2023-01-24 15:14:51 +01:00
Johannes Doerfert
5d9cb20f40 [OpenMP] Run the Attributor as part of the device runtime optimization
This will help us propagate assumptions to call sites, among other
things.
2023-01-23 22:45:47 -08:00
Joseph Huber
2a8c9d7c8a [Libomptarget] Use the nextgen plugins by default.
The next-gen plugins are complete drop-in replacements for the old
versions. We should strive to replace the old ones as quickly as
possible now that we have a viable alternative.

The only failing test is `prelock.cpp`, as support for it has not yet landed in
the next-gen plugins.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D142399
2023-01-23 17:30:46 -06:00
Scott Linder
25c0ea2a53 [NFC] Consolidate llvm::CodeGenOpt::Level handling
Add free functions llvm::CodeGenOpt::{getLevel,getID,parseLevel} to
provide common implementations for functionality that has been
duplicated in many places across the codebase.

Differential Revision: https://reviews.llvm.org/D141968
2023-01-23 22:50:49 +00:00
Martin Storsjö
c3737a6522 [docs] Add release notes for news in 16.x done by me, or otherwise relating to MinGW targets
Differential Revision: https://reviews.llvm.org/D142346
2023-01-23 22:12:32 +02:00
Joseph Huber
b280e12a3d [Libomptarget][NFC] Address a few warnings in libomptarget
Summary:
Fix a few minor warnings that show up in `libomptarget`.
2023-01-23 08:56:03 -06:00
Joseph Huber
716bae0b48 [Libomptarget] Include "hsa/hsa.h" instead
Summary:
Recently AMD moved the "hsa.h" include to "hsa/hsa.h". This causes
several warnings. This patch checks whether we can include the new one
instead. This should hopefully keep things backwards compatible while
silencing the warnings.
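
One way to express such a check is with the preprocessor's `__has_include`. It is standard in C++17 and is also available as an extension in GCC and Clang. This is a hypothetical sketch. The patch may perform the check differently, for instance at CMake configure time. The `HSA_HEADER_LOCATION` macro and the helper function are illustrative only.

```cpp
#include <cassert>
#include <cstring>

// Prefer the new location, fall back to the old one. HSA_HEADER_LOCATION is
// a hypothetical macro recording which header was picked.
#if defined(__has_include)
#if __has_include("hsa/hsa.h")
#include "hsa/hsa.h"
#define HSA_HEADER_LOCATION "hsa/hsa.h"
#elif __has_include("hsa.h")
#include "hsa.h"
#define HSA_HEADER_LOCATION "hsa.h"
#endif
#endif

#ifndef HSA_HEADER_LOCATION
#define HSA_HEADER_LOCATION "none" // no HSA headers on the include path
#endif

// True when the new "hsa/hsa.h" location was used.
bool usesNewHsaHeaderPath() {
  return std::strcmp(HSA_HEADER_LOCATION, "hsa/hsa.h") == 0;
}
```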
2023-01-23 08:56:03 -06:00
Joseph Huber
11908c20cd [Libomptarget][NFC] Silence unknown CUDA version warnings
Summary:
These warnings are very loud considering they get repeated at least 30
times each build. This patch just silences them.
2023-01-23 08:56:03 -06:00
Shilei Tian
693358d787 [OpenMP][DeviceRTL][NFC] Use OMPTgtExecModeFlags from llvm/include/llvm/Frontend/OpenMP/OMPDeviceConstants.h
This patch prepares for a series that will enable per-kernel information
used in both the host and device runtimes. Some variables/enums, such as `OMPTgtExecModeFlags`,
have to be shared by both of them. A new header, `OMPDeviceConstants.h`, is added,
containing code that will be shared by them. We will introduce more variables soon.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D142320
2023-01-22 19:10:54 -05:00
Johannes Doerfert
e68313f100 [OpenMP][FIX] Use thread id not team id for masked section 2023-01-22 15:45:00 -08:00
Johannes Doerfert
c175c07d90 [OpenMP][FIX] Split test into amdgpu and nvptx specific ones
This avoids running the test for the host.
2023-01-21 20:12:04 -08:00
Johannes Doerfert
40f9bf082f [OpenMP] Introduce the ompx_dyn_cgroup_mem(<N>) clause
Dynamic memory allows users to allocate fast shared memory when a kernel
is launched. We support a single size for all kernels via the
`LIBOMPTARGET_SHARED_MEMORY_SIZE` environment variable, but now we can
control it per kernel invocation, which also allows computed values.

Note: Only the nextgen plugins will allocate memory based on the clause,
      the old plugins will silently miscompile.

Differential Revision: https://reviews.llvm.org/D141233
2023-01-21 18:46:36 -08:00
Johannes Doerfert
3820d0eaaf [OpenMP][FIX] Runtime args are not kernel args
Clang passes `KernelArgs.NumArgs` to the runtime, but not all of them are kernel
arguments. This ensures we fall back to the old logic. In a follow-up we
should introduce a new `KernelArgs.NumKernelArgs` field and set it in
the runtime.
2023-01-21 13:43:10 -08:00
Johannes Doerfert
16a385ba21 [OpenMP] Modernize the kernel launching interface and APIs
We already created a versioned `__tgt_kernel_arguments` struct but it
was only briefly used and its content was passed in isolation anyway.
This makes it hard to add more information in the future. With this
patch we fully embrace the struct as means to pass information from the
compiler to the plugin as part of a kernel launch.

The patch also extends and renames the struct, bumping the version
number to 2. Version 1 entries are auto-upgraded. This is in preparation
for "bare" kernel launches, per kernel dynamic shared memory, CUDA/HIP
lowering, etc.
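
Auto-upgrading works when the runtime can read the version number before interpreting the rest of the struct. This sketch assumes the version is the first field. Below is a minimal sketch of that pattern. The struct layouts and field names are hypothetical and heavily reduced; they are not the actual libomptarget definitions, which have many more fields.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical, heavily reduced layouts.
struct KernelArgsV1 {
  std::uint32_t Version; // first field, so it can be read before the rest
  std::uint32_t NumArgs;
  std::uint64_t Tripcount;
};
struct KernelArgsV2 {
  std::uint32_t Version;
  std::uint32_t NumArgs;
  std::uint64_t Tripcount;
  std::uint32_t DynCGroupMem; // hypothetical field new in version 2
};

// Read the version first, then normalize any supported layout to version 2.
KernelArgsV2 upgradeKernelArgs(const void *Raw) {
  std::uint32_t Version;
  std::memcpy(&Version, Raw, sizeof(Version));
  KernelArgsV2 Out{};
  if (Version == 1) {
    KernelArgsV1 Old;
    std::memcpy(&Old, Raw, sizeof(Old));
    Out.NumArgs = Old.NumArgs;
    Out.Tripcount = Old.Tripcount;
    Out.DynCGroupMem = 0; // field absent in v1: use a neutral default
  } else {
    assert(Version == 2 && "unsupported kernel argument version");
    std::memcpy(&Out, Raw, sizeof(Out));
  }
  Out.Version = 2;
  return Out;
}
```

Everything after the upgrade step only ever sees the newest layout. New fields therefore need a default for older callers, but no changes elsewhere.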

The `__tgt_target_kernel_nowait` interface was deprecated as it was
unused. Once we actually implement support for something like that, we
can add an appropriate API.

Note: Only plugins with the `launch_kernel` interface are now supported.
      That means that a new clang won't be able to use an old runtime.
      An old clang can still use the new runtime since the libomptarget
      interface did not change.

Differential Revision: https://reviews.llvm.org/D141232
2023-01-21 11:16:21 -08:00
Jon Chesterfield
2257e3d2e5 [openmp] Workaround for HSA in issue 60119
Move plugin initialization to libomptarget initialization.

Removes the call_once control, probably fractionally faster overall.
Fixes issue 60119 because the plugin initialization, which might
try to dlopen unrelated shared libraries, is no longer nested within
a call from application code.

Fixes #60119

Reviewed By: Maetveis, jhuber6

Differential Revision: https://reviews.llvm.org/D142249
2023-01-21 12:01:14 +00:00
Joseph Huber
c7af1d19f3 [OpenMP] Remove unfinished and unused 'Analyzer' tool
Summary:
This patch removes a tool that was never finished and has no plans of
being picked up again. It does not need to live in LLVM source in an
unusable state.
2023-01-20 17:34:26 -06:00
Terry Wilmarth
4c58e5a28f [OpenMP] Fix for distributed barrier.
Distributed barrier was found to cause hangs in some test cases. We found
that a section updating the barrier size was improperly moved to a
different code section during patching. With it restored to its original
location, all tests run to completion.

Differential Revision: https://reviews.llvm.org/D141618
2023-01-20 13:54:25 -06:00
Shilei Tian
50d2a193a7 [OpenMP] Only test kmp_atomic_float10_max_min.c on X86
The test `openmp/runtime/test/atomic/kmp_atomic_float10_max_min.c` uses a compiler
flag `-mlong-double-80` that might not be supported by all targets. Currently it
requires `x86-registered-target`, but that requirement can be true when LLVM
supports X86 while the actual `libomp` arch is not X86. For example, when LLVM
is built on AArch64 with all targets enabled, `x86-registered-target` can be met.
If `libomp` is built for the native target, i.e., AArch64, the test will still be enabled,
causing a test failure.

This patch only enables the test if the actual target is X86. The actual target
is determined by `LIBOMP_ARCH`.

Fix #53696.

Reviewed By: jlpeyton

Differential Revision: https://reviews.llvm.org/D142172
2023-01-20 10:52:53 -05:00
Kevin Sala
097f42602d [OpenMP][libomptarget] Fix deinit of NextGen AMDGPU plugin
This patch fixes a segfault that appeared when the plugin failed to
initialize and was then deinitialized. Also, do not call `hsa_shut_down` if
`hsa_init` failed.

Differential Revision: https://reviews.llvm.org/D142145
2023-01-20 13:17:32 +01:00
Nikita Popov
1b4fdf18bc [libomp] Explicitly include <string> header (NFC)
This is required to build against libstdc++ 13. Debug.h uses
std::stoi() from <string> without explicitly including it.
2023-01-20 10:39:27 +01:00
Ye Luo
9fecd58e5e [OpenMP] Build device runtimes for sm_89 and sm_90 2023-01-19 15:39:05 -06:00
Gilles Gouaillardet
3a362a9f38 [OpenMP][libomp] Insert correct HWLOC version guards
Put needed HWLOC version guards around relevant HWLOC API.
Tested OpenMP host runtime build with HWLOC 1.11.13, 2.0-2.9.

Differential Revision: https://reviews.llvm.org/D142152
Fix #54951
2023-01-19 14:30:43 -06:00
Guilherme Valarini
e0b3b6cec7 [OpenMP][Fix] Track all threads that may delete an entry
The entries inside a "target data end" are processed in three steps:

  1. Query internal data maps for the entries and dispatch any necessary
     device-side operations (i.e., data retrieval);
  2. Synchronize such operations;
  3. Update the host-side pointers and remove any entry whose reference
     counter reached zero.

Such steps may be executed by multiple threads, which may even operate on
the same entries. The current implementation (D121058) tries to
synchronize these threads by tracking the "owner" for the deletion of
each entry using their thread ID. Unfortunately, it may fail to do so
for the following reasons:

  1. The owner is assigned only at the first step, and only if the
     reference count is 0 when the map is queried. This does not work
     when such an owner thread is faster than a previous one that is also
     processing the same entry on another "target data end", leading to
     use-after-free problems.
  2. The entry is only added for post-processing (step 3) if its
     reference count was 0 at query time (step 1). This does not allow
     threads to exchange responsibility for the deletion, leading
     again to use-after-free problems.
  3. An entry may appear multiple times in the arguments array of a
     "target data end", which may lead to deleting the entry
     prematurely, leading, again, to use-after-free problems.

This patch addresses these problems by using a counter to track all the
threads that are using an entry in a "target data end" region, ensuring
only the last one deletes it when needed. It also ensures that all
entries that are successfully found inside the data maps in step 1 are
also processed in step 3, regardless of whether their reference count was
zeroed at query time. This ensures the deletion ownership may be passed
to any thread that is using such an entry.
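
Below is a simplified sketch of the counter-based ownership scheme. It assumes a per-entry mutex and uses hypothetical names; the real mapping table and its locking are more involved. Each thread that finds an entry in step 1 registers itself. Every registered thread reaches step 3. Only the last thread to leave deletes the entry, and only once the entry's reference count is zero.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical mapping-table entry; only the fields relevant here.
struct Entry {
  std::mutex Mtx;
  int RefCount = 1;       // device-data reference count
  int DataEndThreads = 0; // users currently inside "target data end"
};

// Step 1: every thread that finds the entry drops its reference and
// registers as a user, regardless of the resulting reference count.
void queryForDataEnd(Entry &E) {
  std::lock_guard<std::mutex> Lock(E.Mtx);
  if (E.RefCount > 0)
    --E.RefCount;
  ++E.DataEndThreads;
}

// Step 3: each registered user deregisters; only the last one out may
// delete the entry, and only if its reference count reached zero.
bool finishDataEndAndCheckDelete(Entry &E) {
  std::lock_guard<std::mutex> Lock(E.Mtx);
  --E.DataEndThreads;
  return E.DataEndThreads == 0 && E.RefCount == 0;
}
```

Each occurrence of an entry registers separately, including repeated occurrences in one arguments array. As a result, a duplicate cannot trigger an early deletion.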

Reviewed By: ye-luo

Differential Revision: https://reviews.llvm.org/D132676
2023-01-19 12:11:52 -03:00
Shilei Tian
97ae7d83e3 [OpenMP][OMPT] Expect failure from tool_available_search.c on macOS
D91464 introduced verbose tool loading, but the test checks only consider Linux.
On macOS, the output is totally different, causing the regression afterwards.
This patch simply sets the test to XFAIL on macOS.

Fix #56833.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D142045
2023-01-18 20:09:06 -05:00
Shilei Tian
3ff1726cf8 [OpenMP][AMDGPU] Get rid of redundant macro def
The next-gen plugin adds the def of `DEBUG_PREFIX` in CMake, causing a
compiler warning that `DEBUG_PREFIX` is defined multiple times. This patch simply
guards the macro def.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D142064
2023-01-18 20:08:18 -05:00