With the NPM, we're now defaulting to preserving LCSSA, so a couple
of tests have changed slightly.
Differential Revision: https://reviews.llvm.org/D140982
The th_task_state was initialized from the master thread's value, or
from its memo stack, but this causes problems because neither of those
may have the right value at the right time. However, other threads in
the team are guaranteed to have the right values, so we now
initialize the new threads' th_task_state from the th_task_state of
the last of the older threads in the hot team.
Differential Revision: https://reviews.llvm.org/D142247
Fix #56307.
The memory sanitizer intercepts the memcpy() call but not the direct
assignment of the last byte to 0. This leads the sanitizer to believe the
last byte of a string based on the kmp_str_buf_t type is uninitialized.
Hence, the eventual strlen() inside __kmp_env_dump() leads to a
use-of-uninitialized-value warning.
Using strncat() instead gives the sanitizer the information it needs.
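The difference between the two patterns can be sketched as follows. This is a minimal illustration, not the runtime's code: `StrBuf`, `initBuf`, `appendWithMemcpy`, and `appendWithStrncat` are hypothetical names, and the fixed-size buffer is a simplified stand-in for kmp_str_buf_t.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical fixed-size stand-in for kmp_str_buf_t (illustration only).
struct StrBuf {
  char Data[64];
  std::size_t Used;
};

// Start with an empty, terminated string.
void initBuf(StrBuf &B) {
  B.Data[0] = '\0';
  B.Used = 0;
}

// Pattern described above: memcpy() followed by a direct store of the
// terminating byte.
bool appendWithMemcpy(StrBuf &B, const char *Str, std::size_t Len) {
  if (B.Used + Len + 1 > sizeof(B.Data))
    return false;
  std::memcpy(B.Data + B.Used, Str, Len);
  B.Used += Len;
  B.Data[B.Used] = '\0'; // direct assignment of the last byte
  return true;
}

// Alternative: strncat() copies at most Len characters and writes the
// terminator itself. Assumes B.Data + B.Used points at the current '\0'.
bool appendWithStrncat(StrBuf &B, const char *Str, std::size_t Len) {
  if (B.Used + Len + 1 > sizeof(B.Data))
    return false;
  std::strncat(B.Data + B.Used, Str, Len);
  B.Used += std::strlen(B.Data + B.Used);
  return true;
}
```

Both functions produce the same string; the only difference is which call writes the terminating byte.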
Differential Revision: https://reviews.llvm.org/D143401
Fixes #60501
The NextGen plugins use the information regarding new mapping/unmappings to
lock/unlock the corresponding host buffer and speed up the host-device memory
transfers involving those buffers. The locking/unlocking is disabled by default
and can be enabled by the LIBOMPTARGET_LOCK_MAPPED_HOST_BUFFERS envar. The
envar accepts boolean values (on/off) and a special option:
- off: Do not lock mapped host buffers (default).
- on: Lock mapped host buffers automatically, but do not report lock
failures if the plugin fails to lock them.
- mandatory: Lock mapped host buffers automatically and treat locking failures
in the plugins as fatal errors. This option may be useful for
debugging purposes.
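The three settings can be modeled as a small tri-state parser. The sketch below is an assumption-laden illustration: the names `LockMode`, `parseLockMode`, `isLockFailureFatal`, and `readLockModeFromEnv` are hypothetical, matching is exact and case-sensitive, and an unrecognized value falls back to the default. The real plugin may accept other boolean spellings and handle errors differently.

```cpp
#include <cstdlib>
#include <cstring>

// Hypothetical tri-state mirroring the documented values.
enum class LockMode { Off, On, Mandatory };

// Parses one of "off", "on", or "mandatory". A null pointer (variable not
// set) selects the default, Off. Returns false for anything else.
bool parseLockMode(const char *Value, LockMode &Mode) {
  if (!Value || std::strcmp(Value, "off") == 0) {
    Mode = LockMode::Off;
    return true;
  }
  if (std::strcmp(Value, "on") == 0) {
    Mode = LockMode::On;
    return true;
  }
  if (std::strcmp(Value, "mandatory") == 0) {
    Mode = LockMode::Mandatory;
    return true;
  }
  return false;
}

// Only "mandatory" turns a failed lock into a fatal error.
bool isLockFailureFatal(LockMode Mode) { return Mode == LockMode::Mandatory; }

// Reads the environment variable named in the text above. Falling back to
// Off on an unrecognized value is an assumption of this sketch.
LockMode readLockModeFromEnv() {
  LockMode Mode = LockMode::Off;
  if (!parseLockMode(std::getenv("LIBOMPTARGET_LOCK_MAPPED_HOST_BUFFERS"), Mode))
    Mode = LockMode::Off;
  return Mode;
}
```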
Differential Revision: https://reviews.llvm.org/D142514
Summary:
The change from the previous patch also needs to be applied to the other
AMDGPU plugin. That plugin will be removed soon, but it should at least be
correct while it is still here.
Previously, on non-Linux systems, amdgpu would be enabled regardless of the CPU architecture.
Reviewed By: jhuber6
Differential Revision: https://reviews.llvm.org/D143017
While we potentially need to align partially mapped structs more than
the first member, we do not need to align past the struct itself. This
prevents us from moving the base pointer past the struct beginning too.
See https://reviews.llvm.org/D142508 for a discussion.
Reviewed By: pavelkopyl, grokos, jhuber6
Differential Revision: https://reviews.llvm.org/D142586
`check_loc` is not used if ITT is disabled or debug is off, causing a
compiler warning.
Reviewed By: jlpeyton
Differential Revision: https://reviews.llvm.org/D143004
Summary:
We added a new agent information enum in a previous commit. This was not
added to the dynamic HSA implementation so it failed to compile without
a local HSA install to use.
The next-gen plugin properly prints errors. This patch improves the
error messages by including the Node-ID of the GPU that failed as well
as a textual representation of the enumeration values.
Reviewed By: kevinsala
Differential Revision: https://reviews.llvm.org/D143192
When `N` is 1024, `int result[N][N]` needs about 4 MiB of stack (with a 4-byte `int`), which is more than Windows supports by default.
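One common way to avoid such a large stack object is to move the matrix to the heap. The sketch below only illustrates that general technique and is not necessarily what this patch does; `makeMatrix` and `at` are hypothetical helpers.

```cpp
#include <cstddef>
#include <vector>

constexpr std::size_t N = 1024;

// Heap-backed N x N matrix stored in row-major order, zero-initialized.
std::vector<int> makeMatrix() { return std::vector<int>(N * N, 0); }

// Element access equivalent to result[Row][Col] on the flat storage.
int &at(std::vector<int> &Matrix, std::size_t Row, std::size_t Col) {
  return Matrix[Row * N + Col];
}
```

Only the small `std::vector` handle lives on the stack; the N * N elements are allocated dynamically.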
Fix #60326.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D142684
Clean up this file after changing it in D142568.
Depends on D142568
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D142573
Since D137724 and the LLVM 17 release we have updated to CMake version
3.20. This means that `find_package(CUDA)` is officially deprecated and
can be replaced with `find_package(CUDAToolkit)` instead. This patch
does this and also cleans up a bit of the CMake.
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D142568
To determine the type of `omp_allocator_handle_t`, Clang checks the type of
the predefined allocators. The first one it checks is `omp_null_allocator`.
If the language is C and the system is 64-bit, Clang gets an `int` instead of
an enum of size 8, because of how we define `omp_allocator_handle_t` in
`omp.h`. If the allocator is captured by a region, say a parallel region, the
allocator is privatized. Because Clang treats `omp_allocator_handle_t` as an
`int`, it first casts the value returned by the runtime library (for `libomp`
it is a `void *`) to `int`, and then, in the outlined function, casts it back
to `omp_allocator_handle_t`. These two casts drop the upper 32 bits of the
pointer value returned from `libomp`, and when the privatized "new" pointer is
passed to another runtime function, `__kmpc_allocate()`, it causes a
segmentation fault. That is the root cause of PR54082.
I have no idea why `-fno-pic` could hide this bug.
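The effect of the two casts can be reproduced on plain integers. This is an illustrative sketch, not the code Clang generates: `roundTripThroughInt` is a hypothetical helper, and a 64-bit integer stands in for the pointer-sized handle.

```cpp
#include <cstdint>

// Round-trips a 64-bit handle value through a 32-bit int, mimicking a cast
// to `int` followed by a cast back to a pointer-sized type.
std::uint64_t roundTripThroughInt(std::uint64_t Handle) {
  // Narrowing keeps only the low 32 bits on common compilers.
  std::int32_t Narrow = static_cast<std::int32_t>(Handle);
  // Widening sign-extends; the original upper 32 bits are gone.
  return static_cast<std::uint64_t>(static_cast<std::int64_t>(Narrow));
}
```

A typical 64-bit heap address does not survive the round trip, while a value that fits in 32 bits does.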
In this patch, we detect `omp_allocator_handle_t` using roughly the same method
as `omp_event_handle_t`, by looking it up in the identifier table.
Fix #54082.
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D142297
When `libomp` is initialized, it creates a temporary file in `/dev/shm` to store
the registration flag. Some systems, like Android, don't have `/dev/shm`, so this
feature is disabled there by the macro `KMP_USE_SHM`; most Linux distributions
do have it. However, some customized distributions, such as the one reported in
https://github.com/llvm/llvm-project/issues/53955, don't support it either,
which causes a core dump. With this patch, in that case we try to create a
temporary file in `/tmp`, and if that also fails, we error out.
Note that this patch does not consider whether the temporary directory has been
set through `TMPDIR`. If `/tmp` is not accessible, we error out.
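The fallback order described above can be sketched as a loop over candidate directories. This is a simplified illustration rather than the `libomp` implementation: `canCreateFileIn`, `pickRegistrationDir`, and the probe file name are hypothetical, and the probe is injectable so the selection logic can be exercised without touching the file system.

```cpp
#include <fcntl.h>
#include <unistd.h>

#include <functional>
#include <string>
#include <vector>

// Returns true if a file can be created in Dir. The probe file name is
// hypothetical; the file is removed again right away.
bool canCreateFileIn(const std::string &Dir) {
  const std::string Path = Dir + "/__omp_sketch_probe";
  int FD = open(Path.c_str(), O_CREAT | O_EXCL | O_RDWR, 0600);
  if (FD == -1)
    return false;
  close(FD);
  unlink(Path.c_str());
  return true;
}

// Returns the first candidate directory accepted by Probe, or an empty
// string if none is usable (the caller would then report an error).
std::string pickRegistrationDir(
    const std::vector<std::string> &Candidates,
    const std::function<bool(const std::string &)> &Probe = canCreateFileIn) {
  for (const std::string &Dir : Candidates)
    if (Probe(Dir))
      return Dir;
  return std::string();
}
```

A call such as `pickRegistrationDir({"/dev/shm", "/tmp"})` tries the directories in that order; an empty result corresponds to the error-out case.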
Fix #53955.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D142175
This patch implements the memory lock/unlock API, introduced in patch https://reviews.llvm.org/D139208,
in the NextGen plugins. Locked buffers are reference counted, and certain overlaps are allowed. Given
an already locked buffer A, other buffers that are fully contained inside A can be locked again, even if
they are smaller than A. In this case, the reference count of locked buffer A is incremented. However,
extending an existing locked buffer is not allowed. The original buffer is actually unlocked only once all its
users have released the locked buffer and sub-buffers (i.e., the reference count becomes zero).
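The reference-counting rules above can be sketched with an ordered map keyed by start address. This is a minimal model under stated assumptions, not the plugin's data structure: `LockedBufferTable` and its methods are hypothetical, addresses are plain integers, and the actual pinning calls are omitted.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>

class LockedBufferTable {
  struct Entry {
    std::size_t Size;
    std::size_t RefCount;
  };
  std::map<std::uintptr_t, Entry> Entries; // keyed by start address

  // Returns the entry whose range contains Addr, or end() if there is none.
  std::map<std::uintptr_t, Entry>::iterator
  findContaining(std::uintptr_t Addr) {
    auto It = Entries.upper_bound(Addr);
    if (It == Entries.begin())
      return Entries.end();
    --It;
    return Addr < It->first + It->second.Size ? It : Entries.end();
  }

public:
  // Locks [Begin, Begin + Size). A range fully inside an existing entry
  // bumps that entry's count; a range that would extend one is rejected.
  bool lock(std::uintptr_t Begin, std::size_t Size) {
    const std::uintptr_t End = Begin + Size;
    auto It = findContaining(Begin);
    if (It != Entries.end()) {
      if (End > It->first + It->second.Size)
        return false; // would extend past the end of the locked buffer
      ++It->second.RefCount;
      return true;
    }
    // Begin is outside every entry; reject if the range runs into a later one.
    auto Next = Entries.lower_bound(Begin);
    if (Next != Entries.end() && Next->first < End)
      return false;
    Entries[Begin] = Entry{Size, 1};
    return true;
  }

  // Releases one reference on the entry containing Addr; the entry is
  // dropped (i.e., really unlocked) when the count reaches zero.
  bool unlock(std::uintptr_t Addr) {
    auto It = findContaining(Addr);
    if (It == Entries.end())
      return false;
    if (--It->second.RefCount == 0)
      Entries.erase(It);
    return true;
  }

  // Current reference count of the entry containing Addr, or 0.
  std::size_t refCount(std::uintptr_t Addr) {
    auto It = findContaining(Addr);
    return It == Entries.end() ? 0 : It->second.RefCount;
  }
};
```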
Differential Revision: https://reviews.llvm.org/D141227
The AMDGPU target can only emit LLVM-IR, so we can always rely on LTO to
link the static version of the runtime optimally. Using only the static
library has a few advantages. Namely, it avoids several known bugs
and allows us to optimize out more functions. This is legal since the
changes in D142486 and D142484.
Depends on D142486 D142484
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D142491
GCC doesn't support `-fopenmp-version`, which causes test failures if the compiler used
for testing is GCC.
GCC's OpenMP 5.2 support is still very limited, so disable the tests requiring 5.2
features for GCC as well.
We might want to take a look at all `libomp` tests and mark those that
don't support GCC yet.
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D142173
This patch fixes issues reported for Ubuntu and possibly other platforms:
https://github.com/llvm/llvm-project/issues/45290
The latest comment on this issue points out that using dlsym rather than
the weak symbol approach to call TSan annotation functions fixes the issue
for Ubuntu.
Differential Revision: https://reviews.llvm.org/D142378
The next-gen plugins are complete drop-in replacements for the old
versions. We should strive to replace the old ones as quickly as
possible now that we have a viable alternative.
The only test failing is the `prelock.cpp` test as the support has not landed in
the next-gen plugins.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D142399
Add free functions llvm::CodeGenOpt::{getLevel,getID,parseLevel} to
provide common implementations for functionality that has been
duplicated in many places across the codebase.
Differential Revision: https://reviews.llvm.org/D141968
Summary:
Recently AMD moved the "hsa.h" include to "hsa/hsa.h". This causes
several warnings. This patch checks whether we can include the new path
instead. This should keep things backwards compatible while
silencing the warnings.
This patch prepares for a series that will enable per-kernel information
used in both the host and device runtimes. Some variables/enums, such as `OMPTgtExecModeFlags`,
have to be shared by both of them. A new header, `OMPDeviceConstants.h`, is added,
containing code that will be shared by them. We will introduce more variables soon.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D142320