3457 Commits

Author SHA1 Message Date
Jonathan Peyton
5300a6731e
[OpenMP] Fix re-locking hang found in issue 86684 (#88539)
This was initially reported here (including stacktraces):
https://stackoverflow.com/questions/78183545/does-compiling-imagick-with-openmp-enabled-in-freebsd-13-2-cause-sched-yield

If `__kmp_register_library_startup()` detects that another instance of
the library is present, `__kmp_is_address_mapped()` is eventually
called. which uses `kmpc_alloc()` to allocate memory. This function
calls `__kmp_entry_thread()` to access the thread-local memory pool,
which is a bad idea during initialization. This macro internally calls
`__kmp_get_global_thread_id_reg()` which sets the bootstrap lock at the
beginning (before calling `__kmp_register_library_startup()`).

The fix is to use `KMP_INTERNAL_MALLOC()`/`KMP_INTERNAL_FREE()` instead
of `kmpc_malloc()`/`kmpc_free()`. `KMP_INTERNAL_MALLOC` and
`KMP_INTERNAL_FREE` do not use any bootstrap locks. They just translate
to `malloc()`/`free()` and are meant to be used during library
initialization before other library-specific allocators have been
initialized.

Fixes: #86684
2024-04-12 15:13:59 -05:00
Xing Xue
b3792ae42a
[OpenMP][AIX] Fix test config for AIX (#88272)
This patch fixes the test config so that it works for
`tasking/omp50_taskdep_depobj.c` which uses different flags to test with
compiler's `omp.h`.
* set test environment variable `OBJECT_MODE` to `64` if it is set
explicitly to `64` in the AIX environment. `OBJECT_MODE` is default to
`32` and is recognized by AIX compilers and toolchain. In this way, we
don't need to set `-m64` for all compiler flags for 64-bit mode
* add option `-Wl,-bmaxdata` to 32-bit `test_openmp_flags` used by
`tasking/omp50_taskdep_depobj.c`
2024-04-10 16:06:31 -04:00
Joseph Huber
f81879c0f7 [Libomptarget] Add RPC-based printf implementation for OpenMP #85638
Summary:
Relanding after reverting, only applies to AMDGPU for now.

This patch adds an implementation of printf that's provided by the GPU
C library runtime. This pritnf currently implemented using the same
wrapper handling that OpenMP sets up. This will be removed once we have
proper varargs support.

This printf differs from the one CUDA offers in that it is synchronous
and uses a finite size. Additionally we support pretty much every
format specifier except the %n option.

Depends on #85331
2024-04-10 13:36:25 -05:00
Joseph Huber
a8f9f85ab0 [Libomptarget][NFC] Fix unused variable warnings
Summary:
This patch fixes a few warnings that would show up while building.
2024-04-10 10:01:15 -05:00
Joseph Huber
d022f6b8ff [Libomp] Place generated OpenMP headers into build resource directory (#88007)
Summary:
These headers are a part of the compiler's resource directory once
installed. However, they are currently placed in the binary directory
temporarily. This makes it more difficult to use the compiler out of the
build directory and will cause issues when moving to `liboffload`. This
patch changes the logic to write these instead to the copmiler's
resource directory inside of the build tree.

NOTE: This doesn't change the Fortran headers, I don't know enough about
those and it won't use the same directory.
2024-04-09 08:47:51 -05:00
Pete Steinfeld
25e3d2b0fc
Revert "[Libomp] Place generated OpenMP headers into build resource d… (#88083)
…irectory (#88007)"

This reverts commit 8671429151d5e67d3f21a737809953ae8bdfbfde.

This commit broke the flang build, so I'm reverting it. See the comments
in merge request #88007 for more information.
2024-04-08 20:20:27 -07:00
Joseph Huber
8671429151
[Libomp] Place generated OpenMP headers into build resource directory (#88007)
Summary:
These headers are a part of the compiler's resource directory once
installed. However, they are currently placed in the binary directory
temporarily. This makes it more difficult to use the compiler out of the
build directory and will cause issues when moving to `liboffload`. This
patch changes the logic to write these instead to the copmiler's
resource directory inside of the build tree.

NOTE: This doesn't change the Fortran headers, I don't know enough about
those and it won't use the same directory.
2024-04-08 15:26:54 -05:00
Jonathan Peyton
eeaaf33fc2
[OpenMP] Unsupport absolute KMP_HW_SUBSET test for s390x (#87555) 2024-04-04 13:54:40 -05:00
Jonathan Peyton
2ff3850ea1
[OpenMP] Add absolute KMP_HW_SUBSET functionality (#85326)
Users can put a : in front of KMP_HW_SUBSET to indicate that the
specified subset is an "absolute" subset. Currently, when a user puts
KMP_HW_SUBSET=1t. This gets translated to KMP_HW_SUBSET="*s,*c,1t",
where * means "use all of". If a user wants only one thread as the
entire topology they can now do KMP_HW_SUBSET=:1t.

Along with the absolute syntax is a fix for newer machines and making
them easier to use with only the 3-level topology syntax. When a user
puts KMP_HW_SUBSET=1s,4c,2t on a machine which actually has 4 layers,
(say 1s,2m,3c,2t as the entire machine) the user gets an unexpected "too
many resources asked" message because KMP_HW_SUBSET currently translates
the "4c" value to mean 4 cores per module. To help users out, the
runtime can assume that these newer layers, module in this case, should
be ignored if they are not specified, but the topology should always
take into account the sockets, cores, and threads layers.
2024-04-03 11:43:23 -05:00
Joseph Huber
943f39d29e Revert "[Libomptarget] Add RPC-based printf implementation for OpenMP (#85638)"
This reverts commit 2cf8118e3aa60f406ec41e88bdd4304f39744e89.

Failing tests, revert until I can fix it
2024-04-02 20:53:45 -05:00
Joseph Huber
2cf8118e3a
[Libomptarget] Add RPC-based printf implementation for OpenMP (#85638)
Summary:
This patch adds an implementation of `printf` that's provided by the GPU
C library runtime. This `pritnf` currently implemented using the same
wrapper handling that OpenMP sets up. This will be removed once we have
proper varargs support.

This `printf` differs from the one CUDA offers in that it is synchronous
and uses a finite size. Additionally we support pretty much every format
specifier except the `%n` option.

Depends on https://github.com/llvm/llvm-project/pull/85331
2024-04-02 16:36:36 -05:00
Jonathan Peyton
4ea24946e3
[OpenMP] Fix nested parallel with tasking (#87309)
When a nested parallel region ends, the runtime calls __kmp_join_call().
During this call, the primary thread of the nested parallel region will
reset its tid (retval of omp_get_thread_num()) to what it was in the
outer parallel region. A data race occurs with the current code when
another worker thread from the nested inner parallel region tries to
steal tasks from the primary thread's task deque. The worker thread
reads the tid value directly from the primary thread's data structure
and may read the wrong value.

This change just uses the calculated victim_tid from execute_tasks()
directly in the steal_task() routine rather than reading tid from the
data structure.

Fixes: #87307
2024-04-02 15:56:50 -05:00
nihui
31880df994
[OpenMP] get logical core count on modern apple platform (#87231)
`hw.logicalcpu` returns the available logical core count

Fix build error for watchOS

```
runtime/src/z_Linux_util.cpp:1821:8: error: 'host_info' is unavailable: not available on watchOS
  rc = host_info(mach_host_self(), HOST_BASIC_INFO, (host_info_t)&info, &num);
       ^
/Applications/Xcode_15.2.app/Contents/Developer/Platforms/WatchOS.platform/Developer/SDKs/WatchOS10.2.sdk/usr/include/mach/mach_host.h:82:15: note: 'host_info' has been explicitly marked unavailable here
kern_return_t host_info
              ^
1 warning and 1 error generated.
make[2]: *** [runtime/src/CMakeFiles/omp.dir/z_Linux_util.cpp.o] Error 1
```
2024-04-02 11:38:40 -04:00
nihui
c5bbdb6494
[OpenMP] arm64_32 port for Apple WatchOS (#87246)
detect `aarch64_32` with compiler defined macro `__ARM64_ARCH_8_32__`
reuse ARM `__kmp_unnamed_critical_addr` and add `KMP_PREFIX_UNDERSCORE`
macro like AARCH64
reuse AARCH64 `__kmp_invoke_microtask`


build log for watchos armv7k + arm64_32 and watchos simulator x86_64 +
arm64

https://github.com/nihui/action-protobuf/actions/runs/8520684611/job/23337305030
2024-04-02 11:38:32 -04:00
Joseph Huber
4213f4a9ae [Libomptarget] Fix resizing the buffer of RPC handles
Summary:
The previous code would potentially make it smaller if a device with a
lower ID touched it later. Also we should minimize changes to the state
for multi threaded reasons. This just sets up an owned slot for each at
initialization time.
2024-04-01 07:29:57 -05:00
Sitnikov Sergey
7467dc188a
Fix apostrophes in assert error message (#69881) 2024-03-29 18:43:01 -04:00
Jonathan Peyton
038e66fe59
[OpenMP] Have hidden helper team allocate new OS threads only (#87119)
The hidden helper team pre-allocates the gtid space [1,
num_hidden_helpers] (inclusive). If regular host threads are allocated,
then put back in the thread pool, then the hidden helper team is
initialized, the hidden helper team tries to allocate the threads from
the thread pool with gtids higher than [1, num_hidden_helpers]. Instead,
have the hidden helper team fork OS threads so the correct gtid range
used for hidden helper threads.

Fixes: #87117
2024-03-29 17:26:00 -05:00
dhruvachak
cc8c6b037c
[OpenMP] [amdgpu] Added a synchronous version of data exchange. (#87032)
Similar to H2D and D2H, use synchronous mode for large data transfers
beyond a certain size for D2D as well. As with H2D and D2H, this size is
controlled by an env-var.
2024-03-29 13:33:43 -07:00
Joseph Huber
a1a8bb1d3a
[libc] Change RPC interface to not use device ids (#87087)
Summary:
The current implementation of RPC tied everything to device IDs and
forced us to do init / shutdown to manage some global state. This turned
out to be a bad idea in situations where we want to track multiple
hetergeneous devices that may report the same device ID in the same
process.

This patch changes the interface to instead create an opaque handle to
the internal device and simply allocates it via `new`. The user will
then take this device and store it to interface with the attached
device. This interface puts the burden of tracking the device identifier
to mapped d evices onto the user, but in return heavily simplifies the
implementation.
2024-03-29 12:49:16 -05:00
Joseph Huber
ea707baca2
[Libomptarget][NFCI] Move logic out of PluginAdaptorTy (#86971)
Summary:
This patch removes most of the special handling from the
`PluginAdaptorTy` in preparation for changing this to be the
`GenericPluginTy`. Doing this requires that the OpenMP specific handling
of stuff like device offsets be contained within the OpenMP plugin
manager. Generally this was uninvasive expect for the change to tracking
the offset and size of the used devices. The eaiest way I could think to
do this was to use some maps, which double as indicators for which
plugins have devices active. This should not affect the logic.
2024-03-29 07:19:22 -05:00
Ulrich Weigand
b999e631c0
[OpenMP] Fix node destruction race in __kmpc_omp_taskwait_deps_51 (#86130)
The __kmpc_omp_taskwait_deps_51 allocates a kmp_depnode_t node on its
stack, and there is currently a race condition where another thread
might still be accessing that node after the function has returned and
its stack frame was released.

While the function does wait until the node's npredecessors count has
reached zero before exiting, there is still a window where the function
that last decremented the npredecessors count assumes the node is still
accessible.

For heap-allocated kmp_depnode_t nodes, this normally works via a
separate ndeps count that only reaches zero at the point where no
accesses to the node are expected at all; in fact, at this point the
heap allocation will be freed.

For this case of a stack-allocated kmp_depnode_t node, it therefore
makes sense to similarly respect the ndeps count; we need to wait until
this reaches 1 (not 0, because it is not heap-allocated so there's
always one extra count to prevent it from being freed), before we can
safely deallocate our stack frame.

As this is expected to be a short race window of only a few
instructions, it should be fine to just use a busy wait loop checking
the ndeps count.

Fixes: https://github.com/llvm/llvm-project/issues/85963
2024-03-28 12:15:39 +01:00
Ye Luo
19185d5a72
[Libomptarget] Make dynamic loading libffi more verbose. (#86891) 2024-03-27 18:40:57 -05:00
Joseph Huber
4a5056be2e
[Libomptarget][NFC] Remove trivially true checks on function pointers (#86804)
Summary:
Previously we had an interface that checked these functions pointers to
see if they are implemented by the plugin. This was removed as currently
every single function is implemented as a part of the common interface.
These checks are now always true and do nothing.
2024-03-27 14:37:55 -05:00
Joseph Huber
ed68aac9f2
[Libomptarget] Move API implementations into GenericPluginTy (#86683)
Summary:
The plan is to remove the entire plugin interface and simply use the
`GenericPluginTy` inside of `libomptarget` by statically linking against
it. This means that inside of `libomptarget` we will simply do
`Plugin.data_alloc` without the dynamically loaded interface. To reduce
the amount of code required, this patch simply moves all of the RTL
implementation functions inside of the Generic device. Now the
`__tgt_rtl_` interface is simply a shallow wrapper that will soon go
away. There is some redundancy here, this will be improved later. For
now what is important is minimizing the changes to the API.
2024-03-27 14:10:54 -05:00
Terry Wilmarth
aa2c14de1a
[OpenMP] Close up permissions on /tmp files (#85469)
The SHM or /tmp files that might be created during library registration
don't need to have such open permissions, so this change fixes that.
2024-03-27 11:27:28 -04:00
Shilei Tian
a7ac0dd624 [NFC][OpenMP] Use SimpleVLA to replace variable length arrays in C++ 2024-03-27 00:23:32 -04:00
Shilei Tian
fa9ee4a7f9 [NFC][OpenMP] Silent unused variable in kmp_collapse.cpp 2024-03-27 00:09:40 -04:00
Vadim Paretsky
7db4046322
[OpenMP] add loop collapse tests (#86243)
This PR adds loop collapse tests ported from MSVC.

---------

Co-authored-by: Vadim Paretsky <b-vadipa@microsoft.com>
2024-03-26 16:41:31 -07:00
Joseph Huber
4dc3225248
[Libomptarget] Replace global PluginTy::get interface with references (#86595)
Summary:
We have a plugin singleton that implements the Plugin interface. This
then spawns separate device and kernels. Previously when these needed to
reach into the global singleton they would use the `PluginTy::get`
routine to get access to it. In the future we will move away from this
as the lifetime of the plugin will be handled by `libomptarget`
directly. This patch removes uses of this inside of the plugin
implementaion themselves by simply keeping a reference to the plugin
inside of the device.

The external `__tgt_rtl` functions still use the global method, but will
be removed later.
2024-03-26 07:13:59 -05:00
Joseph Huber
2cad43c1ba
[Libomptarget] Factor functions out of 'Plugin' interface (#86528)
Summary:
This patch factors common functions out of the `Plugin` interface prior
to its removal in a future patch. This simply temporarily renames it to
`PluginTy` so that we could re-use `Plugin::check` internally as this
needs to be defined statically per plugin now. We can refactor this
later.

The future patch will delete `PluginTy` and `PluginTy::get` entirely.
This simply tries to minimize a few changes to make it easier to land.
2024-03-25 15:24:39 -05:00
Joseph Huber
9f0321ccf1 [Libomptarget] Make plugins depend explicitly on intrinsics_gen
Summary:
It's possible for the OpenMP offloading plugins to be build before
tablegen is run despite the fact that we rely on it. Simply make it
depend on it currently.
2024-03-24 15:24:35 -05:00
Joseph Huber
909ea28ac6 [Libomptarget] Specificall add LLVM include dirs in plugins 2024-03-24 14:29:39 -05:00
Michał Górny
3f5e649ff6
[Libomptarget] Fix linking to LLVM dylib (#86397)
Use `LINK_COMPONENTS` parameter of `add_llvm_library` rather than
passing LLVM components directly to `target_link_libraries`, in order to
ensure that LLVM dylib is linked correctly when used. Otherwise, CMake
insists on linking to static libraries that aren't present on
distributions doing pure dylib installs, such as Gentoo.

This fixes a regression introduced
in dcbddc25250158469c5635ad2ae4095faef53dfd.
2024-03-23 15:46:32 +00:00
Joseph Huber
20e0bacd05 [Libomptarget][Fix] Remove duplicate version script for host builds
Summary:
This causes an error on some linkers and was mistakenly kept in.
2024-03-22 19:09:16 -05:00
Joseph Huber
85af772f3b [Libomptarget][FIX] Fix unintentinally used PUBLIC interface
Summary:
This was supposed to be private and caused some issues with certain
configs.
2024-03-22 16:38:51 -05:00
Joseph Huber
dcbddc2525
[Libomptarget] Unify and simplify plugin CMake (#86191)
Summary:
This patch reworks the CMake handling for building plugins. All this
does is pull a lot of shared and common logic into a single helper
function.
This also simplifies the OMPT libraries from being built separately
instead of just added.
2024-03-22 16:13:58 -05:00
Xing Xue
d394f3a162
[OpenMP][AIX] Affinity implementation for AIX (#84984)
This patch implements `affinity` for AIX, which is quite different from
platforms such as Linux.
- Setting CPU affinity through masks and related functions are not
supported. System call `bindprocessor()` is used to bind a thread to one
CPU per call.
- There are no system routines to get the affinity info of a thread. The
implementation of `get_system_affinity()` for AIX gets the mask of all
available CPUs, to be used as the full mask only.
- Topology is not available from the file system. It is obtained through
system SRAD (Scheduler Resource Allocation Domain).

This patch has run through the libomp LIT tests successfully with
`affinity` enabled.
2024-03-22 15:25:08 -04:00
agozillon
8612fa0d84
[MLIR][OpenMP] Refactor bounds offsetting and fix to apply to all directives (#84349)
This PR refactors bounds offsetting by combining the two differing
implementations (one applying to initial derived type member map
implementation for descriptors and the other for regular arrays,
effectively allocatable array vs regular array in fortran) now that it's
a little simpler to do.

The PR also moves the utilization of createAlteredByCaptureMap into
genMapInfoOp, where it will be correctly applied to all MapInfoData,
appropriately offsetting and altering Pointer data set in the kernel
argument structure on the host. This primarily means bounds offsets will
now correctly apply to enter/exit/update map clauses as opposed to just
the Target directive that is currently the case. A few fortran runtime
tests have been added to verify this new behavior.

This PR depends on: https://github.com/llvm/llvm-project/pull/84328 and
is an extraction of the larger derived type member map PR stack (so a
requirement for it to land).
2024-03-22 15:32:39 +01:00
Ulrich Weigand
cb071942f8 [OpenMP] Fix SystemZ build failure
Commit a7d5f73a03c81cab8df64dbd099e8acb40f5dfe1 introduced an
error in a target_compile_definitions on the SystemZ, causing
the build to break.  Fixed by adding the missing "PRIVATE".
2024-03-21 12:02:50 +01:00
Joseph Huber
a7d5f73a03
[Libomptarget] Consolidate CPU offloading into 'host' directory (#86014)
Summary:
All of these CPU targets use the same underlying implementation. We
should consolidate them into a single target to make it easier to update
this to a static library based approach. I have decided to call this the
'host' target so it can be given a single name. We still only build
these if the system processor matches and we are on Linux.
2024-03-20 17:41:53 -05:00
Gheorghe-Teodor Bercea
c25e77436e
Revert "[libomptarget][nextgen-plugin] Use SCRELEASE/SCACQUIRE in packet headers" (#85950)
Reverts llvm/llvm-project#85678
2024-03-20 11:40:12 -04:00
Gheorghe-Teodor Bercea
927308a52b
[libomptarget][nextgen-plugin] Use SCRELEASE/SCACQUIRE in packet headers (#85678)
This patch updates the construction of packet headers to replace the
usage of ACQUIRE/RELEASE with SCACQUIRE/SCRELEASE which is now
recommended.
The patch also ensures consistency across kernel dispatches.
2024-03-20 11:22:01 -04:00
Michael Klemm
fb5fd2d82f
[flang][OpenMP] Compile proper omp_lib.mod from the openmp/src/include sources (#80874)
This PR changes the build system to use use the sources for the module
`omp_lib` and the `omp_lib.h` include file from the `openmp` runtime
project and not from a separate copy of these files. This will greatly
reduce potential for inconsistencies when adding features to the OpenMP
runtime implementation.

When the OpenMP subproject is not configured, this PR also disables the
corresponding LIT tests with a "REQUIRES" directive at the beginning of
the OpenMP test files.

---------

Co-authored-by: Valentin Clement (バレンタイン クレメン) <clementval@gmail.com>
2024-03-20 13:47:26 +01:00
Ulrich Weigand
fe1341248d [OpenMP] Disable workshare_chunk.c test case on SystemZ
The test added by https://github.com/llvm/llvm-project/pull/83261
has been consistently failing.  Mark as UNSUPPORTED just like on
x86_64 and aarch64.
2024-03-20 11:12:26 +01:00
dhruvachak
b5d02bbd0d
[OpenMP] Increment kernel args version, used by runtime for detecting dyn_ptr. (#85363)
A kernel implicit parameter (dyn_ptr) was introduced some time back.
This patch increments the kernel args version for a compiler supporting
dyn_ptr. The version will be used by the runtime to determine whether
the implicit parameter is generated by the compiler. The versioning is
required to support use cases where code generated by an older compiler
is linked with a newer runtime.

If approved, this patch should be backported to release 18.
2024-03-19 16:40:22 -07:00
Joseph Huber
5cccc405e3 Revert "[OpenMP] Disable flaky barrier fence test (#85093)"
This reverts commit cd8843f87af2f04a85dda12b37738596cbf4cd5e.

Originally disabled to try to unstick the AMD build bot, didn't make a
difference after a week so it goes back in.
2024-03-19 14:46:03 -05:00
Dominik Adamski
b9a41b9e9b
[NFC][OpenMP] Add test checking clang offload chunking policy (#83261)
Verify how clang handles `dist_schedule(static, block_chunk)` and
`schedule(static, thread_chunk)` clauses for OpenMP offload loop
workshare pragmas.
2024-03-19 10:29:35 -07:00
Brad Smith
c7de4a39d5
[OpenMP] Enable the affinity tests on FreeBSD, NetBSD and DragonFly (#85500)
FreeBSD, NetBSD and DragonFly also have affinity support. So enable the tests there as well.
2024-03-19 13:29:19 -04:00
Akash Banerjee
e9da5f0083
[OpenMP] Fix target data region codegen being omitted for device pass (#85218)
This patch enables the BodyCodeGen callback to still trigger for the
TargetData nested region during the device pass. There maybe Target code
nested within the TargetData region for which this is required.

Also add tests for the same.
2024-03-19 13:04:23 +00:00
nicebert
29849d589c
[OpenMP] Fix ompx_dump_mapping_tables lit test (#85754)
Fixes ompx_dump_mapping_tables test by only using one device after
breaking built bots
2024-03-19 10:26:41 +01:00