llvm-project

mirror of https://github.com/llvm/llvm-project.git synced 2025-04-19 02:46:51 +00:00

Author	SHA1	Message	Date
Alex	021a1f6974	[OFFLOAD] Stricter enforcement of user offload disable (#133470 ) If user specifies offload is disabled (e.g., OMP_TARGET_OFFLOAD=disable), disable library almost completely. This reduces resources spent to a minimum and ensures all APIs behave as if the only available device is the host device. Currently some of the APIs behave as if there were devices avaible for offload even when under OMP_TARGET_OFFLOAD=disable. --------- Co-authored-by: Joseph Huber <huberjn@outlook.com>	2025-03-28 17:28:14 -05:00
Jon Chesterfield	deb0f3c09b	[openmp][nfc] Use builtin align in the devicertl (#131918 ) Noticed while extracting the smartstack as a test case	2025-03-18 21:31:49 +00:00
Ethan Luis McDonough	9e5c136d5a	[PGO][Offload] Profile profraw generation for GPU instrumentation #76587 (#93365 ) This pull request is the second part of an ongoing effort to extends PGO instrumentation to GPU device code and depends on #76587. This PR makes the following changes: - Introduces `__llvm_write_custom_profile` to PGO compiler-rt library. This is an external function that can be used to write profiles with custom data to target-specific files. - Adds `__llvm_write_custom_profile` as weak symbol to libomptarget so that it can write the collected data to a profraw file. - Adds `PGODump` debug flag and only displays dump when the aforementioned flag is set	2025-02-11 23:30:54 -06:00
Christian Clauss	1f56bb3137	[Offload][NFC] Fix typos discovered by codespell (#125119 ) https://github.com/codespell-project/codespell % `codespell --ignore-words-list=archtype,hsa,identty,inout,iself,nd,te,ths,vertexes --write-changes`	2025-01-31 09:35:29 -06:00
Joseph Huber	13dcc95dcd	[Offload] Rework offloading entry type to be more generic (#124018 ) Summary: The previous offloading entry type did not fit the current use-cases very well. This widens it and adds a version to prevent further annoyances. It also includes the kind to better sort who's using it. The first 64-bytes are reserved as zero so the OpenMP runtime can detect the old format for binary compatibilitry.	2025-01-28 07:26:13 -06:00
Joseph Huber	6518b121f0	[Offload][NFC] Factor out and rename the `__tgt_offload_entry` struct (#123785 ) Summary: This patch is an NFC renaming to make using the offloading entry type more portable between other targets. Right now this is just moving its definition to LLVM so others can use it. Future work will rework the struct layout.	2025-01-21 12:05:24 -06:00
Joseph Huber	f53cb84df6	[OpenMP] Use __builtin_bit_cast instead of UB type punning (#122325 ) Summary: Use a normal bitcast, remove from the shared utils since it's not available in GCC 7.4	2025-01-09 13:59:21 -06:00
Joseph Huber	91f5f974cb	[OpenMP] Unconditionally provide an RPC client interface for OpenMP (#117933 ) Summary: This patch adds an RPC interface that lives directly in the OpenMP device runtime. This allows OpenMP to implement custom opcodes. Currently this is only providing the host call interface, which is the raw version of reverse offloading. Previously this lived in `libc/` as an extension which is not the correct place. The interface here uses a weak symbol for the RPC client by the same name that the `libc` interface uses. This means that it will defer to the libc one if both are present so we don't need to set up multiple instances. The presense of this symbol is what controls whether or not we set up the RPC server. Because this is an external symbol it normally won't be optimized out, so there's a special pass in OpenMPOpt that deletes this symbol if it is unused during linking. That means at `O0` the RPC server will always be present now, but will be removed trivially if it's not used at O1 and higher.	2024-12-02 14:31:51 -06:00
Michael Halkenhäuser	d36f66b42d	[NFC][offload][OMPT] Cleanup of OMPT internals (#109005 ) Removed `OmptCallbacks.cpp` since relevant contents were duplicated. Because of the static linking there should be no change in functionality.	2024-09-23 11:58:40 +02:00
Joseph Huber	c3ac3fe825	[OpenMP] Fix redefining `stdint.h` types (#108607 ) Summary: We can include `stdint.h` just fine as long as we don't allow it to find system headers, passing `-nostdlibinc` and `-nogpuinc` suppresses these extra paths so we will just use the clang resource headers for `stdint.h` and `stddef.h`.	2024-09-13 13:22:44 -05:00
Johannes Doerfert	08533a3ee8	[Offload][NFC] Reorganize `utils::` and make Device/Host/Shared clearer (#100280 ) We had three `utils::` namespaces, all with different "meaning" (host, device, hsa_utils). We should, when we can, keep "include/Shared" accessible from host and device, thus RefCountTy has been moved to a separate header. `hsa_utils` was introduced to make `utils::` less overloaded. And common functionality was de-duplicated, e.g., `utils::advance` and `utils::advanceVoidPtr` -> `utils:advancePtr`. Type punning now checks for the size of the result to make sure it matches the source type. No functional change was intended.	2024-09-05 13:36:26 -07:00
Johannes Doerfert	ff12c0061b	[Offload] Ensure to load images when the device is used (#103002 ) When we use the device, e.g., with an API that interacts with it, we need to ensure the image is loaded and the constructors are executed. Two tests are included to verify we 1) load images and run constructors when needed, and 2) we do so lazily only if the device is actually used. --------- Co-authored-by: Joseph Huber <huberjn@outlook.com>	2024-08-13 14:41:26 -07:00
Johannes Doerfert	80525dfcde	[Offload][CUDA] Allow CUDA kernels to use LLVM/Offload (#94549 ) Through the new `-foffload-via-llvm` flag, CUDA kernels can now be lowered to the LLVM/Offload API. On the Clang side, this is simply done by using the OpenMP offload toolchain and emitting calls to `llvm` functions to orchestrate the kernel launch rather than `cuda` functions. These `llvm` functions are implemented on top of the existing LLVM/Offload API. As we are about to redefine the Offload API, this wil help us in the design process as a second offload language. We do not support any CUDA APIs yet, however, we could: https://www.osti.gov/servlets/purl/1892137 For proper host execution we need to resurrect/rebase https://tianshilei.me/wp-content/uploads/2021/12/llpp-2021.pdf (which was designed for debugging). ``` ❯❯❯ cat test.cu extern "C" { void llvm_omp_target_alloc_shared(size_t Size, int DeviceNum); void llvm_omp_target_free_shared(void DevicePtr, int DeviceNum); } __global__ void square(int A) { A = 42; } int main(int argc, char argv) { int DevNo = 0; int Ptr = reinterpret_cast<int >(llvm_omp_target_alloc_shared(4, DevNo)); Ptr = 7; printf("Ptr %p, Ptr %i\n", Ptr, Ptr); square<<<1, 1>>>(Ptr); printf("Ptr %p, Ptr %i\n", Ptr, Ptr); llvm_omp_target_free_shared(Ptr, DevNo); } ❯❯❯ clang++ test.cu -O3 -o test123 -foffload-via-llvm --offload-arch=native ❯❯❯ llvm-objdump --offloading test123 test123: file format elf64-x86-64 OFFLOADING IMAGE [0]: kind elf arch gfx90a triple amdgcn-amd-amdhsa producer openmp ❯❯❯ LIBOMPTARGET_INFO=16 ./test123 Ptr 0x155448ac8000, Ptr 7 Ptr 0x155448ac8000, Ptr 42 ```	2024-08-12 17:44:58 -07:00
Johannes Doerfert	9a1013220b	[Offload] Allow to record kernel launch stack traces (#100472 ) Similar to (de)allocation traces, we can record kernel launch stack traces and display them in case of an error. However, the AMD GPU plugin signal handler, which is invoked on memroy faults, cannot pinpoint the offending kernel. Insteade print `<NUM>`, set via `OFFLOAD_TRACK_NUM_KERNEL_LAUNCH_TRACES=<NUM>`, many traces. The recoding/record uses a ring buffer of fixed size (for now 8). For `trap` errors, we print the actual kernel name, and trace if recorded.	2024-07-31 11:49:50 -07:00
Johannes Doerfert	7102592af7	[Offload] Repair and rename `llvm-omp-device-info` (to `-offload-`) (#100309 ) The `llvm-omp-device-info` tool is very handy, but broke due to the lazy evaluation of devices. This repairs the functionality and adds a test. The tool is also renamed into `llvm-offload-device-info` as `-omp-` is going away.	2024-07-24 09:35:09 -07:00
Jan Patrick Lehr	caaf8099ef	[Offload][OMPT] Add callbacks for (dis)associate_ptr (#99046 ) This adds the OMPT callbacks for the API functions disassociate_ptr and associate_ptr.	2024-07-17 10:15:19 +02:00
Johannes Doerfert	54b5c76d3b	[Offload] Use flat array for cuLaunchKernel (#95116 ) We already used a flat array of kernel launch parameters for the AMD GPU launch but now we also use this scheme for the NVIDIA GPU launch. The only remaining/required use of the indirection is the host plugin (due ot ffi). This allows to us simplify the use for non-OpenMP kernel launch.	2024-06-13 09:43:47 +03:00
Johannes Doerfert	2eb60e2de8	[Offload][NFCI] Initialize the KernelArgsTy to default values (#95117 ) Co-authored-by: Joseph Huber <huberjn@outlook.com>	2024-06-11 17:05:04 +03:00
Joseph Huber	435aa7663d	[Libomptarget] Rework device initialization and image registration (#93844 ) Summary: Currently, we register images into a linear table according to the logical OpenMP device identifier. We then initialize all of these images as one block. This logic requires that images are compatible with all devices instead of just the one that it can run on. This prevents us from running on systems with heterogeneous devices (i.e. image 1 runs on device 0 image 0 runs on device 1). This patch reworks the logic by instead making the compatibility check a per-device query. We then scan every device to see if it's compatible and do it as they come.	2024-06-06 08:10:56 -05:00
Joseph Huber	033fa81480	[Offload][NFC] Remove unused files following static plugins Summary: Forgot to remove these when I landed the initial patch, they are no longer used.	2024-05-16 11:48:34 -05:00
Joseph Huber	fa9e90f5d2	[Reland][Libomptarget] Statically link all plugin runtimes (#87009 ) This patch overhauls the `libomptarget` and plugin interface. Currently, we define a C API and compile each plugin as a separate shared library. Then, `libomptarget` loads these API functions and forwards its internal calls to them. This was originally designed to allow multiple implementations of a library to be live. However, since then no one has used this functionality and it prevents us from using much nicer interfaces. If the old behavior is desired it should instead be implemented as a separate plugin. This patch replaces the `PluginAdaptorTy` interface with the `GenericPluginTy` that is used by the plugins. Each plugin exports a `createPlugin_<name>` function that is used to get the specific implementation. This code is now shared with `libomptarget`. There are some notable improvements to this. 1. Massively improved lifetimes of life runtime objects 2. The plugins can use a C++ interface 3. Global state does not need to be duplicated for each plugin + libomptarget 4. Easier to use and add features and improve error handling 5. Less function call overhead / Improved LTO performance. Additional changes in this plugin are related to contending with the fact that state is now shared. Initialization and deinitialization is now handled correctly and in phase with the underlying runtime, allowing us to actually know when something is getting deallocated. Depends on https://github.com/llvm/llvm-project/pull/86971 https://github.com/llvm/llvm-project/pull/86875 https://github.com/llvm/llvm-project/pull/86868	2024-05-09 09:38:22 -05:00
Joseph Huber	e5e66073c3	Revert "[Libomptarget] Statically link all plugin runtimes (#87009 )" Caused failures on build-bots, reverting to investigate. This reverts commit 80f9e814ec896fdc57ee84afad8ac4cb1f8e4627.	2024-05-09 07:05:23 -05:00
Joseph Huber	80f9e814ec	[Libomptarget] Statically link all plugin runtimes (#87009 ) This patch overhauls the `libomptarget` and plugin interface. Currently, we define a C API and compile each plugin as a separate shared library. Then, `libomptarget` loads these API functions and forwards its internal calls to them. This was originally designed to allow multiple implementations of a library to be live. However, since then no one has used this functionality and it prevents us from using much nicer interfaces. If the old behavior is desired it should instead be implemented as a separate plugin. This patch replaces the `PluginAdaptorTy` interface with the `GenericPluginTy` that is used by the plugins. Each plugin exports a `createPlugin_<name>` function that is used to get the specific implementation. This code is now shared with `libomptarget`. There are some notable improvements to this. 1. Massively improved lifetimes of life runtime objects 2. The plugins can use a C++ interface 3. Global state does not need to be duplicated for each plugin + libomptarget 4. Easier to use and add features and improve error handling 5. Less function call overhead / Improved LTO performance. Additional changes in this plugin are related to contending with the fact that state is now shared. Initialization and deinitialization is now handled correctly and in phase with the underlying runtime, allowing us to actually know when something is getting deallocated. Depends on https://github.com/llvm/llvm-project/pull/86971 https://github.com/llvm/llvm-project/pull/86875 https://github.com/llvm/llvm-project/pull/86868	2024-05-09 06:35:54 -05:00
Joseph Huber	b07177fb68	[Libomptarget] Rework interface for enabling plugins (#86875 ) Summary: Previously we would build all of the plugins by default and then only load some using the `LIBOMPTARGET_PLUGINS_TO_LOAD` variable. This patch renamed this to `LIBOMPTARGET_PLUGINS_TO_BUILD` and changes whether or not it will include the plugin in CMake. Additionally this patch creates a new `Targets.def` file that allows us to enumerate all of the enabled plugins. This is somewhat different from the old method, and it's done this way for future use that will need to be shared. This follows the same method that LLVM uses for its targets, however it does require adding an extra include path. Depends on https://github.com/llvm/llvm-project/pull/86868	2024-04-29 11:18:37 -05:00
Johannes Doerfert	330d8983d2	[Offload] Move `/openmp/libomptarget` to `/offload` (#75125 ) In a nutshell, this moves our libomptarget code to populate the offload subproject. With this commit, users need to enable the new LLVM/Offload subproject as a runtime in their cmake configuration. No further changes are expected for downstream code. Tests and other components still depend on OpenMP and have also not been renamed. The results below are for a build in which OpenMP and Offload are enabled runtimes. In addition to the pure `git mv`, we needed to adjust some CMake files. Nothing is intended to change semantics. ``` ninja check-offload ``` Works with the X86 and AMDGPU offload tests ``` ninja check-openmp ``` Still works but doesn't build offload tests anymore. ``` ls install/lib ``` Shows all expected libraries, incl. - `libomptarget.devicertl.a` - `libomptarget-nvptx-sm_90.bc` - `libomptarget.rtl.amdgpu.so` -> `libomptarget.rtl.amdgpu.so.18git` - `libomptarget.so` -> `libomptarget.so.18git` Fixes: https://github.com/llvm/llvm-project/issues/75124 --------- Co-authored-by: Saiyedul Islam <Saiyedul.Islam@amd.com>	2024-04-22 09:51:33 -07:00

25 Commits