llvm-project

mirror of https://github.com/llvm/llvm-project.git synced 2025-04-16 12:36:37 +00:00

Author	SHA1	Message	Date
Callum Fare	992b00020f	Reland #2 - [Offload] Introduce offload-tblgen and initial new API implementation (#108413 . #117704 ) (#117894 )
Relands #117704, which relanded changes from #108413 - this was reverted due to build issues. The new offload library did not build with `LIBOMPTARGET_OMPT_SUPPORT` enabled, which was not picked up by pre-merge testing. The last commit contains the fix; everything else is otherwise identical to the approved PR.
___
### New API
Previous discussions at the LLVM/Offload meeting have brought up the need for a new API for exposing the functionality of the plugins. This change introduces a very small subset of a new API, which is primarily for testing the offload tooling and demonstrating how a new API can fit into the existing code base without being too disruptive. Exact designs for these entry points and future additions can be worked out over time.
The new API does however introduce the bare minimum functionality to implement device discovery for Unified Runtime and SYCL. This means that the `urinfo` and `sycl-ls` tools can be used on top of Offload. A (rough) implementation of a Unified Runtime adapter (aka plugin) for Offload is available [here](https://github.com/callumfare/unified-runtime/tree/offload_adapter). Our intention is to maintain this and use it to implement and test Offload API changes with SYCL.
### Demoing the new API
```sh
# From the runtime build directory
$ ninja LibomptUnitTests
$ OFFLOAD_TRACE=1 ./offload/unittests/OffloadAPI/offload.unittests
```
### Open questions and future work
* Only some of the available device info is exposed, and not all the possible device queries needed for SYCL are implemented by the plugins. A sensible next step would be to refactor and extend the existing device info queries in the plugins. The existing info queries are all strings, but the new API introduces the ability to return any arbitrary type.
* It may be sensible at some point for the plugins to implement the new API directly, and the higher level code on top of it could be made generic, but this is more of a long-term possibility.	2024-11-28 10:19:37 +00:00
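A query that can return any type, rather than only strings, usually follows a size-query convention: the caller passes a buffer and its size, and can separately ask how many bytes are needed. The sketch below assumes a Unified Runtime-style `getInfo(prop, size, value, sizeRet)` shape; `deviceGetInfo`, `DeviceInfo`, `Result` and the `Device` struct are hypothetical names, not the generated Offload headers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical result codes and property IDs, for illustration only.
enum class Result { Success, InvalidSize, InvalidEnum };
enum class DeviceInfo { Name, MaxWorkGroupSize };

// Hypothetical stand-in for a plugin device.
struct Device {
  std::string Name;
  uint32_t MaxWorkGroupSize;
};

// Return any trivially copyable value: report its size through PropSizeRet,
// and copy it into PropValue if the caller supplied a large enough buffer.
template <typename T>
Result returnValue(const T &Value, size_t PropSize, void *PropValue,
                   size_t *PropSizeRet) {
  if (PropSizeRet)
    *PropSizeRet = sizeof(T);
  if (PropValue) {
    if (PropSize < sizeof(T))
      return Result::InvalidSize;
    std::memcpy(PropValue, &Value, sizeof(T));
  }
  return Result::Success;
}

// Strings are returned including their null terminator.
inline Result returnString(const std::string &Str, size_t PropSize,
                           void *PropValue, size_t *PropSizeRet) {
  size_t Needed = Str.size() + 1;
  if (PropSizeRet)
    *PropSizeRet = Needed;
  if (PropValue) {
    if (PropSize < Needed)
      return Result::InvalidSize;
    std::memcpy(PropValue, Str.c_str(), Needed);
  }
  return Result::Success;
}

// One entry point serves properties of different types.
inline Result deviceGetInfo(const Device &D, DeviceInfo Prop, size_t PropSize,
                            void *PropValue, size_t *PropSizeRet) {
  switch (Prop) {
  case DeviceInfo::Name:
    return returnString(D.Name, PropSize, PropValue, PropSizeRet);
  case DeviceInfo::MaxWorkGroupSize:
    return returnValue(D.MaxWorkGroupSize, PropSize, PropValue, PropSizeRet);
  }
  return Result::InvalidEnum;
}
```

A caller can first pass a null buffer to learn the size of a string property, then fetch it; integer properties go through the same entry point with `sizeof` of the expected type.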
Joseph Huber	a24aa7dfa5	[Offload] Use libc 'hand-in-hand' module to find RPC header (#117928 ) Summary: We should now use the official™ way to include the files from `libc/shared`. This required some code to make sure that it's not included twice if multiple people use it as well as a sanity check on the directory.	2024-11-27 20:14:13 -06:00
Joseph Huber	1d810ece2b	[libc] Move libc server handlers to a shared header (#117908 ) Summary: We can simply include this header from the shared directory now and do not need to have this level of indirection. Simply stash it with the other libc opcode handlers. If we were able to move the printf handlers to the shared directory then this could just be a header as well, which would HEAVILY simplify the mess associated with building the RPC server first in the projects build, then copying it to the runtimes build.	2024-11-27 14:57:52 -06:00
Joseph Huber	89d8e70031	[libc] Export a pointer to the RPC client directly (#117913 ) Summary: We currently have an unnecessary level of indirection when initializing the RPC client. This is a holdover from when the RPC client was not trivially copyable and simply makes it more complicated. Here we use the `asm` syntax to give the C++ variable a valid name so that we can just copy to it directly. Another advantage of this is that if users want to piggy-back on the same RPC interface, they need only declare theirs as extern with the same symbol name, or make it weak to optionally use it if LIBC isn't available.	2024-11-27 14:57:38 -06:00
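The `asm`-label technique described above can be sketched as follows. This is a minimal sketch assuming GCC/Clang's asm-label extension and an ELF target; the `ExampleClient` type and the symbol names `example_rpc_client` and `example_absent_rpc_client` are made up for illustration and are not libc's actual names.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the RPC client; the real type lives in libc's
// shared RPC header.
struct ExampleClient {
  uint32_t NumPorts;
};

// GNU asm label: the C++ variable `Client` is emitted under the fixed linker
// symbol "example_rpc_client" rather than a mangled name, so a loader can
// find it and copy into it directly. Another translation unit could reach
// the same object with:
//   extern ExampleClient MyClient asm("example_rpc_client");
ExampleClient Client asm("example_rpc_client") = {64};

// A consumer that only optionally uses a client can reference it weakly.
// On ELF targets an undefined weak symbol resolves to address zero.
__attribute__((weak)) extern ExampleClient OptionalClient
    asm("example_absent_rpc_client");

bool hasOptionalClient() { return &OptionalClient != nullptr; }
uint32_t clientPorts() { return Client.NumPorts; }
```

Because nothing defines `example_absent_rpc_client` here, `hasOptionalClient()` returns false; linking in a definition under that symbol name would make it true without changing this file.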
Fraser Cormack	0cb5846a68	Revert "Reland - [Offload] Introduce offload-tblgen and initial new API implementation (#108413 ) (#117704 )" This reverts commit c979ec05642f292737d250c6682d85ed49bc7b6e. This showed failures in the post-merge CI.	2024-11-27 10:49:01 +00:00
Callum Fare	c979ec0564	Reland - [Offload] Introduce offload-tblgen and initial new API implementation (#108413 ) (#117704 )
Relands changes from #108413 - this was reverted due to build issues. The problem was just that the `offload-tblgen` tool was behind recent changes to tablegen that ensure `const` records. This has been fixed and the PR is otherwise identical.
___
### New API
Previous discussions at the LLVM/Offload meeting have brought up the need for a new API for exposing the functionality of the plugins. This change introduces a very small subset of a new API, which is primarily for testing the offload tooling and demonstrating how a new API can fit into the existing code base without being too disruptive. Exact designs for these entry points and future additions can be worked out over time.
The new API does however introduce the bare minimum functionality to implement device discovery for Unified Runtime and SYCL. This means that the `urinfo` and `sycl-ls` tools can be used on top of Offload. A (rough) implementation of a Unified Runtime adapter (aka plugin) for Offload is available [here](https://github.com/callumfare/unified-runtime/tree/offload_adapter). Our intention is to maintain this and use it to implement and test Offload API changes with SYCL.
### Demoing the new API
```sh
# From the runtime build directory
$ ninja LibomptUnitTests
$ OFFLOAD_TRACE=1 ./offload/unittests/OffloadAPI/offload.unittests
```
### Open questions and future work
* Only some of the available device info is exposed, and not all the possible device queries needed for SYCL are implemented by the plugins. A sensible next step would be to refactor and extend the existing device info queries in the plugins. The existing info queries are all strings, but the new API introduces the ability to return any arbitrary type.
* It may be sensible at some point for the plugins to implement the new API directly, and the higher level code on top of it could be made generic, but this is more of a long-term possibility.	2024-11-27 10:39:07 +00:00
Joseph Huber	d047bee496	Revert "[Offload] Introduce offload-tblgen and initial new API implementation (#108413 )" This reverts commit 8a2311c4bf9993230e37dc20b57973dc917f2338.	2024-11-25 12:16:46 -06:00
Joseph Huber	d7c20a6f0c	[libc][NFC] Move RPC opcodes to the 'shared/' directory as well	2024-11-25 12:04:10 -06:00
Callum Fare	8a2311c4bf	[Offload] Introduce offload-tblgen and initial new API implementation (#108413 )
Introduce `offload-tblgen` and an initial implementation of a subset of the new API. The tablegen files are intended to be the single source of truth for the new API, with the header files, documentation, and other bits of source all automatically generated.
TODO (based on review feedback so far):
- [x] Check in the generated headers
- [x] Add an `offload-generate` target to trigger the generation rather than building them every time
- [x] Decide how error handling should work
- [x] Finish up new error handling implementation
- [x] Decide naming convention
- [x] Add testing for the new API
- [x] Add tablegen specific testing
- [x] clang-tidy and use llvm:: types when possible
- [x] Add optional code location arguments
- [x] Avoid multiple returns from one function
### offload-tblgen
See the included [README](`d80db06491/offload/new-api/API/README.md`) for more information on how the API definition and generation works. I'm happy to answer any questions about it and plan to walk through it in a future LLVM Offload call. It should be noted that struct definitions have not been fully implemented/tested as they aren't used by the initial API definitions, but finishing that off in the future shouldn't be too much work. The tablegen tooling has been designed to be easily extended with new backends, using the classes in `RecordTypes.hpp` to abstract over the tablegen records.
### New API
Previous discussions at the LLVM/Offload meeting have brought up the need for a new API for exposing the functionality of the plugins. This change introduces a very small subset of a new API, which is primarily for testing the offload tooling and demonstrating how a new API can fit into the existing code base without being too disruptive. Exact designs for these entry points and future additions can be worked out over time.
The new API does however introduce the bare minimum functionality to implement device discovery for Unified Runtime and SYCL. This means that the `urinfo` and `sycl-ls` tools can be used on top of Offload. A (rough) implementation of a Unified Runtime adapter (aka plugin) for Offload is available [here](https://github.com/callumfare/unified-runtime/tree/offload_adapter). Our intention is to maintain this and use it to implement and test Offload API changes with SYCL.
### Demoing the new API
```sh
$ git clone -b offload_adapter https://github.com/callumfare/unified-runtime.git
$ cd unified-runtime
$ mkdir build
$ cd build
$ cmake .. -GNinja -DUR_BUILD_ADAPTER_OFFLOAD=ON \
    -DUR_OFFLOAD_INSTALL_DIR=<offload build dir containing liboffload_new.so> \
    -DUR_OFFLOAD_INCLUDE_DIR=<offload build dir containing 'offload' headers directory>
$ ninja urinfo
$ export LD_LIBRARY_PATH=<offload build dir containing offload plugin libraries>
$ UR_ADAPTERS_FORCE_LOAD=$PWD/lib/libur_adapter_offload.so ./bin/urinfo
[cuda:gpu][cuda:0] CUDA, NVIDIA GeForce GT 1030 [12030]
# Demo with tracing
$ OFFLOAD_TRACE=1 UR_ADAPTERS_FORCE_LOAD=$PWD/lib/libur_adapter_offload.so ./bin/urinfo
---> offloadPlatformGet(.NumEntries = 0, .phPlatforms = {}, .pNumPlatforms = 0x7ffd05e4d6e0 (2))-> OFFLOAD_RESULT_SUCCESS
---> offloadPlatformGet(.NumEntries = 2, .phPlatforms = {0x564bf4040220, 0x564bf4040240}, .pNumPlatforms = nullptr)-> OFFLOAD_RESULT_SUCCESS
...
```
### Open questions and future work
* The new API is implemented in a separate library (`liboffload_new.so`). It could just as easily be part of the existing `libomptarget` library - I have no strong feelings on which is better.
* Only some of the available device info is exposed, and not all the possible device queries needed for SYCL are implemented by the plugins. A sensible next step would be to refactor and extend the existing device info queries in the plugins. The existing info queries are all strings, but the new API introduces the ability to return any arbitrary type.
* It may be sensible at some point for the plugins to implement the new API directly, and the higher level code on top of it could be made generic, but this is more of a long-term possibility.	2024-11-25 11:34:14 -06:00
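The tracing output in the demo above shows the enumeration convention: `offloadPlatformGet` is first called with `NumEntries = 0` to learn the platform count through `pNumPlatforms`, then called again with a buffer of that size. A minimal host-only sketch of both sides follows; `platformGet`, `Platform` and `Result` are hypothetical stand-ins for the generated API, and rejecting a non-null buffer with zero entries is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical result codes and handle type, for illustration only.
enum class Result { Success, InvalidSize };
struct Platform {
  int Id;
};
using PlatformHandle = Platform *;

// Hypothetical set of platforms discovered from the plugins.
static Platform Platforms[] = {{0}, {1}};
constexpr uint32_t NumAvailable = 2;

// If OutPlatforms is non-null, write up to NumEntries handles into it.
// If NumPlatforms is non-null, report how many platforms exist.
Result platformGet(uint32_t NumEntries, PlatformHandle *OutPlatforms,
                   uint32_t *NumPlatforms) {
  if (OutPlatforms && NumEntries == 0)
    return Result::InvalidSize;
  if (NumPlatforms)
    *NumPlatforms = NumAvailable;
  if (OutPlatforms)
    for (uint32_t I = 0; I < NumEntries && I < NumAvailable; ++I)
      OutPlatforms[I] = &Platforms[I];
  return Result::Success;
}

// Caller side: query the count first, then size a buffer and fetch handles.
std::vector<PlatformHandle> discoverPlatforms() {
  uint32_t Count = 0;
  if (platformGet(0, nullptr, &Count) != Result::Success)
    return {};
  std::vector<PlatformHandle> Handles(Count);
  if (Count && platformGet(Count, Handles.data(), nullptr) != Result::Success)
    return {};
  return Handles;
}
```

The two-call pattern keeps memory ownership with the caller, which is what lets a thin adapter such as the Unified Runtime one forward the calls without allocating on the library side.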
Joseph Huber	506ca19dc9	[OpenMP] Remove use of '__AMDGCN_WAVEFRONT_SIZE' (#113156 ) Summary: This is going to be deprecated in https://github.com/llvm/llvm-project/pull/112849. This patch ports it to use the builtin instead. This isn't a compile-time constant, so it could slightly negatively affect codegen. There really should be an IR pass to turn it into a constant if the function has known attributes. Using the builtin is correct when we only use it to know the size, as we do here. Obviously guarding w32/w64 code with this check would be broken.	2024-11-25 07:38:28 -06:00
Joseph Huber	b4d49fb52e	[libc] Remove RPC server API and use the header directly (#117075 ) Summary: This patch removes much of the `llvmlibc_rpc_server` interface. This pretty much deletes all of this code and just replaces it with including `rpc.h` directly. We still maintain the file to let `libc` handle the opcodes, since those depend on the `printf` implementation. This will need to be cleaned up more, but I don't want to put too much into a single patch.	2024-11-25 07:13:28 -06:00
Joseph Huber	676a1e6643	[AMDGPU] Remove uses of deprecated HSA executable functions (#117241 ) Summary: These functions were deprecated in ROCR 1.3, which was released quite some time ago. The main functionality that was lost was modifying and inspecting the code object independently of the executable; however, we do all of that ourselves through our ELF API. This should be within the versions of other functions we use.	2024-11-22 07:16:40 -06:00
Ivan Radanov Ivanov	0a27e4eed4	[offload] Fix copy-paste defect in error message	2024-11-19 17:12:51 +09:00
Matt Arsenault	a6fc489bb7	AMDGPU: Add gfx950 subtarget definitions (#116307 ) Mostly a stub, but adds some baseline tests and tests for removed instructions.	2024-11-18 10:41:14 -08:00
Johannes Doerfert	2edfa50e7f	[Offload][NFC] Update README.md	2024-11-17 07:32:29 -08:00
agozillon	3723449955	[OpenMP] Allocatable explicit member mapping fortran offloading tests (#113555 ) This PR is one in a series of 3 that aim to add support for explicit member mapping of allocatable components in derived types within OpenMP+Fortran for Flang. This PR provides all of the runtime tests that are currently upstreamable; unfortunately, some of the other tests would require linking the Fortran runtime for offload, which we currently do not do. Regardless, this is plenty to ensure that the mapping works in most cases.	2024-11-16 12:22:33 +01:00
Matin Raayai	bb3f5e1fed	Overhaul the TargetMachine and LLVMTargetMachine Classes (#111234 ) Following discussions in #110443, and the following earlier discussions in https://lists.llvm.org/pipermail/llvm-dev/2017-October/117907.html, https://reviews.llvm.org/D38482, https://reviews.llvm.org/D38489, this PR attempts to overhaul the `TargetMachine` and `LLVMTargetMachine` interface classes. More specifically: 1. Makes `TargetMachine` the only class implemented under `TargetMachine.h` in the `Target` library. 2. `TargetMachine` contains target-specific interface functions that relate to IR/CodeGen/MC constructs, whereas before (at least on paper) it was supposed to have only IR/MC constructs. Any Target that doesn't want to use the independent code generator simply does not implement them, and returns either `false` or `nullptr`. 3. Renames `LLVMTargetMachine` to `CodeGenCommonTMImpl`. This renaming aims to make the purpose of `LLVMTargetMachine` clearer. Its interface was moved under the CodeGen library, to further emphasize its usage in Targets that use CodeGen directly. 4. Makes `TargetMachine` the only interface used across LLVM and its projects. With these changes, `CodeGenCommonTMImpl` is simply a set of shared function implementations of `TargetMachine`, and CodeGen users don't need to static cast to `LLVMTargetMachine` every time they need a CodeGen-specific feature of the `TargetMachine`. 5. More importantly, does not change any requirements regarding library linking. cc @arsenm @aeubanks	2024-11-14 13:30:05 -08:00
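As a rough sketch of the resulting shape, the base class carries CodeGen hooks with `false`/`nullptr` defaults, a shared implementation class overrides them, and callers never cast. The classes and hook names below (`codeGenPipelineName`, `hasCodeGenSupport`, `CodeGenCommonTMImplSketch`) are hypothetical and heavily simplified; they are not LLVM's real declarations.

```cpp
#include <cassert>
#include <string>

// Hypothetical, heavily simplified model; not LLVM's real class layout.
class TargetMachine {
public:
  virtual ~TargetMachine() = default;
  // CodeGen-specific hooks live on the base class. A target that does not
  // use the target-independent code generator keeps these defaults.
  virtual const char *codeGenPipelineName() const { return nullptr; }
  virtual bool hasCodeGenSupport() const { return false; }
};

// Shared CodeGen implementations, in the spirit of CodeGenCommonTMImpl.
class CodeGenCommonTMImplSketch : public TargetMachine {
public:
  const char *codeGenPipelineName() const override { return "common-codegen"; }
  bool hasCodeGenSupport() const override { return true; }
};

class ExampleCodeGenTarget final : public CodeGenCommonTMImplSketch {};
class ExampleIROnlyTarget final : public TargetMachine {};

// Callers work through TargetMachine alone; no static_cast to a subclass.
bool canRunCodeGen(const TargetMachine &TM) { return TM.hasCodeGenSupport(); }
```

The design choice being illustrated is that capability checks become virtual calls on one interface, instead of downcasts that assume every target derives from the CodeGen implementation class.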
Joseph Huber	3a20a5f510	[Offload] Move compiler-rt to runtimes in cache	2024-11-14 10:44:29 -06:00
agozillon	d84d0caf28	[Flang][OpenMP] Update MapInfoFinalization to use BlockArgs Interface and modify use_device_ptr/addr to be order independent (#113919 ) This patch primarily updates the MapInfoFinalization pass to utilise the BlockArgument interface. It also moves the arguments newly added by the MapInfoFinalization pass to the end of the BlockArg/relevant MapInfo lists, instead of placing them one position before the owning descriptor type. During this it was noted that the use_device_ptr/addr handling of target data was a little bit too order dependent, so I've attempted to make it less so, as we cannot depend on argument ordering to be the same as Fortran for any future frontends.	2024-11-14 15:47:37 +01:00
aurel32	b6bd7477a9	[Offload] Add support for riscv64 to host plugin (#115773 ) This adds support for the riscv64 architecture to the offload host plugin. The check to define FFI_DEFAULT_ABI is intentionally not guarded by __riscv_xlen as the value is the same for riscv32 and riscv64 (support for OpenMP on riscv32 is still under review).	2024-11-13 08:15:49 -06:00
Joseph Huber	de41b137dd	[Offload] Provide a CMake cache file to easily build offloading (#115074 ) Summary: This patch adds a cache file that will automatically enable openmp, offload, and all the fancy GPU libraries.	2024-11-07 15:35:29 -06:00
Joseph Huber	d661aea4c5	[OpenMP] Add support for custom callback in AMDGPUStream (#112785 ) Summary: We have the ability to schedule callbacks after certain events complete. Currently we can register an arbitrary callback in CUDA, but can't in AMDGPU. I am planning on using this support to move the RPC handling to a separate thread, then using these callbacks to suspend / resume it when no kernels are running. This is a preliminary patch to keep this noise out of that one.	2024-10-29 10:18:32 -07:00
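A host-only sketch of the idea of running callbacks once previously queued stream work completes, which is what would let an RPC thread be suspended or resumed when kernels finish. Everything here (`StreamSketch`, `enqueueWork`, `pushCallback`, `completeOne`) is hypothetical and models completion with counters rather than HSA signals; it is not the AMDGPU plugin's implementation.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical model of a stream that runs user callbacks once all work
// queued before them has completed.
class StreamSketch {
  struct Pending {
    int WaitsFor; // Number of completed operations this callback needs.
    std::function<void()> Callback;
  };
  int NumQueued = 0;
  int NumCompleted = 0;
  std::vector<Pending> Callbacks;

public:
  // Enqueue some asynchronous work (modelled as a counter).
  void enqueueWork() { ++NumQueued; }

  // Register a callback that fires after everything queued so far finishes.
  void pushCallback(std::function<void()> Fn) {
    Callbacks.push_back({NumQueued, std::move(Fn)});
  }

  // Simulate the device finishing one operation, then fire ready callbacks.
  // Iterating over a swapped-out copy lets callbacks register new ones.
  void completeOne() {
    if (NumCompleted < NumQueued)
      ++NumCompleted;
    std::vector<Pending> Current;
    Current.swap(Callbacks);
    for (Pending &P : Current) {
      if (P.WaitsFor <= NumCompleted)
        P.Callback();
      else
        Callbacks.push_back(std::move(P));
    }
  }
};
```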
Carl Ritson	076aac59ac	[AMDGPU] Add a new target for gfx1153 (#113138 )	2024-10-23 12:56:58 +09:00
Brad Richardson	06eb10dadf	[flang][driver] rename flang-new to flang (#110023 ) This does a global rename from `flang-new` to `flang`. I also removed/changed any TODOs that I found related to making this change. --------- Co-authored-by: H. Vetinari <h.vetinari@gmx.com> Co-authored-by: Andrzej Warzynski <andrzej.warzynski@arm.com>	2024-10-10 09:26:04 +01:00
Ivan Butygin	26ca8ef836	[libc] GPU RPC interface: add return value to `rpc_host_call` (#111288 )	2024-10-06 20:22:07 +03:00
Joseph Huber	e8d2057ca4	[OpenMP] Add critical region lock for NVPTX targets (#110148 ) Summary: We define this on AMDGCN but not NVPTX, which leads to some failures depending on the target.	2024-09-26 11:33:52 -07:00
agozillon	0215579dab	[Flang][Offload][Tests] Set default OpenMP version to 5.2 (52) (#110138 ) We recently added versioning support to Flang's OpenMP, which restricts and enables certain things based on the OpenMP specification version. Currently one of the check-offload tests makes use of a feature that's at a slightly higher version than the current default, causing it to fail. This PR basically applies the highest current OpenMP version number as a default argument for the lit.cfg; if we need more fine-grained control in the future, we can expand it to different lit commands for each relevant version that can then be added in each test. But for now, to keep it simple, just set the max level version.	2024-09-26 18:56:26 +02:00
Michael Halkenhäuser	d36f66b42d	[NFC][offload][OMPT] Cleanup of OMPT internals (#109005 ) Removed `OmptCallbacks.cpp` since relevant contents were duplicated. Because of the static linking there should be no change in functionality.	2024-09-23 11:58:40 +02:00
Joseph Huber	5f02558d82	[OpenMP] Fix not linking C libraries when enabled (#109168 ) Summary: We used to do this automatically, add it back in to do it manually.	2024-09-18 10:02:16 -07:00
Joseph Huber	c3ac3fe825	[OpenMP] Fix redefining `stdint.h` types (#108607 ) Summary: We can include `stdint.h` just fine as long as we don't allow it to find system headers, passing `-nostdlibinc` and `-nogpuinc` suppresses these extra paths so we will just use the clang resource headers for `stdint.h` and `stddef.h`.	2024-09-13 13:22:44 -05:00
Akash Banerjee	a67b6e1635	Fix typo in test.	2024-09-06 14:23:06 +01:00
Johannes Doerfert	08533a3ee8	[Offload][NFC] Reorganize `utils::` and make Device/Host/Shared clearer (#100280 ) We had three `utils::` namespaces, all with different "meaning" (host, device, hsa_utils). We should, when we can, keep "include/Shared" accessible from host and device, thus RefCountTy has been moved to a separate header. `hsa_utils` was introduced to make `utils::` less overloaded. And common functionality was de-duplicated, e.g., `utils::advance` and `utils::advanceVoidPtr` -> `utils::advancePtr`. Type punning now checks for the size of the result to make sure it matches the source type. No functional change was intended.	2024-09-05 13:36:26 -07:00
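The two helpers mentioned above could look roughly like this. The names follow the commit message, but the signatures are assumptions for illustration; the point is a `static_assert` that rejects punning between differently sized types, and a single byte-offset helper that works for any pointer type, including `const void *`.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

namespace utils {

// Reinterpret the bytes of Value as type To. The size check guarantees no
// bytes are dropped or read past the end of the source object.
template <typename To, typename From> To typePun(const From &Value) {
  static_assert(sizeof(To) == sizeof(From),
                "type punning requires identically sized types");
  static_assert(std::is_trivially_copyable<To>::value &&
                    std::is_trivially_copyable<From>::value,
                "type punning requires trivially copyable types");
  To Result;
  std::memcpy(&Result, &Value, sizeof(To));
  return Result;
}

// Advance any pointer by a byte offset, preserving its pointee type and
// constness (volatile pointers are not handled in this sketch).
template <typename Ty> Ty *advancePtr(Ty *Ptr, int64_t Offset) {
  static_assert(!std::is_volatile<Ty>::value, "volatile not handled here");
  using Byte = std::conditional_t<std::is_const<Ty>::value, const char, char>;
  return reinterpret_cast<Ty *>(reinterpret_cast<Byte *>(Ptr) + Offset);
}

} // namespace utils
```

For example, `utils::typePun<uint32_t>(1.0f)` yields the IEEE-754 bit pattern `0x3F800000`, while `utils::typePun<uint64_t>(1.0f)` would fail to compile.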
Akash Banerjee	142433684a	[OpenMP][Flang] Fix dynamic-extent array mapping (#107247 ) This patch fixes the mapping and lowering of arrays with dynamic extents and adds a new test for the same. The fix discards the incomplete dynamic extent information and replaces it with just the base type. When lowering to llvm later, the bounds information is used instead.	2024-09-05 12:44:10 +01:00
Jan Patrick Lehr	1a0cf245ac	[Offload] Change x86_64-pc-linux to x86_64-unknown-linux (#107023 ) It appears that the RUNTIMES build prefers the x86_64-unknown-linux-gnu triple notation for the host. This fixes runtime / test breakages when compiler-rt is used as the CLANG_DEFAULT_RTLIB.	2024-09-03 14:25:33 +02:00
WÁNG Xuěruì	9adf81182e	[Offload] Fix stray libomptarget message helper calls (#106837 ) In #92581 the `LibomptargetUtils.cmake` helpers have been removed, but only uses of `libomptarget_say` were migrated. Migrate the remaining few warning and error messages so the `check-offload` target would not fail due to missing `libomptarget_warning_say`. While at it, update the `check-offload` unavailability message to say `check-offload` instead of `check-libomptarget`. Fixes #92581	2024-08-31 07:06:41 -05:00
WÁNG Xuěruì	75545b3449	[Offload] Fix disabling of cuda target on unsupported platforms (#106835 ) The target name and the message are wrong -- both should say "cuda" for the filtering to work. Fixes commit 300e5b911442 (#93186).	2024-08-31 07:06:17 -05:00
agozillon	f4cf93fb50	[Flang][OpenMP] Align map clause generation and fix issue with non-shared allocations for assumed shape/size descriptor types (#97855 ) This PR aims to unify the map argument generation behavior across both the implicit capture (captured in a target region) and the explicit capture (process map); currently the varPtr field of the MapInfo for the same variable will be different depending on how it's captured. This PR tries to align that across the generations of MapInfoOp in the OpenMP lowering. Currently, I have opted to utilise the rawInput (input memref to a HLFIR DeclareInfoOp) as opposed to the addr field which includes more information. The side effect of this is that we have to deal with BoxTypes less often, which will result in simpler maps in these cases. The negative side effect of this is that we don't have access to the bounds information through the resulting value; however, I believe the bounds information we require in our case is still appropriately stored in the map bounds, and this seems to be the case from testing so far. The other fix is for cases where we end up with a BoxType argument into a function (certain assumed shape and sizes cases do this) that has no fir.ref wrapping it. As we need the Box to be a reference type to actually utilise the operation to access the base address stored inside and create the correct mappings, we currently generate an intermediate allocation in these cases, store into it, and utilise this as the map argument, as opposed to the original. However, as we were not sharing the same intermediate allocation across all of the maps for a variable, this resulted in errors in certain cases when detaching/attaching the data, e.g. via enter and exit. This PR adjusts this for such cases. Currently we only track intermediate allocations for the current function scope, as opposed to the module.
This is primarily because the only case I am aware of where this is required is when we pass certain types of arguments to functions (so I opted to minimize the overhead of the pass for now). It could likely be extended to module scope if we find other cases where it's applicable and causing issues.	2024-08-23 19:48:43 +02:00
Ethan Luis McDonough	fde2d23ee2	[PGO][OpenMP] Instrumentation for GPU devices (Revision of #76587 ) (#102691 ) This pull request is a revised version of #76587. This pull request fixes some build issues that were present in the previous version of this change. > This pull request is the first part of an ongoing effort to extend PGO instrumentation to GPU device code. This PR makes the following changes: > > - Adds blank registration functions to device RTL > - Gives PGO globals protected visibility when targeting a supported GPU > - Handles any addrspace casts for PGO calls > - Implements PGO global extraction in GPU plugins (currently only dumps info) > > These changes can be tested by supplying `-fprofile-instrument=clang` while targeting a GPU.	2024-08-22 01:10:54 -05:00
Johannes Doerfert	3b7611594f	[Offload] Improve error reporting on memory faults (#104254 ) Since we can already track allocations, we can diagnose memory faults to some degree. If the fault happens in a prior allocation (use after free) or "close but outside" one, we can provide that information to the user. Note that the fault address might be page aligned, and not all accesses trigger a fault, especially for allocations that are backed by a MemoryManager. Still, if people disable the MemoryManager or the allocation is big enough, we can sometimes provide valuable feedback.	2024-08-21 10:01:35 -07:00
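The classification described above can be sketched as a lookup over tracked allocations. The `TrackedAlloc` record, the 4 KiB "close" threshold, and the returned strings are assumptions for this sketch, not the runtime's actual data structures or messages.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical record of an allocation the runtime has tracked.
struct TrackedAlloc {
  uintptr_t Begin;
  uintptr_t Size;
  bool Freed;
};

// Classify a faulting address against the tracked allocations. `Slack`
// (how far outside a live allocation still counts as "close") is an
// arbitrary choice; real fault addresses may also be page aligned.
std::string diagnoseFault(uintptr_t FaultAddr,
                          const std::vector<TrackedAlloc> &Allocs,
                          uintptr_t Slack = 4096) {
  auto Contains = [FaultAddr](const TrackedAlloc &A) {
    return FaultAddr >= A.Begin && FaultAddr - A.Begin < A.Size;
  };
  // Check live allocations first, since freed memory may have been reused.
  for (const TrackedAlloc &A : Allocs)
    if (!A.Freed && Contains(A))
      return "inside a live allocation";
  for (const TrackedAlloc &A : Allocs)
    if (A.Freed && Contains(A))
      return "use after free";
  for (const TrackedAlloc &A : Allocs) {
    if (A.Freed)
      continue;
    uintptr_t End = A.Begin + A.Size;
    bool JustBefore = FaultAddr < A.Begin && A.Begin - FaultAddr <= Slack;
    bool JustAfter = FaultAddr >= End && FaultAddr - End < Slack;
    if (JustBefore || JustAfter)
      return "close to but outside a live allocation";
  }
  return "unknown";
}
```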
Joseph Huber	e96146cd46	[OpenMP] Temporarily disable test to keep bots green Summary: This test mysteriously fails on the bots but not locally, disable until I can figure out why.	2024-08-20 15:16:05 -05:00
Joseph Huber	e0326b668e	[OpenMP] Map `omp_default_mem_alloc` to global memory (#104790 ) Summary: Currently, we assign this to private memory. This causes failures on some SOLLVE tests. The standard isn't clear on the semantics of this allocation type, but there seems to be a consensus that it's supposed to be shared memory.	2024-08-20 12:00:41 -05:00
Fabian Mora	cfc76b6498	[llvm][offload] Move AMDGPU offload utilities to LLVM (#102487 ) This patch moves utilities from `offload/plugins-nextgen/amdgpu/utils/UtilitiesRTL.h` to `llvm/Frontend/Offloading/Utility.h` to be reused by other projects. Concretely the following changes were made: - Rename `KernelMetaDataTy` to `AMDGPUKernelMetaData`. - Remove unused fields `KernelObject`, `KernelSegmentSize`, `ExplicitArgumentCount` and `ImplicitArgumentCount` from `AMDGPUKernelMetaData`. - Return the produced error if `ELFObj.sections()` failed instead of using `cantFail`. - Added `AGPRCount` field to `AMDGPUKernelMetaData`. - Added a default invalid value to all the fields in `AMDGPUKernelMetaData`.	2024-08-20 09:03:06 -04:00
estewart08	ea8bb4d633	[offload] - Fix issue with standalone debug offload build (#104647 ) Error: CommandLine Error: Option 'attributor-manifest-internal' registered more than once During the standalone debug build of offload the above error is seen at app runtime when using a prebuilt llvm with LLVM_LINK_LLVM_DYLIB=ON. This is caused by linking both libLLVM.so and various archives that are found via llvm_map_components_to_libnames for jit support.	2024-08-19 17:59:21 -05:00
Joseph Huber	161e250add	[OpenMP] Fix buildbot failing on allocator test	2024-08-14 13:56:12 -05:00
Joseph Huber	74d23f15b6	[OpenMP] Implement 'omp_alloc' on the device (#102526 ) Summary: The 'omp_alloc' function should be callable from a target region. This patch implements it by simply calling `malloc` for every non-default trait value allocator. All the special access modifiers are unimplemented and return null. The null allocator returns null as the spec states it should not be usable from the target.	2024-08-14 13:38:55 -05:00
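The dispatch described above, sketched on the host with `std::malloc` standing in for device `malloc`. The enumerators mirror the OpenMP predefined allocator names without the `omp_` prefix; their numeric values are illustrative rather than taken from `omp.h`, and `deviceAllocSketch` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Predefined allocator names from the OpenMP specification (without the
// omp_ prefix); the numeric values are illustrative only.
enum AllocatorSketch {
  null_allocator,
  default_mem_alloc,
  large_cap_mem_alloc,
  const_mem_alloc,
  high_bw_mem_alloc,
  low_lat_mem_alloc,
  cgroup_mem_alloc,
  pteam_mem_alloc,
  thread_mem_alloc,
};

// Allocators with only a memory-space trait fall back to plain malloc;
// allocators with a special access trait are unimplemented, and the null
// allocator is rejected. Both unsupported cases return null.
void *deviceAllocSketch(size_t Size, AllocatorSketch Allocator) {
  switch (Allocator) {
  case default_mem_alloc:
  case large_cap_mem_alloc:
  case const_mem_alloc:
  case high_bw_mem_alloc:
  case low_lat_mem_alloc:
    return std::malloc(Size);
  case cgroup_mem_alloc:
  case pteam_mem_alloc:
  case thread_mem_alloc:
  case null_allocator:
  default:
    return nullptr;
  }
}
```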
Johannes Doerfert	ff12c0061b	[Offload] Ensure to load images when the device is used (#103002 ) When we use the device, e.g., with an API that interacts with it, we need to ensure the image is loaded and the constructors are executed. Two tests are included to verify we 1) load images and run constructors when needed, and 2) we do so lazily only if the device is actually used. --------- Co-authored-by: Joseph Huber <huberjn@outlook.com>	2024-08-13 14:41:26 -07:00
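The lazy-loading behaviour described above can be sketched as a device wrapper that loads its images at most once, on first use. `LazyDevice`, `ensureImagesLoaded`, `copyToDevice`, and `launchKernel` are invented names for this model, not the plugin's real data structures; loading is represented by a counter.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical model of a device: loading its images (and running their
// global constructors) is deferred until an entry point actually uses it.
class LazyDevice {
  std::mutex Mtx;
  bool ImagesLoaded = false;
  int LoadCount = 0;
  std::vector<std::function<void()>> Constructors;

  // Called at the top of every entry point that interacts with the device.
  void ensureImagesLoaded() {
    std::lock_guard<std::mutex> Lock(Mtx);
    if (ImagesLoaded)
      return;
    ImagesLoaded = true;
    ++LoadCount; // Stand-in for actually loading the device images.
    for (auto &Ctor : Constructors)
      Ctor();
  }

public:
  explicit LazyDevice(std::vector<std::function<void()>> Ctors)
      : Constructors(std::move(Ctors)) {}

  void copyToDevice() { ensureImagesLoaded(); /* data movement elided */ }
  void launchKernel() { ensureImagesLoaded(); /* kernel launch elided */ }
  int loadCount() const { return LoadCount; }
};
```

A device that is never touched never pays for image loading, which matches the second property the included tests are described as checking.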
Joseph Huber	dcc27ea41e	[LinkerWrapper] Always pass `-flto` if the linker supports it (#102972 ) Summary: Now that we use the linker to do LTO / device linking, we need to inform the `clang` invocation to use `-flto` so it forwards arguments like `-On` correctly.	2024-08-13 11:23:55 -05:00
Johannes Doerfert	3f9c9acedd	[Offload] Add the right paths to the CUDA lit tests (#102997 )	2024-08-12 20:34:33 -07:00
Johannes Doerfert	80525dfcde	[Offload][CUDA] Allow CUDA kernels to use LLVM/Offload (#94549 ) Through the new `-foffload-via-llvm` flag, CUDA kernels can now be lowered to the LLVM/Offload API. On the Clang side, this is simply done by using the OpenMP offload toolchain and emitting calls to `llvm` functions to orchestrate the kernel launch rather than `cuda` functions. These `llvm` functions are implemented on top of the existing LLVM/Offload API. As we are about to redefine the Offload API, this will help us in the design process as a second offload language. We do not support any CUDA APIs yet, however, we could: https://www.osti.gov/servlets/purl/1892137 For proper host execution we need to resurrect/rebase https://tianshilei.me/wp-content/uploads/2021/12/llpp-2021.pdf (which was designed for debugging).
```
❯❯❯ cat test.cu
extern "C" {
void *llvm_omp_target_alloc_shared(size_t Size, int DeviceNum);
void llvm_omp_target_free_shared(void *DevicePtr, int DeviceNum);
}
__global__ void square(int *A) { *A = 42; }
int main(int argc, char **argv) {
  int DevNo = 0;
  int *Ptr = reinterpret_cast<int *>(llvm_omp_target_alloc_shared(4, DevNo));
  *Ptr = 7;
  printf("Ptr %p, *Ptr %i\n", Ptr, *Ptr);
  square<<<1, 1>>>(Ptr);
  printf("Ptr %p, *Ptr %i\n", Ptr, *Ptr);
  llvm_omp_target_free_shared(Ptr, DevNo);
}
❯❯❯ clang++ test.cu -O3 -o test123 -foffload-via-llvm --offload-arch=native
❯❯❯ llvm-objdump --offloading test123
test123:        file format elf64-x86-64
OFFLOADING IMAGE [0]:
kind            elf
arch            gfx90a
triple          amdgcn-amd-amdhsa
producer        openmp
❯❯❯ LIBOMPTARGET_INFO=16 ./test123
Ptr 0x155448ac8000, *Ptr 7
Ptr 0x155448ac8000, *Ptr 42
```	2024-08-12 17:44:58 -07:00
jyu2-git	a43677c172	Test failed with amd. (#101781 ) Mark the test as unsupported. This is related to #101101	2024-08-02 17:53:23 -07:00

242 Commits