llvm-project

mirror of https://github.com/llvm/llvm-project.git synced 2025-04-18 21:26:49 +00:00

Author	SHA1	Message	Date
Joseph Huber	2f41fa387d	[AMDGPU] Fix code object version not being set to 'none' (#135036 ) Summary: Previously, we removed the special handling for the code object version global. I erroneously thought that this meant we cold get rid of this weird `-Xclang` option. However, this also emits an LLVM IR module flag, which will then cause linking issues.	2025-04-10 11:31:21 -05:00
Zequan Wu	78b21ddba7	Revert "Reland "Symbolize line zero as if no source info is available (#124846 )" (#133798 )" This reverts commit 348374028970c956f2e49ab7553b495d7408ccd9 because #128619 doesn't handle the case when we have an empty frame from `getInliningInfoForAddress` because line num is 0 which makes it non-differentiable from missing debug info. So, we end up using the base filename from symtab again. Reverting for now until that issus is solved.	2025-04-09 18:09:31 -07:00
Joel E. Denny	ad9f6d3cee	[PGO][Offload] Use %profdata in PGO tests (#135015 ) So that the wrong llvm-profdata is not picked up from PATH.	2025-04-09 10:40:46 -04:00
Jan Leyonberg	fbc8335311	[MLIR][OpenMP] Add codegen for teams reductions (#133310 ) This patch adds the lowering of teams reductions from the omp dialect to LLVM-IR. Some minor cleanup was done in clang to remove an unused parameter.	2025-04-07 12:47:16 -04:00
Zequan Wu	3483740289	Reland "Symbolize line zero as if no source info is available (#124846 )" (#133798 ) This land commits 23aca2f88dd5d2447e69496c89c3ed42a56f9c31 and 1b15a89a23c631a8e2d096dad4afe456970572c0. https://github.com/llvm/llvm-project/pull/128619 makes symbolizer to always use debug info when available so we can reland this chagnge.	2025-03-31 19:13:46 -04:00
Ethan Luis McDonough	0c81105373	[PGO][Offload] Disable PGO on NVPTX (#133522 )	2025-03-28 16:32:32 -05:00
macurtis-amd	21a8c63cdc	[offload] Remove bad assert in StaticLoopChunker::Distribute (#132705 ) When building with asserts enabled, this can actually cause strange miscompilations because an incorrect llvm.assume is generated at the point of the assertion.	2025-03-28 04:53:00 -05:00
Ethan Luis McDonough	c50d39f073	[PGO][Offload] Allow PGO flags to be used on GPU targets (#94268 ) This pull request is the third part of an ongoing effort to extends PGO instrumentation to GPU device code and depends on https://github.com/llvm/llvm-project/pull/93365. This PR makes the following changes: - Allows PGO flags to be supplied to GPU targets - Pulls version global from device - Modifies `__llvm_write_custom_profile` and `lprofWriteDataImpl` to allow the PGO version to be overridden	2025-03-19 19:01:38 -05:00
Krzysztof Parzyszek	f4fc2d731c	[flang][OpenMP] Map ByRef if size/alignment exceed that of a pointer (#130832 ) Improve the check for whether a type can be passed by copy. Currently, passing by copy is done via the OMP_MAP_LITERAL mapping, which can only transfer as much data as can be contained in a pointer representation.	2025-03-12 19:41:11 -05:00
Krzysztof Parzyszek	d67947162f	[flang][OpenMP] Implement HAS_DEVICE_ADDR clause (#128568 ) The HAS_DEVICE_ADDR indicates that the object(s) listed exists at an address that is a valid device address. Specifically, `has_device_addr(x)` means that (in C/C++ terms) `&x` is a device address. When entering a target region, `x` does not need to be allocated on the device, or have its contents copied over (in the absence of additional mapping clauses). Passing its address verbatim to the region for use is sufficient, and is the intended goal of the clause. Some Fortran objects use descriptors in their in-memory representation. If `x` had a descriptor, both the descriptor and the contents of `x` would be located in the device memory. However, the descriptors are managed by the compiler, and can be regenerated at various points as needed. The address of the effective descriptor may change, hence it's not safe to pass the address of the descriptor to the target region. Instead, the descriptor itself is always copied, but for objects like `x`, no further mapping takes place (as this keeps the storage pointer in the descriptor unchanged). --------- Co-authored-by: Sergio Afonso <safonsof@amd.com>	2025-03-10 08:11:01 -05:00
agozillon	f1178815d2	[Flang][OpenMP][MLIR] Implement close, present and ompx_hold modifiers for Flang maps (#129586 ) This PR adds an initial implementation for the map modifiers close, present and ompx_hold, primarily just required adding the appropriate map type flags to the map type bits. In the case of ompx_hold it required adding the map type to the OpenMP dialect. Close has a bit of a problem when utilised with the ALWAYS map type on descriptors, so it is likely we'll have to make sure close and always are not applied to the descriptor simultaneously in the future when we apply always to the descriptors to facilitate movement of descriptor information to device for consistency, however, we may find an alternative to this with further investigation. For the moment, it is a TODO/Note to keep track of it.	2025-03-07 22:22:30 +01:00
Jan Patrick Lehr	1824bb47c2	[Offload][OpenMP] Fix check-prefix (#128599 )	2025-02-25 00:32:27 +01:00
Zequan Wu	1b15a89a23	Revert "[Offload] Fix assumptions on symbols after #124846 (#126238 )" The dependency commit was reverted at `23aca2f88d`. Reverting this as well.	2025-02-24 13:30:54 -08:00
Fabian Ritter	a2f9ae1421	[AMDGPU] Replace gfx940 and gfx941 with gfx942 in offload and libclc (#125826 ) gfx940 and gfx941 are no longer supported. This is one of a series of PRs to remove them from the code base. For SWDEV-512631 and SWDEV-512633	2025-02-19 09:56:04 +01:00
Akash Banerjee	785a5b4676	[MLIR][OpenMP] Add LLVM translation support for OpenMP UserDefinedMappers (#124746 ) This patch adds OpenMPToLLVMIRTranslation support for the OpenMP Declare Mapper directive. Since both MLIR and Clang now support custom mappers, I've changed the respective function params to no longer be optional as well. Depends on #121005	2025-02-18 17:55:48 +00:00
Joseph Huber	1435c8ed95	Reapply "[LinkerWrapper] Clean up options after proper forwarding" (#126495 ) Summary: The test failed because it no longer passed Rpass by default without LTO. I think that's desirable as it matches the standard behavior. This reverts commit 6fd99de31864a5ef84ae8613b3a9034e05293461.	2025-02-14 09:56:46 -06:00
Ethan Luis McDonough	52ee06d273	[PGO][Offload] Fix pgo1.c (#126864 ) pgo1.c had outdated test checks	2025-02-12 00:54:31 -06:00
Ethan Luis McDonough	9e5c136d5a	[PGO][Offload] Profile profraw generation for GPU instrumentation #76587 (#93365 ) This pull request is the second part of an ongoing effort to extends PGO instrumentation to GPU device code and depends on #76587. This PR makes the following changes: - Introduces `__llvm_write_custom_profile` to PGO compiler-rt library. This is an external function that can be used to write profiles with custom data to target-specific files. - Adds `__llvm_write_custom_profile` as weak symbol to libomptarget so that it can write the collected data to a profraw file. - Adds `PGODump` debug flag and only displays dump when the aforementioned flag is set	2025-02-11 23:30:54 -06:00
Jan Patrick Lehr	191d7d64e5	[Offload] Fix assumptions on symbols after #124846 (#126238 ) In #124846 the symbolizer was changed to ignore 0-column entries, which lead to a slightly different representation in the stack traces. This patch addresses these differences. Not sure if the difference in kernel_trap.c is also a result of this change or not. Can be tracked separate from this, after the bots are back to green.	2025-02-07 13:25:11 +01:00
David Blaikie	14d6e1ebf5	Update test for symbolizer fix	2025-02-06 19:18:20 +00:00
Michał Górny	689ef5fda0	[offload] [test] Use test compiler ID rather than host (#124408 ) Use the test compiler ID to verify whether tests can be run rather than the host compiler. This makes it possible to run tests (with Clang) while the library itself was built with GCC.	2025-02-02 15:55:39 +00:00
Christian Clauss	1f56bb3137	[Offload][NFC] Fix typos discovered by codespell (#125119 ) https://github.com/codespell-project/codespell % `codespell --ignore-words-list=archtype,hsa,identty,inout,iself,nd,te,ths,vertexes --write-changes`	2025-01-31 09:35:29 -06:00
agozillon	2428b6ec40	[Flang][MLIR][OpenMP] Fix Target Data if (present(...)) causing LLVM-IR branching error (#123771 ) Currently if we generate code for the below target data map that uses an optional mapping: !$omp target data if(present(a)) map(alloc:a) do i = 1, 10 a(i) = i end do !$omp end target data We yield an LLVM-IR error as the branch for the else path is not generated. This occurs because we enter the NoDupPriv path of the call back function when generating the else branch, however, the emitBranch function needs to be set to a block for it to functionally generate and link in a follow up branch. The NoDupPriv path currently doesn't do this, while it's not supposed to generate anything (as far as I am aware) we still need to at least set the builders placement back so that it emits the appropriate follow up branch. This avoids the missing terminator LLVM-IR verification error by correctly generating the follow up branch.	2025-01-30 17:33:36 +01:00
agozillon	e0054e984c	[MLIR][OpenMP] Emit nullary check for mapped pointer members and appropriate size select based on results (#124604 ) This PR aims to fix a mapping error when trying to map nullary elements of a record type (primary example is allocatables/pointer types in Fortran at the moment). This should be legal to map, just not write to without pointing to anything within the target region. A common Fortran OpenMP idiom/example where this is useful can be found in the added Fortran offload example. The runtime error arises when we try to map the pointer member utilising a prescribed constant size that we receive from the lowered type, resulting in mapping of data that will be non-existent when there is no allocated data. The fix in this case is to emit a runtime check to see if the data has been allocated, if it hasn't been we select a size of 0, if it has we emit the usual type size.	2025-01-29 17:51:33 +01:00
Joseph Huber	13dcc95dcd	[Offload] Rework offloading entry type to be more generic (#124018 ) Summary: The previous offloading entry type did not fit the current use-cases very well. This widens it and adds a version to prevent further annoyances. It also includes the kind to better sort who's using it. The first 64-bytes are reserved as zero so the OpenMP runtime can detect the old format for binary compatibilitry.	2025-01-28 07:26:13 -06:00
Joseph Huber	134401deea	[Offload] Move RPC server handling to a dedicated thread (#112988 ) Summary: Handling the RPC server requires running through list of jobs that the device has requested to be done. Currently this is handled by the thread that does the waiting for the kernel to finish. However, this is not sound on NVIDIA architectures and only works for async launches in the OpenMP model that uses helper threads. However, we also don't want to have this thread doing work unnnecessarily. For this reason we track the execution of kernels and cause the thread to sleep via a condition variable (usually backed by some kind of futex or other intelligent sleeping mechanism) so that the thread will be idle while no kernels are running.	2025-01-24 11:36:45 -06:00
Joseph Huber	723a3e746a	[OpenMP] Fix mispelled attribute and warning Summary: This is spelled `ompx_aligned_barrier` when used directly, but wasn't included in the list of known assumptions. Fix that so now th test works.	2025-01-20 08:40:19 -06:00
Jinsong Ji	8d1d67ec4d	[Offload][PGO] Fix dump of array in ProfData (#122039 ) Exposed by -Warray-bounds: In file included from ../../../../../../../llvm/offload/plugins-nextgen/common/src/GlobalHandler.cpp:252: ../../../../../../../llvm/llvm/include/llvm/ProfileData/InstrProfData.inc:109:1: error: array index 4 is past the end of the array (that has type 'const std::remove_const<const uint16_t>::type[4]' (aka 'const unsigned short[4]')) [-Werror,-Warray-bounds] 109 \| INSTR_PROF_DATA(const uint16_t, Int16ArrayTy, NumValueSites[IPVK_Last+1], \ \| ^ ~~~~~~~~~~~ ../../../../../../../llvm/offload/plugins-nextgen/common/src/GlobalHandler.cpp:250:15: note: expanded from macro 'INSTR_PROF_DATA' 250 \| outs() << ProfData.Name << " "; \ \| ^ ~~~~ ../../../../../../../llvm/llvm/include/llvm/ProfileData/InstrProfData.inc:109:1: note: array 'NumValueSites' declared here 109 \| INSTR_PROF_DATA(const uint16_t, Int16ArrayTy, NumValueSites[IPVK_Last+1], \ \| ^ ../../../../../../../llvm/offload/plugins-nextgen/common/include/GlobalHandler.h:62:3: note: expanded from macro 'INSTR_PROF_DATA' 62 \| std::remove_const<Type>::type Name; Avoid accessing out-of-bound data, but skip printing array data for now. As there is no simple way to do this without hardcoding the NumValueSites field. --------- Co-authored-by: Ethan Luis McDonough <ethanluismcdonough@gmail.com>	2025-01-14 15:46:27 -05:00
Johannes Doerfert	1739ba9bb5	[OpenMP][FIX] Adjust test to be non-flaky (#122331 ) The test runs asynchronous kernels and depending on the timing the output is slightly different. We now only check for the common parts of the output.	2025-01-09 11:57:11 -08:00
agozillon	fa56e8bb64	[OpenMP][MLIR] Fix threadprivate lowering when compiling for target when target operations are in use (#119310 ) Currently the compiler will ICE in programs like the following on the device lowering pass: ``` program main implicit none type i1_t integer :: val(1000) end type i1_t integer :: i type(i1_t), pointer :: newi1 type(i1_t), pointer :: tab=>null() integer, dimension(:), pointer :: tabval !$omp THREADPRIVATE(tab) allocate(newi1) tab=>newi1 tab%val(:)=1 tabval=>tab%val !$omp target teams distribute parallel do do i = 1, 1000 tabval(i) = i end do !$omp end target teams distribute parallel do end program main ``` This is due to the fact that THREADPRIVATE returns a result operation, and this operation can actually be used by other LLVM dialect (or other dialect) operations. However, we currently skip the lowering of threadprivate, so we effectively never generate and bind an LLVM-IR result to the threadprivate operation result. So when we later go on to lower dependent LLVM dialect operations, we are missing the required LLVM-IR result, try to access and use it and then ICE. The fix in this particular PR is to allow compilation of threadprivate for device as well as host, and simply treat the device compilation as a no-op, binding the LLVM-IR result of threadprivate with no alterations and binding it, which will allow the rest of the compilation to proceed, where we'll eventually discard the host segment in any case. The other possible solution to this I can think of, is doing something similar to Flang's passes that occur prior to CodeGen to the LLVM dialect, where they erase/no-op certain unrequired operations or transform them to lower level series of operations. And we would erase/no-op threadprivate on device as we'd never have these in target regions. The main issues I can see with this are that we currently do not specialise this stage based on wether we're compiling for device or host, so it's setting a precedent and adding another point of having to understand the separation between target and host compilation. I am also not sure we'd necessarily want to enforce this at a dialect level incase someone else wishes to add a different lowering flow or translation flow. Another possible issue is that a target operation we have/utilise would depend on the result of threadprivate, meaning we'd not be allowed to entirely erase/no-op it, I am not sure of any situations where this may be an issue currently though.	2025-01-03 18:01:01 +01:00
agozillon	5137c209f0	[Flang][OpenMP] Fix allocating arrays with size intrinisic (#119226 ) Attempt to address the following example from causing an assert or ICE: ``` subroutine test(a) implicit none integer :: i real(kind=real64), dimension(:) :: a real(kind=real64), dimension(size(a, 1)) :: b !$omp target map(tofrom: b) do i = 1, 10 b(i) = i end do !$omp end target end subroutine ``` Where we utilise a Fortran intrinsic (size) to calculate the size of allocatable arrays and then map it to device.	2025-01-03 16:46:15 +01:00
Kareem Ergawy	e532241b02	Re-apply (#117867 ): [flang][OpenMP] Implicitly map allocatable record fields (#120374 ) This re-applies #117867 with a small fix that hopefully prevents build bot failures. The fix is avoiding `dyn_cast` for the result of `getOperation()`. Instead we can assign the result to `mlir::ModuleOp` directly since the type of the operation is known statically (`OpT` in `OperationPass`).	2024-12-18 09:19:45 +01:00
Kareem Ergawy	dc936f3c19	Revert "[flang][OpenMP] Implicitly map allocatable record fields (#117867 )" (#120360 )	2024-12-18 06:52:24 +01:00
Kareem Ergawy	db09014a07	[flang][OpenMP] Implicitly map allocatable record fields (#117867 ) This is a starting PR to implicitly map allocatable record fields. This PR contains the following changes: 1. Re-purposes some of the utils used in `Lower/OpenMP.cpp` so that these utils work on the `mlir::Value` level rather than the `semantics::Symbol` level. This takes one step towards to enabling MLIR passes to more easily do some lowering themselves (e.g. creating `omp.map.bounds` ops for implicitely caputured data like this PR does). 2. Adds support for implicitely capturing and mapping allocatable fields in record types. There is quite some distant to still cover to have full support for this. I added a number of todos to guide further development. Co-authored-by: Andrew Gozillon <andrew.gozillon@amd.com> Co-authored-by: Andrew Gozillon <andrew.gozillon@amd.com>	2024-12-18 05:37:58 +01:00
Shilei Tian	92376c3ff5	[Offload][OMPX] Add the runtime support for multi-dim grid and block (#118042 )	2024-12-06 09:07:50 -05:00
Callum Fare	fd3907ccb5	Reland #118503 : [Offload] Introduce offload-tblgen and initial new API implementation (#118614 ) Reland #118503. Added a fix for builds with `-DBUILD_SHARED_LIBS=ON` (see last commit). Otherwise the changes are identical. --- ### New API Previous discussions at the LLVM/Offload meeting have brought up the need for a new API for exposing the functionality of the plugins. This change introduces a very small subset of a new API, which is primarily for testing the offload tooling and demonstrating how a new API can fit into the existing code base without being too disruptive. Exact designs for these entry points and future additions can be worked out over time. The new API does however introduce the bare minimum functionality to implement device discovery for Unified Runtime and SYCL. This means that the `urinfo` and `sycl-ls` tools can be used on top of Offload. A (rough) implementation of a Unified Runtime adapter (aka plugin) for Offload is available [here](https://github.com/callumfare/unified-runtime/tree/offload_adapter). Our intention is to maintain this and use it to implement and test Offload API changes with SYCL. ### Demoing the new API ```sh # From the runtime build directory $ ninja LibomptUnitTests $ OFFLOAD_TRACE=1 ./offload/unittests/OffloadAPI/offload.unittests ``` ### Open questions and future work * Only some of the available device info is exposed, and not all the possible device queries needed for SYCL are implemented by the plugins. A sensible next step would be to refactor and extend the existing device info queries in the plugins. The existing info queries are all strings, but the new API introduces the ability to return any arbitrary type. * It may be sensible at some point for the plugins to implement the new API directly, and the higher level code on top of it could be made generic, but this is more of a long-term possibility.	2024-12-05 09:34:04 +01:00
Michał Górny	04b26f0eb7	[offload] Standalone build fixes (#118173 ) A fair number of fixes to get standalone builds of offload working — mostly copying missing bits from openmp. It's almost ready — I still need to figure out why some of the tsts aren't linking to the right libraries.	2024-12-04 11:27:37 +00:00
Jan Patrick Lehr	51553227f0	Revert "Reland of #108413 : [Offload] Introduce offload-tblgen and initial new API implementation" (#118541 ) Reverts llvm/llvm-project#118503 Broke bot https://lab.llvm.org/staging/#/builders/131/builds/9701/steps/5/logs/stdio	2024-12-03 21:42:38 +01:00
Callum Fare	8da490320f	Reland of #108413 : [Offload] Introduce offload-tblgen and initial new API implementation (#118503 ) This is another attempt to reland the changes from #108413 The previous two attempts introduced regressions and were reverted. This PR has been more thoroughly tested with various configurations so shouldn't cause any problems this time. If anyone is aware of any likely remaining problems then please let me know. The changes are identical other than the fixes contained in the last 5 commits. --- ### New API Previous discussions at the LLVM/Offload meeting have brought up the need for a new API for exposing the functionality of the plugins. This change introduces a very small subset of a new API, which is primarily for testing the offload tooling and demonstrating how a new API can fit into the existing code base without being too disruptive. Exact designs for these entry points and future additions can be worked out over time. The new API does however introduce the bare minimum functionality to implement device discovery for Unified Runtime and SYCL. This means that the `urinfo` and `sycl-ls` tools can be used on top of Offload. A (rough) implementation of a Unified Runtime adapter (aka plugin) for Offload is available [here](https://github.com/callumfare/unified-runtime/tree/offload_adapter). Our intention is to maintain this and use it to implement and test Offload API changes with SYCL. ### Demoing the new API ```sh # From the runtime build directory $ ninja LibomptUnitTests $ OFFLOAD_TRACE=1 ./offload/unittests/OffloadAPI/offload.unittests ``` ### Open questions and future work * Only some of the available device info is exposed, and not all the possible device queries needed for SYCL are implemented by the plugins. A sensible next step would be to refactor and extend the existing device info queries in the plugins. The existing info queries are all strings, but the new API introduces the ability to return any arbitrary type. * It may be sensible at some point for the plugins to implement the new API directly, and the higher level code on top of it could be made generic, but this is more of a long-term possibility.	2024-12-03 16:28:35 +00:00
Joseph Huber	91f5f974cb	[OpenMP] Unconditionally provide an RPC client interface for OpenMP (#117933 ) Summary: This patch adds an RPC interface that lives directly in the OpenMP device runtime. This allows OpenMP to implement custom opcodes. Currently this is only providing the host call interface, which is the raw version of reverse offloading. Previously this lived in `libc/` as an extension which is not the correct place. The interface here uses a weak symbol for the RPC client by the same name that the `libc` interface uses. This means that it will defer to the libc one if both are present so we don't need to set up multiple instances. The presense of this symbol is what controls whether or not we set up the RPC server. Because this is an external symbol it normally won't be optimized out, so there's a special pass in OpenMPOpt that deletes this symbol if it is unused during linking. That means at `O0` the RPC server will always be present now, but will be removed trivially if it's not used at O1 and higher.	2024-12-02 14:31:51 -06:00
Shilei Tian	8cb44859cc	FIX: Fix test failure in offload/test/mapping/power_of_two_alignment.c A C file with `template` in it. Awesome.	2024-12-01 21:40:46 -05:00
Shilei Tian	ed7a8f1556	[NFC][Offload] Correct the compile command of two C test files (#118243 ) The compile command for the two C test files are `compilexx`, which is actually for C++ compilation.	2024-12-01 21:34:26 -05:00
Jan Patrick Lehr	5208bc3694	Revert "Reland #2 - [Offload] Introduce offload-tblgen and initial new API implementation (#108413 . #117704 )" (#117995 ) Reverts llvm/llvm-project#117894 Buildbot failures in OpenMP/Offload bots. https://lab.llvm.org/buildbot/#/builders/30/builds/11193	2024-11-28 12:58:17 +01:00
Callum Fare	992b00020f	Reland #2 - [Offload] Introduce offload-tblgen and initial new API implementation (#108413 . #117704 ) (#117894 ) Relands #117704, which relanded changes from #108413 - this was reverted due to build issues. The new offload library did not build with `LIBOMPTARGET_OMPT_SUPPORT` enabled, which was not picked up by pre-merge testing. The last commit contains the fix; everything else is otherwise identical to the approved PR. ___ ### New API Previous discussions at the LLVM/Offload meeting have brought up the need for a new API for exposing the functionality of the plugins. This change introduces a very small subset of a new API, which is primarily for testing the offload tooling and demonstrating how a new API can fit into the existing code base without being too disruptive. Exact designs for these entry points and future additions can be worked out over time. The new API does however introduce the bare minimum functionality to implement device discovery for Unified Runtime and SYCL. This means that the `urinfo` and `sycl-ls` tools can be used on top of Offload. A (rough) implementation of a Unified Runtime adapter (aka plugin) for Offload is available [here](https://github.com/callumfare/unified-runtime/tree/offload_adapter). Our intention is to maintain this and use it to implement and test Offload API changes with SYCL. ### Demoing the new API ```sh # From the runtime build directory $ ninja LibomptUnitTests $ OFFLOAD_TRACE=1 ./offload/unittests/OffloadAPI/offload.unittests ``` ### Open questions and future work * Only some of the available device info is exposed, and not all the possible device queries needed for SYCL are implemented by the plugins. A sensible next step would be to refactor and extend the existing device info queries in the plugins. The existing info queries are all strings, but the new API introduces the ability to return any arbitrary type. * It may be sensible at some point for the plugins to implement the new API directly, and the higher level code on top of it could be made generic, but this is more of a long-term possibility.	2024-11-28 10:19:37 +00:00
Fraser Cormack	0cb5846a68	Revert "Reland - [Offload] Introduce offload-tblgen and initial new API implementation (#108413 ) (#117704 )" This reverts commit c979ec05642f292737d250c6682d85ed49bc7b6e. This showed failures in the post-merge CI.	2024-11-27 10:49:01 +00:00
Callum Fare	c979ec0564	Reland - [Offload] Introduce offload-tblgen and initial new API implementation (#108413 ) (#117704 ) Relands changes from #108413 - this was reverted due to build issues. The problem was just that the `offload-tblgen` tool was behind recent changes to tablegen that ensure `const` records. This has been fixed and the PR is otherwise identical. ___ ### New API Previous discussions at the LLVM/Offload meeting have brought up the need for a new API for exposing the functionality of the plugins. This change introduces a very small subset of a new API, which is primarily for testing the offload tooling and demonstrating how a new API can fit into the existing code base without being too disruptive. Exact designs for these entry points and future additions can be worked out over time. The new API does however introduce the bare minimum functionality to implement device discovery for Unified Runtime and SYCL. This means that the `urinfo` and `sycl-ls` tools can be used on top of Offload. A (rough) implementation of a Unified Runtime adapter (aka plugin) for Offload is available [here](https://github.com/callumfare/unified-runtime/tree/offload_adapter). Our intention is to maintain this and use it to implement and test Offload API changes with SYCL. ### Demoing the new API ```sh # From the runtime build directory $ ninja LibomptUnitTests $ OFFLOAD_TRACE=1 ./offload/unittests/OffloadAPI/offload.unittests ``` ### Open questions and future work * Only some of the available device info is exposed, and not all the possible device queries needed for SYCL are implemented by the plugins. A sensible next step would be to refactor and extend the existing device info queries in the plugins. The existing info queries are all strings, but the new API introduces the ability to return any arbitrary type. * It may be sensible at some point for the plugins to implement the new API directly, and the higher level code on top of it could be made generic, but this is more of a long-term possibility.	2024-11-27 10:39:07 +00:00
Joseph Huber	d047bee496	Revert "[Offload] Introduce offload-tblgen and initial new API implementation (#108413 )" This reverts commit 8a2311c4bf9993230e37dc20b57973dc917f2338.	2024-11-25 12:16:46 -06:00
Callum Fare	8a2311c4bf	[Offload] Introduce offload-tblgen and initial new API implementation (#108413 ) Introduce `offload-tblgen` and an initial implementation of a subset of the new API. The tablegen files are intended to be the single source of truth for the new API, with the header files, documentation, and others bits of source all automatically generated. TODO (based on review feedback so far): - [x] Check in the generated headers - [x] Add an `offload-generate` target to trigger the generation rather than building them every time - [x] Decide how error handling should work - [x] Finish up new error handling implementation - [x] Decide naming convention - [x] Add testing for the new API - [x] Add tablegen specific testing - [x] clang-tidy and use llvm:: types when possible - [x] Add optional code location arguments - [x] Avoid multiple returns from one function ### offload-tblgen See the included [README](`d80db06491/offload/new-api/API/README.md`) for more information on how the API definition and generation works. I'm happy to answer any questions about it and plan to walk through it in a future LLVM Offload call. It should be noted that struct definitions have not been fully implemented/tested as they aren't used by the initial API definitions, but finishing that off in the future shouldn't be too much work. The tablegen tooling has been designed to be easily extended with new backends, using the classes in `RecordTypes.hpp` to abstract over the tablegen records. ### New API Previous discussions at the LLVM/Offload meeting have brought up the need for a new API for exposing the functionality of the plugins. This change introduces a very small subset of a new API, which is primarily for testing the offload tooling and demonstrating how a new API can fit into the existing code base without being too disruptive. Exact designs for these entry points and future additions can be worked out over time. The new API does however introduce the bare minimum functionality to implement device discovery for Unified Runtime and SYCL. This means that the `urinfo` and `sycl-ls` tools can be used on top of Offload. A (rough) implementation of a Unified Runtime adapter (aka plugin) for Offload is available [here](https://github.com/callumfare/unified-runtime/tree/offload_adapter). Our intention is to maintain this and use it to implement and test Offload API changes with SYCL. ### Demoing the new API ```sh $ git clone -b offload_adapter https://github.com/callumfare/unified-runtime.git $ cd unified-runtime $ mkdir build $ cd build $ cmake .. -GNinja -DUR_BUILD_ADAPTER_OFFLOAD=ON \ -DUR_OFFLOAD_INSTALL_DIR=<offload build dir containing liboffload_new.so> \ -DUR_OFFLOAD_INCLUDE_DIR=<offload build dir containing 'offload' headers directory> $ ninja urinfo export LD_LIBRARY_PATH=<offload build dir containing offload plugin libraries> $ UR_ADAPTERS_FORCE_LOAD=$PWD/lib/libur_adapter_offload.so ./bin/urinfo [cuda:gpu][cuda:0] CUDA, NVIDIA GeForce GT 1030 [12030] # Demo with tracing $ OFFLOAD_TRACE=1 UR_ADAPTERS_FORCE_LOAD=$PWD/lib/libur_adapter_offload.so ./bin/urinfo ---> offloadPlatformGet(.NumEntries = 0, .phPlatforms = {}, .pNumPlatforms = 0x7ffd05e4d6e0 (2))-> OFFLOAD_RESULT_SUCCESS ---> offloadPlatformGet(.NumEntries = 2, .phPlatforms = {0x564bf4040220, 0x564bf4040240}, .pNumPlatforms = nullptr)-> OFFLOAD_RESULT_SUCCESS ... ``` ### Open questions and future work * The new API is implemented in a separate library (`liboffload_new.so`). It could just as easily be part of the existing `libomptarget` library - I have no strong feelings on which is better. * Only some of the available device info is exposed, and not all the possible device queries needed for SYCL are implemented by the plugins. A sensible next step would be to refactor and extend the existing device info queries in the plugins. The existing info queries are all strings, but the new API introduces the ability to return any arbitrary type. * It may be sensible at some point for the plugins to implement the new API directly, and the higher level code on top of it could be made generic, but this is more of a long-term possibility.	2024-11-25 11:34:14 -06:00
Joseph Huber	506ca19dc9	[OpenMP] Remove use of '__AMDGCN_WAVEFRONT_SIZE' (#113156 ) Summary: This is going to be deprecated in https://github.com/llvm/llvm-project/pull/112849. This patch ports it to use the builtin instead. This isn't a compile constant, so it could slightly negatively affect codegen. There really should be an IR pass to turn it into a constant if the function has known attributes. Using the builtin is correct when we just do it for knowing the size like we do here. Obviously guarding w32/w64 code with this check would be broken.	2024-11-25 07:38:28 -06:00
agozillon	3723449955	[OpenMP] Allocatable explicit member mapping fortran offloading tests (#113555 ) This PR is one in a series of 3 that aim to add support for explicit member mapping of allocatable components in derived types within OpenMP+Fortran for Flang. This PR provides all of the runtime tests that are currently upstreamable, unfortunately some of the other tests would require linking of the fortran runtime for offload which we currently do not do. But regardless, this is plenty to ensure that the mapping is working in most cases.	2024-11-16 12:22:33 +01:00

1 2 3

115 Commits