llvm-project

mirror of https://github.com/llvm/llvm-project.git synced 2025-04-16 07:06:37 +00:00

Author	SHA1	Message	Date
Joseph Huber	772173f548	[Clang][AMDGPU] Remove special handling for COV4 libraries (#132870 ) Summary: When we were first porting to COV5, this led to some ABI issues due to a change in how we looked up the work group size. Bitcode libraries relied on the builtins to emit code, but this was changed between versions. This prevented the bitcode libraries, like OpenMP or libc, from being used for both COV4 and COV5. The solution was to have this 'none' functionality which effectively emitted code that branched off of a global to resolve to either version. This isn't a great solution because it forced every TU to have this variable in it. The patch in https://github.com/llvm/llvm-project/pull/131033 removed support for COV4 from OpenMP, which was the only consumer of this functionality. Other users like HIP and OpenCL did not use this because they linked the ROCm Device Library directly which has its own handling (The name was borrowed from it after all). So, now that we don't need to worry about backward compatibility with COV4, we can remove this special handling. Users can still emit COV4 code, this simply removes the special handling used to make the OpenMP device runtime bitcode version agnostic.	2025-03-28 07:35:16 -05:00
macurtis-amd	21a8c63cdc	[offload] Remove bad assert in StaticLoopChunker::Distribute (#132705 ) When building with asserts enabled, this can actually cause strange miscompilations because an incorrect llvm.assume is generated at the point of the assertion.	2025-03-28 04:53:00 -05:00
Joseph Huber	75f810e025	[Offload] Guard HSA implicit arguments if they aren't created (#133073 ) Summary: We conditionally allocate the implicit arguments, so they possibly are null. The flang compiler seems to hit this case, even though it shouldn't when it's supposed to conform to the HSA code object. For now guard this to fix the regression and cover a case in the future where someone rolls a fully custom implementation. Fixes: https://github.com/llvm/llvm-project/issues/132982	2025-03-26 08:54:33 -05:00
Joseph Huber	25bf4e262c	[Offload] Remove handling for COV4 binaries from offload/ (#131033 ) Summary: We moved from cov4 to cov5 a long time ago, and it guards simplifying some front end code, so we should be able to move up with this.	2025-03-24 18:58:20 -05:00
Ethan Luis McDonough	c50d39f073	[PGO][Offload] Allow PGO flags to be used on GPU targets (#94268 ) This pull request is the third part of an ongoing effort to extend PGO instrumentation to GPU device code and depends on https://github.com/llvm/llvm-project/pull/93365. This PR makes the following changes: - Allows PGO flags to be supplied to GPU targets - Pulls version global from device - Modifies `__llvm_write_custom_profile` and `lprofWriteDataImpl` to allow the PGO version to be overridden	2025-03-19 19:01:38 -05:00
Joseph Huber	cb493d2bab	[OpenMP] Replace utilities with 'gpuintrin.h' definitions (#131644 ) Summary: Port more instructions. AMD version is at https://gist.github.com/jhuber6/235d7ee95f747c75f9a3cfd8eedac6aa	2025-03-19 10:47:21 -05:00
Jon Chesterfield	deb0f3c09b	[openmp][nfc] Use builtin align in the devicertl (#131918 ) Noticed while extracting the smartstack as a test case	2025-03-18 21:31:49 +00:00
Jon Chesterfield	395bdebebd	Revert "[openmp][nfc] Refactor shared/lds smartstack for spirv (#131905 )" This reverts commit c02b935a9be888bbdf9f8cb0bf980bd411ae5893. Failed a check-offload test under CI	2025-03-18 20:43:05 +00:00
Joseph Huber	206f78dfec	[OpenMP] Use 'gpuintrin.h' definitions for simple block identifiers (#131631 ) Summary: This patch ports the runtime to use `gpuintrin.h` instead of calling the builtins for most things. The `lanemask_gt` stuff was left for now with a fallback. AMD version for Ron https://gist.github.com/jhuber6/42014d635b9a8158727640876bf47226.	2025-03-18 15:38:46 -05:00
Jon Chesterfield	c02b935a9b	[openmp][nfc] Refactor shared/lds smartstack for spirv (#131905 ) Spirv doesn't have implicit conversions between address spaces (at least at present, we might need to change that) and address space qualified *this pointers are not handled well by clang. This commit changes the single instance of the smartstack to be explicitly a singleton, for fractionally simpler IR generation (no this pointer) and to sidestep the work in progress spirv64-- openmp target not being able to compile the original version.	2025-03-18 20:33:24 +00:00
Joseph Huber	8437b7f558	[libc] Make RPC server handling header only (#131205 ) Summary: This patch moves the RPC server handling to be a header only utility stored in the `shared/` directory. This is intended to be shared within LLVM for the loaders and `offload/` handling. Generally, this makes it easier to share code without weird cross-project binaries being plucked out of the build system. It also allows us to soon move the loader interface out of the `libc` project so that we don't need to bootstrap those and can build them in LLVM.	2025-03-13 19:23:21 -05:00
Michael Kruse	d3255474be	Reapply "[Offload][AMDGPU] LLVM_ENABLE_RUNTIMES=flang-rt for amdgpu-offload-" (#130274 ) Enable the LLVM_ENABLE_RUNTIMES=flang-rt build of the Fortran runtime for the amdgpu-offload- buildbots. This pre-population cmake cache files is referred to by the llvm-zorg annotated builder factory [script](`872f477610/zorg/buildbot/builders/annotated/amdgpu-offload-cmake.py (L26)`). The corresponding change in llvm-zorg is llvm/llvm-zorg#402 This reverts commit e296fb8ff6255b97db9ff6cd941acc730164b38f. The worker of amdgpu-offload-rhel-8-cmake-build-only has been updated with a newer version of Ninja that supports Fortran.	2025-03-13 13:21:36 +01:00
Krzysztof Parzyszek	f4fc2d731c	[flang][OpenMP] Map ByRef if size/alignment exceed that of a pointer (#130832 ) Improve the check for whether a type can be passed by copy. Currently, passing by copy is done via the OMP_MAP_LITERAL mapping, which can only transfer as much data as can be contained in a pointer representation.	2025-03-12 19:41:11 -05:00
Nikita Popov	f137c3d592	[TargetRegistry] Accept Triple in createTargetMachine() (NFC) (#130940 ) This avoids doing a Triple -> std::string -> Triple round trip in lots of places, now that the Module stores a Triple.	2025-03-12 17:35:09 +01:00
Krzysztof Parzyszek	d67947162f	[flang][OpenMP] Implement HAS_DEVICE_ADDR clause (#128568 ) The HAS_DEVICE_ADDR indicates that the object(s) listed exists at an address that is a valid device address. Specifically, `has_device_addr(x)` means that (in C/C++ terms) `&x` is a device address. When entering a target region, `x` does not need to be allocated on the device, or have its contents copied over (in the absence of additional mapping clauses). Passing its address verbatim to the region for use is sufficient, and is the intended goal of the clause. Some Fortran objects use descriptors in their in-memory representation. If `x` had a descriptor, both the descriptor and the contents of `x` would be located in the device memory. However, the descriptors are managed by the compiler, and can be regenerated at various points as needed. The address of the effective descriptor may change, hence it's not safe to pass the address of the descriptor to the target region. Instead, the descriptor itself is always copied, but for objects like `x`, no further mapping takes place (as this keeps the storage pointer in the descriptor unchanged). --------- Co-authored-by: Sergio Afonso <safonsof@amd.com>	2025-03-10 08:11:01 -05:00
agozillon	f1178815d2	[Flang][OpenMP][MLIR] Implement close, present and ompx_hold modifiers for Flang maps (#129586 ) This PR adds an initial implementation for the map modifiers close, present and ompx_hold, primarily just required adding the appropriate map type flags to the map type bits. In the case of ompx_hold it required adding the map type to the OpenMP dialect. Close has a bit of a problem when utilised with the ALWAYS map type on descriptors, so it is likely we'll have to make sure close and always are not applied to the descriptor simultaneously in the future when we apply always to the descriptors to facilitate movement of descriptor information to device for consistency, however, we may find an alternative to this with further investigation. For the moment, it is a TODO/Note to keep track of it.	2025-03-07 22:22:30 +01:00
Michael Kruse	e296fb8ff6	Revert "[Offload][AMDGPU] LLVM_ENABLE_RUNTIMES=flang-rt for amdgpu-offload-*" (#130274 ) Reverts llvm/llvm-project#129692 The builder amdgpu-offload-rhel-8-cmake-build-only fails because its version of Ninja is too old. At least Ninja 1.10 is required for its support for dependencies between Fortran modules. https://lab.llvm.org/buildbot/#/builders/204/builds/2696	2025-03-07 12:30:18 +01:00
Michael Kruse	68578b38cf	[Offload][AMDGPU] LLVM_ENABLE_RUNTIMES=flang-rt for amdgpu-offload-* (#129692 ) Enable the LLVM_ENABLE_RUNTIMES=flang-rt build of the Fortran runtime for the amdgpu-offload-* buildbots. This pre-population cmake cache files is referred to by the llvm-zorg annotated builder factory [script](`872f477610/zorg/buildbot/builders/annotated/amdgpu-offload-cmake.py (L26)`). The corresponding change in llvm-zorg is https://github.com/llvm/llvm-zorg/pull/402	2025-03-07 11:56:00 +01:00
Nikita Popov	4f469ae046	[offload] Fix build after Module::getTargetTriple() change Adjust for #129868.	2025-03-06 11:04:00 +01:00
Nikita Popov	979c275097	[IR] Store Triple in Module (NFC) (#129868 ) The module currently stores the target triple as a string. This means that any code that wants to actually use the triple first has to instantiate a Triple, which is somewhat expensive. The change in #121652 caused a moderate compile-time regression due to this. While it would be easy enough to work around, I think that architecturally, it makes more sense to store the parsed Triple in the module, so that it can always be directly queried. For this change, I've opted not to add any magic conversions between std::string and Triple for backwards-compatibility purposes, and instead write out needed Triple()s or str()s explicitly. This is because I think a decent number of them should be changed to work on Triple as well, to avoid unnecessary conversions back and forth. The only interesting part in this patch is that the default triple is Triple("") instead of Triple() to preserve existing behavior. The former defaults to using the ELF object format instead of unknown object format. We should fix that as well.	2025-03-06 10:27:47 +01:00
Alex	b8a66f50b4	[OFFLOAD] Update ffi_cif structure to match libffi (#128756 ) The ffi_cif structure defined in the wrapper header is smaller than the actual structure in libffi which results in other structures being overwritten when libffi is called, and finally in a segfault. The patch updates the structure to the correct layout as specified in ffi.h	2025-03-04 11:40:12 -06:00
Jan Patrick Lehr	fe18796142	[Offload][AMDGPU] Enable SPIRV target in build conf (#129323 ) Enable the SPIRV backend on the CMake-cache file buildbots.	2025-03-01 21:56:28 +01:00
Jan Patrick Lehr	1824bb47c2	[Offload][OpenMP] Fix check-prefix (#128599 )	2025-02-25 00:32:27 +01:00
Zequan Wu	1b15a89a23	Revert "[Offload] Fix assumptions on symbols after #124846 (#126238 )" The dependency commit was reverted at `23aca2f88d`. Reverting this as well.	2025-02-24 13:30:54 -08:00
Jan Patrick Lehr	17ccaf4fa8	[NFC][Offload] Fix typo to output architecture (#128527 )	2025-02-24 16:54:21 +01:00
Fabian Ritter	a2f9ae1421	[AMDGPU] Replace gfx940 and gfx941 with gfx942 in offload and libclc (#125826 ) gfx940 and gfx941 are no longer supported. This is one of a series of PRs to remove them from the code base. For SWDEV-512631 and SWDEV-512633	2025-02-19 09:56:04 +01:00
Akash Banerjee	785a5b4676	[MLIR][OpenMP] Add LLVM translation support for OpenMP UserDefinedMappers (#124746 ) This patch adds OpenMPToLLVMIRTranslation support for the OpenMP Declare Mapper directive. Since both MLIR and Clang now support custom mappers, I've changed the respective function params to no longer be optional as well. Depends on #121005	2025-02-18 17:55:48 +00:00
Krzysztof Parzyszek	7b89c41e41	[offload] Remove redundant checks in MappingInfoTy::lookupMapping (#127638 ) Also add some clarifying comments.	2025-02-18 11:01:36 -06:00
Joseph Huber	1435c8ed95	Reapply "[LinkerWrapper] Clean up options after proper forwarding" (#126495 ) Summary: The test failed because it no longer passed Rpass by default without LTO. I think that's desirable as it matches the standard behavior. This reverts commit 6fd99de31864a5ef84ae8613b3a9034e05293461.	2025-02-14 09:56:46 -06:00
Ethan Luis McDonough	52ee06d273	[PGO][Offload] Fix pgo1.c (#126864 ) pgo1.c had outdated test checks	2025-02-12 00:54:31 -06:00
Ethan Luis McDonough	9e5c136d5a	[PGO][Offload] Profile profraw generation for GPU instrumentation #76587 (#93365 ) This pull request is the second part of an ongoing effort to extend PGO instrumentation to GPU device code and depends on #76587. This PR makes the following changes: - Introduces `__llvm_write_custom_profile` to PGO compiler-rt library. This is an external function that can be used to write profiles with custom data to target-specific files. - Adds `__llvm_write_custom_profile` as weak symbol to libomptarget so that it can write the collected data to a profraw file. - Adds `PGODump` debug flag and only displays dump when the aforementioned flag is set	2025-02-11 23:30:54 -06:00
Joseph Huber	baf7a3c1e5	[Offload] Properly guard modifications to the RPC device array (#126790 ) Summary: If the user deallocates an RPC device this can sometimes fail if the RPC server is still running. This will happen if the modification happens while the server is still checking it. This patch adds a mutex to guard modifications to it.	2025-02-11 14:57:31 -06:00
Joseph Huber	a854c266b9	[Offload][NFC] Rename `src/` -> `libomptarget/` (#126573 ) Summary: The name `src` is confusing when combined with the plugins and the newly added `liboffload`.	2025-02-10 13:22:10 -06:00
Joseph Huber	feb30f25c0	[Offload] Fix the offload cache file triggering libc++ / libstdc++ mixing (#126313 ) Summary: We originally wanted `-stdlib=libc++` by default so that it could use offloading support in libc++, however this causes issues with how the Offloading project itself is built. If the user builds the LLVM libs with libstdc++ then uses this cache it will enable this option by default for the ensuing build of the offloading libraries with the newly built clang. This will cause a lot of linker failures because the C++ library doesn't match. Long term I think the proper solution to this is to make better use of clang configuration files, but I don't know a good way to do that by default. For now just make it build right.	2025-02-10 13:20:35 -06:00
Joseph Huber	ed9107f2d7	[OpenMP] Replace use of target address space with <gpuintrin.h> local (#126119 ) Summary: This definition is more portable since it defines the correct value for the target. I got rid of the helper mostly because I think it's easy enough to use now that it's a type and being explicit about what's `undef` or `poison` is good.	2025-02-09 10:25:25 -06:00
Jan Patrick Lehr	191d7d64e5	[Offload] Fix assumptions on symbols after #124846 (#126238 ) In #124846 the symbolizer was changed to ignore 0-column entries, which led to a slightly different representation in the stack traces. This patch addresses these differences. Not sure if the difference in kernel_trap.c is also a result of this change or not. Can be tracked separately from this, after the bots are back to green.	2025-02-07 13:25:11 +01:00
David Blaikie	14d6e1ebf5	Update test for symbolizer fix	2025-02-06 19:18:20 +00:00
Joseph Huber	5812d0bf8e	[Offload] Make only a single thread handle the RPC server thread (#126067 ) Summary: This patch just changes the interface to make starting the thread multiple times permissible since it will only be done the first time. Note that this does not refcount it or anything, so it's on the user to make sure that they don't shut down the thread before everyone is done using it. That is the case today because the shutDown portion is run by a single thread in the destructor phase. Another question is if we should make this thread truly global state, because currently it will be private to each plugin instance, so if you have an AMD and NVIDIA image there will be two, similarly if you have those inside of a shared library.	2025-02-06 11:38:14 -06:00
Joseph Huber	f1e917d07b	[Offload] Unify offloading entries into a single section (#125731 ) Summary: This patch unifies the existing offloading entries into a single section called `llvm_offload_entries`. This lets us use a more unified offloading infrastructure so that all targets share the same handling. The effect is that people in the runtimes now need to check if the kind is what they expect, but the expectation is that you can combine multiple potential providers into a compile job. Doesn't fully work yet because of other runtime issues, but some day. Mostly this helps the future of liboffload where we want to handle different languages than OpenMP.	2025-02-06 08:24:01 -06:00
Joseph Huber	7a8779422d	[Offload] Stop the RPC server failing with more than one GPU (#125982 ) Summary: Pretty dumb mistake of me, forgot that this is run per-device and per-plugin, which fell through the cracks with my testing because I have two GPUs that use different plugins.	2025-02-05 20:51:28 -06:00
Joseph Huber	bb7ab2557c	[OpenMP] Port the OpenMP device runtime to direct C++ compilation (#123673 ) Summary: This removes the use of OpenMP offloading to build the device runtime. The main benefit here is that we no longer need to rely on offloading semantics to build a device only runtime. Things like variants are now no longer needed and can just be simple if-defs. In the future, I will remove most of the special handling here and fold it into calls to the `<gpuintrin.h>` functions instead. Additionally I will rework the compilation to make this a separate runtime. The current plan is to have this, but make including OpenMP and offloading either automatically add it, or print a warning if it's missing. This will allow us to use a normal CMake workflow and delete all the weird 'lets pull the clang binary out of the build' business. ``` -DRUNTIMES_amdgcn-amd-amdhsa_LLVM_ENABLE_RUNTIMES=offload -DLLVM_RUNTIME_TARGETS=amdgcn-amd-amdhsa ``` After that, linking the OpenMP device runtime will be `-Xoffload-linker -lomp`. I.e. no more fat binary business. Only look at the most recent commit since this includes the two dependencies (fix to AMDGPUEmitPrintfBinding and the PointerToMember bug).	2025-02-05 08:18:52 -06:00
Joseph Huber	a284a6ed17	[OpenMP] Guard OpenMP specific entry handling	2025-02-03 16:16:18 -06:00
Michał Górny	689ef5fda0	[offload] [test] Use test compiler ID rather than host (#124408 ) Use the test compiler ID to verify whether tests can be run rather than the host compiler. This makes it possible to run tests (with Clang) while the library itself was built with GCC.	2025-02-02 15:55:39 +00:00
Michał Górny	359a913170	[offload] `gnu::format` with variadic template functions is Clang-only (#124406 ) Use `gnu::format` attribute only when compiling with Clang, as using it against variadic template functions is a Clang extension and is not supported by GCC. See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77958 Fixes #119069	2025-02-02 15:55:22 +00:00
Christian Clauss	1f56bb3137	[Offload][NFC] Fix typos discovered by codespell (#125119 ) https://github.com/codespell-project/codespell % `codespell --ignore-words-list=archtype,hsa,identty,inout,iself,nd,te,ths,vertexes --write-changes`	2025-01-31 09:35:29 -06:00
agozillon	2428b6ec40	[Flang][MLIR][OpenMP] Fix Target Data if (present(...)) causing LLVM-IR branching error (#123771 ) Currently if we generate code for the below target data map that uses an optional mapping: !$omp target data if(present(a)) map(alloc:a) do i = 1, 10 a(i) = i end do !$omp end target data We yield an LLVM-IR error as the branch for the else path is not generated. This occurs because we enter the NoDupPriv path of the call back function when generating the else branch, however, the emitBranch function needs to be set to a block for it to functionally generate and link in a follow up branch. The NoDupPriv path currently doesn't do this, while it's not supposed to generate anything (as far as I am aware) we still need to at least set the builders placement back so that it emits the appropriate follow up branch. This avoids the missing terminator LLVM-IR verification error by correctly generating the follow up branch.	2025-01-30 17:33:36 +01:00
agozillon	e0054e984c	[MLIR][OpenMP] Emit nullary check for mapped pointer members and appropriate size select based on results (#124604 ) This PR aims to fix a mapping error when trying to map nullary elements of a record type (primary example is allocatables/pointer types in Fortran at the moment). This should be legal to map, just not write to without pointing to anything within the target region. A common Fortran OpenMP idiom/example where this is useful can be found in the added Fortran offload example. The runtime error arises when we try to map the pointer member utilising a prescribed constant size that we receive from the lowered type, resulting in mapping of data that will be non-existent when there is no allocated data. The fix in this case is to emit a runtime check to see if the data has been allocated, if it hasn't been we select a size of 0, if it has we emit the usual type size.	2025-01-29 17:51:33 +01:00
Jan Patrick Lehr	d412fe531d	[Offload] Enable mlir and flang in bot build (#124915 ) This enables more projects in the CMake cache to add them to the buildbot coverage in the AMDGPU buildbots.	2025-01-29 14:13:59 +01:00
Joseph Huber	13dcc95dcd	[Offload] Rework offloading entry type to be more generic (#124018 ) Summary: The previous offloading entry type did not fit the current use-cases very well. This widens it and adds a version to prevent further annoyances. It also includes the kind to better sort who's using it. The first 64-bytes are reserved as zero so the OpenMP runtime can detect the old format for binary compatibility.	2025-01-28 07:26:13 -06:00
Joseph Huber	760a786d15	[Clang] Prevent `mlink-builtin-bitcode` from internalizing the RPC client (#118661 ) Summary: Currently, we only use `-mlink-builtin-bitcode` for non-LTO NVIDIA compilations. This has the problem that it will internalize the RPC client symbol which needs to be visible to the host. To counteract that, I put `retain` on it, but this also prevents optimizations on the global itself, so the passes we have that remove the symbol don't work on OpenMP anymore. This patch does the dumbest solution, adding a special string check for it in clang. Not the best solution, the runner up would be to have a clang attribute for `externally_initialized` because those can't be internalized, but that might have some unfortunate side-effects. Alternatively we could make NVIDIA compilations do LTO all the time, but that would affect some users and it's harder than I thought.	2025-01-27 19:30:59 -06:00
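
Note on e0054e984c: the size selection that commit describes (map a pointer member with size 0 when nothing is allocated, otherwise with the usual type size) can be sketched in plain C++ roughly as follows. This is only an illustration under stated assumptions: `selectMapSize` is a hypothetical helper name that exists neither in MLIR nor in libomptarget, and the actual change emits an equivalent runtime check during lowering rather than calling a function like this.

```cpp
#include <cstddef>

// Hypothetical helper (not an LLVM or libomptarget API): choose how many
// bytes to map for a pointer member of a record type.
// If the member points at nothing, map zero bytes; otherwise map the full
// size that the lowered type prescribes.
inline std::size_t selectMapSize(const void *Data, std::size_t TypeSize) {
  return Data == nullptr ? 0 : TypeSize;
}
```

With this sketch, an unallocated member (`Data == nullptr`) contributes no bytes to the transfer, while an allocated one is mapped with its prescribed size.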

242 Commits