llvm-project

mirror of https://github.com/llvm/llvm-project.git synced 2025-04-24 00:46:06 +00:00

Author	SHA1	Message	Date
agozillon	f1178815d2	[Flang][OpenMP][MLIR] Implement close, present and ompx_hold modifiers for Flang maps (#129586 ) This PR adds an initial implementation for the map modifiers close, present and ompx_hold, primarily just required adding the appropriate map type flags to the map type bits. In the case of ompx_hold it required adding the map type to the OpenMP dialect. Close has a bit of a problem when utilised with the ALWAYS map type on descriptors, so it is likely we'll have to make sure close and always are not applied to the descriptor simultaneously in the future when we apply always to the descriptors to facilitate movement of descriptor information to device for consistency, however, we may find an alternative to this with further investigation. For the moment, it is a TODO/Note to keep track of it.	2025-03-07 22:22:30 +01:00
Michael Kruse	e296fb8ff6	Revert "[Offload][AMDGPU] LLVM_ENABLE_RUNTIMES=flang-rt for amdgpu-offload-*" (#130274 ) Reverts llvm/llvm-project#129692 The builder amdgpu-offload-rhel-8-cmake-build-only fails because its version of Ninja is too old. At least Ninja 1.10 is required for its support for dependencies between Fortran modules. https://lab.llvm.org/buildbot/#/builders/204/builds/2696	2025-03-07 12:30:18 +01:00
Michael Kruse	68578b38cf	[Offload][AMDGPU] LLVM_ENABLE_RUNTIMES=flang-rt for amdgpu-offload-* (#129692 ) Enable the LLVM_ENABLE_RUNTIMES=flang-rt build of the Fortran runtime for the amdgpu-offload-* buildbots. This pre-populated CMake cache file is referred to by the llvm-zorg annotated builder factory [script](`872f477610/zorg/buildbot/builders/annotated/amdgpu-offload-cmake.py (L26)`). The corresponding change in llvm-zorg is https://github.com/llvm/llvm-zorg/pull/402	2025-03-07 11:56:00 +01:00
Nikita Popov	4f469ae046	[offload] Fix build after Module::getTargetTriple() change Adjust for #129868.	2025-03-06 11:04:00 +01:00
Nikita Popov	979c275097	[IR] Store Triple in Module (NFC) (#129868 ) The module currently stores the target triple as a string. This means that any code that wants to actually use the triple first has to instantiate a Triple, which is somewhat expensive. The change in #121652 caused a moderate compile-time regression due to this. While it would be easy enough to work around, I think that architecturally, it makes more sense to store the parsed Triple in the module, so that it can always be directly queried. For this change, I've opted not to add any magic conversions between std::string and Triple for backwards-compatibility purposes, and instead write out needed Triple()s or str()s explicitly. This is because I think a decent number of them should be changed to work on Triple as well, to avoid unnecessary conversions back and forth. The only interesting part in this patch is that the default triple is Triple("") instead of Triple() to preserve existing behavior. The former defaults to using the ELF object format instead of unknown object format. We should fix that as well.	2025-03-06 10:27:47 +01:00
Alex	b8a66f50b4	[OFFLOAD] Update ffi_cif structure to match libffi (#128756 ) The ffi_cif structure defined in the wrapper header is smaller than the actual structure in libffi which results in other structures being overwritten when libffi is called, and finally in a segfault. The patch updates the structure to the correct layout as specified in ffi.h	2025-03-04 11:40:12 -06:00
Jan Patrick Lehr	fe18796142	[Offload][AMDGPU] Enable SPIRV target in build conf (#129323 ) Enable the SPIRV backend on the CMake-cache file buildbots.	2025-03-01 21:56:28 +01:00
Jan Patrick Lehr	1824bb47c2	[Offload][OpenMP] Fix check-prefix (#128599 )	2025-02-25 00:32:27 +01:00
Zequan Wu	1b15a89a23	Revert "[Offload] Fix assumptions on symbols after #124846 (#126238 )" The dependency commit was reverted at `23aca2f88d`. Reverting this as well.	2025-02-24 13:30:54 -08:00
Jan Patrick Lehr	17ccaf4fa8	[NFC][Offload] Fix typo to output architecture (#128527 )	2025-02-24 16:54:21 +01:00
Fabian Ritter	a2f9ae1421	[AMDGPU] Replace gfx940 and gfx941 with gfx942 in offload and libclc (#125826 ) gfx940 and gfx941 are no longer supported. This is one of a series of PRs to remove them from the code base. For SWDEV-512631 and SWDEV-512633	2025-02-19 09:56:04 +01:00
Akash Banerjee	785a5b4676	[MLIR][OpenMP] Add LLVM translation support for OpenMP UserDefinedMappers (#124746 ) This patch adds OpenMPToLLVMIRTranslation support for the OpenMP Declare Mapper directive. Since both MLIR and Clang now support custom mappers, I've changed the respective function params to no longer be optional as well. Depends on #121005	2025-02-18 17:55:48 +00:00
Krzysztof Parzyszek	7b89c41e41	[offload] Remove redundant checks in MappingInfoTy::lookupMapping (#127638 ) Also add some clarifying comments.	2025-02-18 11:01:36 -06:00
Joseph Huber	1435c8ed95	Reapply "[LinkerWrapper] Clean up options after proper forwarding" (#126495 ) Summary: The test failed because it no longer passed Rpass by default without LTO. I think that's desirable as it matches the standard behavior. This reverts commit 6fd99de31864a5ef84ae8613b3a9034e05293461.	2025-02-14 09:56:46 -06:00
Ethan Luis McDonough	52ee06d273	[PGO][Offload] Fix pgo1.c (#126864 ) pgo1.c had outdated test checks	2025-02-12 00:54:31 -06:00
Ethan Luis McDonough	9e5c136d5a	[PGO][Offload] Profile profraw generation for GPU instrumentation #76587 (#93365 ) This pull request is the second part of an ongoing effort to extend PGO instrumentation to GPU device code and depends on #76587. This PR makes the following changes: - Introduces `__llvm_write_custom_profile` to the PGO compiler-rt library. This is an external function that can be used to write profiles with custom data to target-specific files. - Adds `__llvm_write_custom_profile` as a weak symbol to libomptarget so that it can write the collected data to a profraw file. - Adds the `PGODump` debug flag and only displays the dump when that flag is set	2025-02-11 23:30:54 -06:00
Joseph Huber	baf7a3c1e5	[Offload] Properly guard modifications to the RPC device array (#126790 ) Summary: If the user deallocates an RPC device this can sometimes fail if the RPC server is still running. This will happen if the modification happens while the server is still checking it. This patch adds a mutex to guard modifications to it.	2025-02-11 14:57:31 -06:00
Joseph Huber	a854c266b9	[Offload][NFC] Rename `src/` -> `libomptarget/` (#126573 ) Summary: The name `src` is confusing when combined with the plugins and the newly added `liboffload`.	2025-02-10 13:22:10 -06:00
Joseph Huber	feb30f25c0	[Offload] Fix the offload cache file triggering libc++ / libstdc++ mixing (#126313 ) Summary: We originally wanted `-stdlib=libc++` by default so that it could use offloading support in libc++, however this causes issues with how the Offloading project itself is built. If the user builds the LLVM libs with libstdc++ and then uses this cache, it will enable this option by default for the ensuing build of the offloading libraries with the newly built clang. This will cause a lot of linker failures because the C++ library doesn't match. Long term I think the proper solution to this is to make better use of clang configuration files, but I don't know a good way to do that by default. For now just make it build right.	2025-02-10 13:20:35 -06:00
Joseph Huber	ed9107f2d7	[OpenMP] Replace use of target address space with <gpuintrin.h> local (#126119 ) Summary: This definition is more portable since it defines the correct value for the target. I got rid of the helper mostly because I think it's easy enough to use now that it's a type and being explicit about what's `undef` or `poison` is good.	2025-02-09 10:25:25 -06:00
Jan Patrick Lehr	191d7d64e5	[Offload] Fix assumptions on symbols after #124846 (#126238 ) In #124846 the symbolizer was changed to ignore 0-column entries, which led to a slightly different representation in the stack traces. This patch addresses these differences. Not sure if the difference in kernel_trap.c is also a result of this change or not. Can be tracked separately from this, after the bots are back to green.	2025-02-07 13:25:11 +01:00
David Blaikie	14d6e1ebf5	Update test for symbolizer fix	2025-02-06 19:18:20 +00:00
Joseph Huber	5812d0bf8e	[Offload] Make only a single thread handle the RPC server thread (#126067 ) Summary: This patch just changes the interface to make starting the thread multiple times permissible since it will only be done the first time. Note that this does not refcount it or anything, so it's up to the user to make sure that they don't shut down the thread before everyone is done using it. That is the case today because the shutDown portion is run by a single thread in the destructor phase. Another question is if we should make this thread truly global state, because currently it will be private to each plugin instance, so if you have an AMD and NVIDIA image there will be two, similarly if you have those inside of a shared library.	2025-02-06 11:38:14 -06:00
Joseph Huber	f1e917d07b	[Offload] Unify offloading entries into a single section (#125731 ) Summary: This patch unifies the existing offloading entries into a single section called `llvm_offload_entries`. This lets us use a more unified offloading infrastructure so that all targets share the same handling. The effect is that people in the runtimes now need to check if the kind is what they expect, but the expectation is that you can combine multiple potential providers into a compile job. Doesn't fully work yet because of other runtime issues, but some day. Mostly this helps the future of liboffload where we want to handle different languages than OpenMP.	2025-02-06 08:24:01 -06:00
Joseph Huber	7a8779422d	[Offload] Stop the RPC server failing with more than one GPU (#125982 ) Summary: Pretty dumb mistake of me, forgot that this is run per-device and per-plugin, which fell through the cracks with my testing because I have two GPUs that use different plugins.	2025-02-05 20:51:28 -06:00
Joseph Huber	bb7ab2557c	[OpenMP] Port the OpenMP device runtime to direct C++ compilation (#123673 ) Summary: This removes the use of OpenMP offloading to build the device runtime. The main benefit here is that we no longer need to rely on offloading semantics to build a device only runtime. Things like variants are now no longer needed and can just be simple if-defs. In the future, I will remove most of the special handling here and fold it into calls to the `<gpuintrin.h>` functions instead. Additionally I will rework the compilation to make this a separate runtime. The current plan is to have this, but make including OpenMP and offloading either automatically add it, or print a warning if it's missing. This will allow us to use a normal CMake workflow and delete all the weird 'let's pull the clang binary out of the build' business. ``` -DRUNTIMES_amdgcn-amd-amdhsa_LLVM_ENABLE_RUNTIMES=offload -DLLVM_RUNTIME_TARGETS=amdgcn-amd-amdhsa ``` After that, linking the OpenMP device runtime will be `-Xoffload-linker -lomp`. I.e. no more fat binary business. Only look at the most recent commit since this includes the two dependencies (fix to AMDGPUEmitPrintfBinding and the PointerToMember bug).	2025-02-05 08:18:52 -06:00
Joseph Huber	a284a6ed17	[OpenMP] Guard OpenMP specific entry handling	2025-02-03 16:16:18 -06:00
Michał Górny	689ef5fda0	[offload] [test] Use test compiler ID rather than host (#124408 ) Use the test compiler ID to verify whether tests can be run rather than the host compiler. This makes it possible to run tests (with Clang) while the library itself was built with GCC.	2025-02-02 15:55:39 +00:00
Michał Górny	359a913170	[offload] `gnu::format` with variadic template functions is Clang-only (#124406 ) Use `gnu::format` attribute only when compiling with Clang, as using it against variadic template functions is a Clang extension and is not supported by GCC. See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77958 Fixes #119069	2025-02-02 15:55:22 +00:00
Christian Clauss	1f56bb3137	[Offload][NFC] Fix typos discovered by codespell (#125119 ) https://github.com/codespell-project/codespell % `codespell --ignore-words-list=archtype,hsa,identty,inout,iself,nd,te,ths,vertexes --write-changes`	2025-01-31 09:35:29 -06:00
agozillon	2428b6ec40	[Flang][MLIR][OpenMP] Fix Target Data if (present(...)) causing LLVM-IR branching error (#123771 ) Currently if we generate code for the below target data map that uses an optional mapping: !$omp target data if(present(a)) map(alloc:a) do i = 1, 10 a(i) = i end do !$omp end target data We yield an LLVM-IR error as the branch for the else path is not generated. This occurs because we enter the NoDupPriv path of the callback function when generating the else branch, however, the emitBranch function needs to be set to a block for it to functionally generate and link in a follow-up branch. The NoDupPriv path currently doesn't do this; while it's not supposed to generate anything (as far as I am aware), we still need to at least set the builder's placement back so that it emits the appropriate follow-up branch. This avoids the missing terminator LLVM-IR verification error by correctly generating the follow-up branch.	2025-01-30 17:33:36 +01:00
agozillon	e0054e984c	[MLIR][OpenMP] Emit nullary check for mapped pointer members and appropriate size select based on results (#124604 ) This PR aims to fix a mapping error when trying to map nullary elements of a record type (primary example is allocatables/pointer types in Fortran at the moment). This should be legal to map, just not write to without pointing to anything within the target region. A common Fortran OpenMP idiom/example where this is useful can be found in the added Fortran offload example. The runtime error arises when we try to map the pointer member utilising a prescribed constant size that we receive from the lowered type, resulting in mapping of data that will be non-existent when there is no allocated data. The fix in this case is to emit a runtime check to see if the data has been allocated, if it hasn't been we select a size of 0, if it has we emit the usual type size.	2025-01-29 17:51:33 +01:00
Jan Patrick Lehr	d412fe531d	[Offload] Enable mlir and flang in bot build (#124915 ) This enables more projects in the CMake cache to add them to the buildbot coverage in the AMDGPU buildbots.	2025-01-29 14:13:59 +01:00
Joseph Huber	13dcc95dcd	[Offload] Rework offloading entry type to be more generic (#124018 ) Summary: The previous offloading entry type did not fit the current use-cases very well. This widens it and adds a version to prevent further annoyances. It also includes the kind to better sort who's using it. The first 64-bytes are reserved as zero so the OpenMP runtime can detect the old format for binary compatibility.	2025-01-28 07:26:13 -06:00
Joseph Huber	760a786d15	[Clang] Prevent `mlink-builtin-bitcode` from internalizing the RPC client (#118661 ) Summary: Currently, we only use `-mlink-builtin-bitcode` for non-LTO NVIDIA compilations. This has the problem that it will internalize the RPC client symbol which needs to be visible to the host. To counteract that, I put `retain` on it, but this also prevents optimizations on the global itself, so the passes we have that remove the symbol don't work on OpenMP anymore. This patch does the dumbest solution, adding a special string check for it in clang. Not the best solution, the runner up would be to have a clang attribute for `externally_initialized` because those can't be internalized, but that might have some unfortunate side-effects. Alternatively we could make NVIDIA compilations do LTO all the time, but that would affect some users and it's harder than I thought.	2025-01-27 19:30:59 -06:00
Joseph Huber	38b3f45a81	[Offload] Fix offload-info interface Summary: The offload info tool doesn't initialize things properly, just check this first instead.	2025-01-27 10:36:09 -06:00
Joseph Huber	f07505849c	[Offload] Fix server thread from being shut down if unused	2025-01-27 08:29:41 -06:00
Joseph Huber	e7592d83e0	[Offload][NFC] Make sure the thread is not running already	2025-01-27 08:06:29 -06:00
Joseph Huber	bd8a818128	[Offload] Add cuLaunchHostFunc to dynamic cuda Summary: This was missing, causing non-directly linked builds to fail.	2025-01-24 11:41:20 -06:00
Joseph Huber	134401deea	[Offload] Move RPC server handling to a dedicated thread (#112988 ) Summary: Handling the RPC server requires running through a list of jobs that the device has requested to be done. Currently this is handled by the thread that does the waiting for the kernel to finish. However, this is not sound on NVIDIA architectures and only works for async launches in the OpenMP model that uses helper threads. However, we also don't want to have this thread doing work unnecessarily. For this reason we track the execution of kernels and cause the thread to sleep via a condition variable (usually backed by some kind of futex or other intelligent sleeping mechanism) so that the thread will be idle while no kernels are running.	2025-01-24 11:36:45 -06:00
hidekisaito	ed512710a5	[Offload] Make MemoryManager threshold ENV var size_t type. (#124063 )	2025-01-23 11:46:56 -06:00
Joseph Huber	6518b121f0	[Offload][NFC] Factor out and rename the `__tgt_offload_entry` struct (#123785 ) Summary: This patch is an NFC renaming to make using the offloading entry type more portable between other targets. Right now this is just moving its definition to LLVM so others can use it. Future work will rework the struct layout.	2025-01-21 12:05:24 -06:00
Joseph Huber	f233a54ae8	[OpenMP] Remove usage of pointer-to-member in lookup (#123671 ) Summary: This is buggy and is currently being tracked in https://github.com/llvm/llvm-project/issues/123241. For now, replace it with a macro so that we can use address spaces directly.	2025-01-21 07:50:40 -06:00
Joseph Huber	3274bf6b42	[OpenMP] Make each atomic helper take an atomic scope argument (#122786 ) Summary: Right now we just default to device for each type, and mix an ad-hoc scope with the one used by the compiler's builtins. Unify this and make each version take the scope optionally. For @ronlieb, this will remove the need for `add_system` in the fork as well as the extra `cas` with system scope, just pass `system`.	2025-01-20 21:58:27 -06:00
Joseph Huber	2d9f406943	[OpenMP] Adjust 'printf' handling in the OpenMP runtime (#123670 ) Summary: We used to avoid a lot of this stuff because we didn't properly handle variadics in device code. That's been solved for now, so we can just make an internal printf handler that forwards to the external `vprintf` function. This is either provided by NVIDIA's SDK or by the GPU libc implementation. The main reason for doing this is because it prevents the stupid AMDGPU printf pass from mangling our beautiful printfs!	2025-01-20 21:56:46 -06:00
Joseph Huber	723a3e746a	[OpenMP] Fix misspelled attribute and warning Summary: This is spelled `ompx_aligned_barrier` when used directly, but wasn't included in the list of known assumptions. Fix that so now the test works.	2025-01-20 08:40:19 -06:00
Joseph Huber	58af82b462	[OpenMP] Remove 'omp assumes' scopes now that we have no inline ASM (#123611 ) Summary: We used this globally scoped `ext_no_call_asm` as a sort of hack around the compiler that allowed the attributor to optimize out inline assembly calls to PTX instructions. Quite some time ago I got rid of every inline assembly call and replaced it with a builtin, so this can just be deleted. Furthermore, I use the `[[omp::assume]]` attribute directly for the aligned barrier usage. This prints an unknown assumption warning (even though it isn't) so I'm just silencing that for now until I fix it later. --------- Co-authored-by: Michael Kruse <github@meinersbur.de>	2025-01-20 08:11:06 -06:00
Jan Patrick Lehr	4b3c17850b	[Offload] Enable shared-libs; compiler-rt as default RTLIB (#123568 ) This is the next step to move the CMake cache file builder closer to the build configuration we care about downstream.	2025-01-20 10:23:41 +01:00
Jan Patrick Lehr	61f94ebc9e	[NFC][Offload] Structure/Readability of CMake cache (#123328 ) Preparing to add more config options and want to group them all from most-common to project / component specific.	2025-01-17 13:01:25 +01:00
Joseph Huber	1c00d0d776	[OpenMP] Remove hack around missing atomic load (#122781 ) Summary: We used to do a fetch add of zero to approximate a load. This is because the NVPTX backend didn't handle this properly. It's not an issue anymore so simply use the proper atomic builtin.	2025-01-16 15:17:15 -06:00
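
Several entries above (134401deea, 5812d0bf8e, baf7a3c1e5) describe one pattern: a dedicated server thread that sleeps on a condition variable while no kernels are running, that may be started more than once, and whose shared state is guarded by a mutex. The C++ sketch below is only an illustration of that pattern and is not the code from those commits. `ServerThread`, `notifyKernelStart`, `notifyKernelEnd`, and `polls` are hypothetical names, and the polling step is a stand-in for servicing device requests.

```cpp
#include <atomic>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical illustration only; names and structure are assumptions,
// not the actual offload plugin implementation.
class ServerThread {
public:
  ~ServerThread() { shutDown(); }

  // Safe to call repeatedly: only the first call spawns the worker.
  void start() {
    std::lock_guard<std::mutex> Lock(Mutex);
    if (Worker.joinable())
      return;
    Running = true;
    Worker = std::thread([this] { run(); });
  }

  // A kernel was launched: record it and wake the sleeping worker.
  void notifyKernelStart() {
    {
      std::lock_guard<std::mutex> Lock(Mutex);
      ++ActiveKernels;
    }
    CV.notify_all();
  }

  // A kernel finished: the worker goes back to sleep once the count is zero.
  void notifyKernelEnd() {
    std::lock_guard<std::mutex> Lock(Mutex);
    --ActiveKernels;
  }

  // Assumed to be called from a single thread, as in the commit description.
  void shutDown() {
    {
      std::lock_guard<std::mutex> Lock(Mutex);
      if (!Worker.joinable())
        return;
      Running = false;
    }
    CV.notify_all();
    Worker.join();
  }

  // Number of polling iterations performed so far (for demonstration).
  unsigned long polls() const { return Polls.load(); }

private:
  void run() {
    std::unique_lock<std::mutex> Lock(Mutex);
    while (Running) {
      // Sleep until shutdown is requested or at least one kernel is active.
      CV.wait(Lock, [this] { return !Running || ActiveKernels > 0; });
      if (!Running)
        break;
      Lock.unlock();
      ++Polls; // Stand-in for checking and servicing device requests.
      std::this_thread::yield();
      Lock.lock();
    }
  }

  std::mutex Mutex;
  std::condition_variable CV;
  std::thread Worker;
  bool Running = false;        // Guarded by Mutex.
  unsigned ActiveKernels = 0;  // Guarded by Mutex.
  std::atomic<unsigned long> Polls{0};
};
```

In this sketch a caller would invoke `start()` once per plugin or device (extra calls are no-ops), bracket each kernel launch with `notifyKernelStart()` and `notifyKernelEnd()`, and call `shutDown()` from a single thread at teardown. Like the commit it illustrates, the sketch does not reference-count users of the thread.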

227 Commits