llvm-project

mirror of https://github.com/llvm/llvm-project.git synced 2025-04-24 00:46:06 +00:00

Author	SHA1	Message	Date
Joseph Huber	3c50cbfda4	[DeviceRTL] Make defined 'libc' functions weak in OpenMP (#97356 ) Summary: These functions provide special-case implementations internal to the OpenMP device runtime. This can potentially conflict with the symbols pulled in from the actual GPU `libc`. This patch makes these weak, so in the case that the GPU libc functions exist they will be overridden. This should not impact performance in the average case because the old `-mlink-builtin-bitcode` version does internalization, deleting weak, and the new LTO path will resolve to the strong reference and then internalize it.	2024-07-02 13:23:53 -05:00
dhruvachak	946f5d111d	[OpenMP] [OMPT] Callback registration should not depend on the device init callback. (#96371 ) Even if the device init callback is not registered, a tool should be allowed to register other callbacks.	2024-07-01 10:07:05 -07:00
Gheorghe-Teodor Bercea	1a478a69bc	[OpenMP][offload] Fix dynamic schedule tracking (#97065 ) This patch fixes the dynamic schedule tracking.	2024-07-01 10:23:11 -04:00
Ethan Luis McDonough	8823448807	[Offload] Refactor offload test requirements (#95196 ) Many tests in the `offload` project have requirements defined by which targets are not supported rather than which platforms are supported. This patch aims to streamline the requirement definitions by adding four new feature tags: `host`, `gpu`, `amdgpu`, and `nvidiagpu`.	2024-06-29 00:56:18 -05:00
Ethan Luis McDonough	2c8b912f63	Revert "[PGO][OpenMP] Instrumentation for GPU devices (#76587 )" This reverts commit 5fd2af38e461445c583d7ffc2fe23858966eee76. It caused build issues and broke the buildbot.	2024-06-28 12:30:45 -05:00
Ethan Luis McDonough	5fd2af38e4	[PGO][OpenMP] Instrumentation for GPU devices (#76587) This pull request is the first part of an ongoing effort to extend PGO instrumentation to GPU device code. This PR makes the following changes: - Adds blank registration functions to device RTL - Gives PGO globals protected visibility when targeting a supported GPU - Handles any addrspace casts for PGO calls - Implements PGO global extraction in GPU plugins (currently only dumps info) These changes can be tested by supplying `-fprofile-instrument=clang` while targeting a GPU.	2024-06-28 10:42:19 -05:00
agozillon	aec735cf47	[Flang][OpenMP][MLIR] Fix common block mapping for regular and declare target link (#91829) This PR attempts to fix common block mapping for regular mapping of these types as well as when they have been marked as "declare target link". This PR should allow correct mapping of both the members of a common block and the full common block via its block symbol. The main changes were some adjustments to the Fortran OpenMP lowering to HLFIR/FIR, the lowering of the LLVM+OpenMP dialect to LLVM-IR and adjustments to the way we handle target kernel map argument rebinding inside of the OMPIRBuilder. For the Fortran OpenMP lowering there were two changes: one prevents the implicit capture of common block members when the common block symbol itself has been marked, and the other creates intermediate member accesses inside of the target region to be used in place of those external to the target region; this prevents external usages breaking the IsolatedFromAbove pact. In the latter case, there was an adjustment to the size calculation for types to better handle cases where we pass an array as the type of a map (as opposed to the bounds and the type of the element), which occurs in the case of common blocks. There is also some adjustment to how handleDeclareTargetMapVar handles renaming of declare target symbols in the module to the reference pointer: now it will only apply to those within the kernel that is currently being generated, and we also perform a modification to replace constants with instructions as necessary, as we cannot replace these with our reference pointer (non-constants and constants do not mix nicely). In the case of the OpenMPIRBuilder some changes were made to defer global symbol rebinding to kernel arguments until all other arguments have been rebound. This makes sure we do not replace uses that may refer to the global (e.g. a GEP) but are themselves actually a separate argument that needs to be bound.
Currently "declare target to" still needs some work, but this may be the case for all types in conjunction with "declare target to" at the moment.	2024-06-25 20:54:04 +02:00
Joseph Huber	3de162fee4	[Offload] Fix using old deprecated CUDA root variable (#96307 ) Summary: This variable isn't being set properly since we moved to the new way to find the CUDA directory. That means this variable was just unset the whole time. This patch adds it in by calculating it using the binary directory so it can be passed to `--cuda-path`.	2024-06-21 10:08:05 -05:00
Wu Yingcong	89841137fb	[offload][cmake] always define pythonize_bool macro (#96028) I use the following cmake config to build offload and openmp ``` cmake -G "Unix Makefiles" -DCMAKE_BUILD_TYPE=Release -DLLVM_TARGETS_TO_BUILD="X86" -DLLVM_ENABLE_PROJECTS="clang;openmp" -DLLVM_ENABLE_RUNTIMES="offload" -DLLVM_LIT_ARGS="-vv -a" -DLLVM_ENABLE_ASSERTIONS=ON ../llvm ``` and got the following error: ``` CMake Error at /tmp/build-llvm/llvm/offload/CMakeLists.txt:321 (pythonize_bool): Unknown CMake command "pythonize_bool". ``` After some searching I found out that the "correct" way to build this is to put both openmp and offload in LLVM_ENABLE_RUNTIMES, like ``` cmake -G "Unix Makefiles" -DCMAKE_BUILD_TYPE=Release -DLLVM_TARGETS_TO_BUILD="X86" -DLLVM_ENABLE_PROJECTS="clang" -DLLVM_ENABLE_RUNTIMES="openmp;offload" -DLLVM_LIT_ARGS="-vv -a" -DLLVM_ENABLE_ASSERTIONS=ON ../llvm ``` . But since we don't forbid configuring openmp as a project and offload as a runtime, we should probably support it. The fix is to always define the pythonize_bool macro. In CMake it is okay to redefine a macro; doing so does not cause a warning or anything else.	2024-06-20 07:00:19 -05:00
Joseph Huber	8043356380	[Offload] Change HSA header search order (#95769) Summary: The HSA headers existed previously in `include/hsa.h` and were moved to `include/hsa/hsa.h` in a later ROCm version. The include headers here were originally designed to favor a newer one. However, this unintentionally prevented the dynamic HSA's `hsa.h` from being used if both were present. This patch changes the order so it will be found first. Related to https://github.com/llvm/llvm-project/pull/95484.	2024-06-17 14:52:50 -05:00
Tim Gymnich	597d2f7662	[OpenMP] Add Environment Variable to disable Reuse of Blocks for High Loop Trip Counts (#89239) Sometimes it might be beneficial to spawn more thread blocks instead of reusing existing ones for multiple loop iterations. Alternatives considered: Make `DefaultNumBlocks` settable via an environment variable. --------- Co-authored-by: Joseph Huber <huberjn@outlook.com>	2024-06-14 07:35:23 -07:00
agozillon	0aeaa2d93d	[OMPIRBuilder][OpenMP][LLVM] Modify and use ReplaceConstant utility in convertTarget (#94541 ) This PR seeks to expand/replace the Constant -> Instruction conversion that needs to occur inside of the OpenMP Target kernel generation to allow kernel argument replacement of uses within the kernel (cannot replace constant uses within constant expressions with non-constants). It does so by making use of the new-ish utility convertUsersOfConstantsToInstructions which is a much more expansive version of what the smaller "version" of the function I wrote does, effectively expanding uses of the input argument that are constant expressions into instructions so that we can replace with the appropriate kernel argument. Also alters convertUsersOfConstantsToInstructions to optionally restrict the replacement to a function and optionally leave dead constants alone, the latter is necessary when lowering from MLIR as we cannot be sure we can remove the constants at this stage, even if rewritten to instructions the ModuleTranslation may maintain links to the original constants and utilise them in further lowering steps (as when we're lowering the kernel, the module is still in the process of being lowered). This can result in unusual ICEs later. These dead constants can be tidied up later (and appear to be in subsequent lowering from checking with emit-llvm).	2024-06-13 15:57:15 +02:00
Johannes Doerfert	54b5c76d3b	[Offload] Use flat array for cuLaunchKernel (#95116) We already used a flat array of kernel launch parameters for the AMD GPU launch but now we also use this scheme for the NVIDIA GPU launch. The only remaining/required use of the indirection is the host plugin (due to ffi). This allows us to simplify the use for non-OpenMP kernel launch.	2024-06-13 09:43:47 +03:00
Johannes Doerfert	f2120cda7d	[Offload][AMDGPU] Impose more restrictions for implicit kernel arguments (#95211 ) COV3 is not supported anymore, thus we can just use ArgsSize we read from the kernel to determine how many argument bytes we need and if implicit kernel arguments are used.	2024-06-12 16:42:20 +03:00
Johannes Doerfert	2eb60e2de8	[Offload][NFCI] Initialize the KernelArgsTy to default values (#95117 ) Co-authored-by: Joseph Huber <huberjn@outlook.com>	2024-06-11 17:05:04 +03:00
estewart08	89c92b0bcf	[OpenMP][Offload] - Ensure OPENMP_STANDALONE_BUILD is defined (#94801) Without a value set, conditional checks like if(NOT ${OPENMP_STANDALONE_BUILD}) will not be able to evaluate to true. Fixes an issue introduced by PR #93463, which did not allow the OMPT variable to be propagated up to offload during a runtimes build.	2024-06-07 15:37:42 -05:00
Joseph Huber	9e209a4a37	[Offload] Use the kernel argument size directly in AMDGPU offloading (#94667 ) Summary: The old COV3 implementation of HSA used to omit the implicit arguments from the kernel argument size. For COV4 and COV5 this is no longer the case so we can simply use the size reported from the symbol information. See https://github.com/ROCm/ROCR-Runtime/issues/117#issuecomment-812758161	2024-06-06 15:19:55 -05:00
Joseph Huber	2cc1644299	[Offload] Fix missing `abs` function for test Summary: We don't have the abs function to link against, just use the builtin.	2024-06-06 14:37:06 -05:00
Shilei Tian	1ca0055f45	[AMDGPU] Add a new target gfx1152 (#94534 )	2024-06-06 12:16:11 -04:00
Joseph Huber	435aa7663d	[Libomptarget] Rework device initialization and image registration (#93844 ) Summary: Currently, we register images into a linear table according to the logical OpenMP device identifier. We then initialize all of these images as one block. This logic requires that images are compatible with all devices instead of just the one that it can run on. This prevents us from running on systems with heterogeneous devices (i.e. image 1 runs on device 0 image 0 runs on device 1). This patch reworks the logic by instead making the compatibility check a per-device query. We then scan every device to see if it's compatible and do it as they come.	2024-06-06 08:10:56 -05:00
Shilei Tian	4cd115ca34	[OpenMP][OMPX] Fix ompx_ballot_sync test (#94140 ) The current test is not really correct because the mask is set to 0xffffffff even if it is on an AMDGPU whose wavefront size is 64. Besides, `__AMDGCN_WAVEFRONT_SIZE` is not set on host compilation so the verification happens to work.	2024-06-03 11:18:26 -04:00
Shilei Tian	b448efb8ea	Reapply "[OpenMP][OMPX] Add shfl_down_sync (#93311 )" (#94139 )	2024-06-03 11:17:36 -04:00
Joseph Huber	e19565c5c4	[Offload][AMDGPU] Only allow memory pool access to valid agents (#93969) Summary: The logic since the next-gen plugins were added was that every single agent would get access to a memory pool we allocated. This is necessary for things like fine-grained memory and to facilitate d2d copies. However, there are cases where an agent cannot legally access a memory pool. We have a debug check for this, but it would always be triggered in these situations because both uses of the function simply passed every agent. This patch changes the behavior by only enabling memory pool access for agents that can access the memory pool.	2024-05-31 13:34:40 -05:00
Krzysztof Parzyszek	adc4e45f2e	[Offload] Update test to use `target parallel for reduction` Re-enable test disabled in 1bf1f93d with a fix.	2024-05-30 09:17:17 -05:00
Krzysztof Parzyszek	1bf1f93d94	[Offload] Temporarily disable failing test after eb88e7c1 The `target reduction` combination is no longer accepted. Disable the test to avoid build failures, until a better fix is ready.	2024-05-30 08:52:29 -05:00
Joseph Huber	f284af4863	[Offload][Fix] Fix lazy initialization with multiple images Summary: There was a bug here where we would initialize the plugin multiple times when there were multiple images. Fix it by putting the `is_initliaized` check later.	2024-05-28 10:51:53 -05:00
Shilei Tian	cf9eeb67e5	Revert "Reapply "[OpenMP][OMPX] Add shfl_down_sync (#93311 )"" This reverts commit 7b4865582299294455bc816358fd88a9c6e5e0be.	2024-05-26 01:04:39 -04:00
Shilei Tian	7b48655822	Reapply "[OpenMP][OMPX] Add shfl_down_sync (#93311 )" This reverts commit 9b31cc71d66064dfaf2afabf4a835211321bb4a0.	2024-05-26 00:57:50 -04:00
Michael Kruse	e14f5f225a	Revise IDE folder structure (#89755) Update the folder titles for targets in the monorepository that have not been taken care of for some time. These are the folders in which targets are organized in Visual Studio and Xcode (`set_property(TARGET <target> PROPERTY FOLDER "<title>")`) when using the respective CMake IDE generator. * Ensure that every target is in a folder * Use a folder hierarchy with each LLVM subproject as a top-level folder * Use consistent folder names between subprojects * When using target-creating functions from AddLLVM.cmake, automatically deduce the folder. This reduces the number of `set_property`/`set_target_property` calls, but they are still necessary when `add_custom_target`, `add_executable`, `add_library`, etc. are used. A LLVM_SUBPROJECT_TITLE definition is used for that in each subproject's root CMakeLists.txt.	2024-05-25 17:41:21 +02:00
Joseph Huber	9b31cc71d6	Revert "[OpenMP][OMPX] Add shfl_down_sync (#93311 )" This reverts commit 098c6dfa8157681699a71fce9e3d94515e66311f. This reverts commit 8c718a3a91df4ab68dc3f1ca3887ea730c9aed84. This reverts commit 4fb02de9d490d0773441aa30124bb4d1272230d3.	2024-05-24 19:07:53 -05:00
Shilei Tian	4fb02de9d4	[OpenMP][OMPX] Add shfl_down_sync (#93311 )	2024-05-24 14:00:43 -04:00
Shilei Tian	7eeec8e6d1	[OpenMP][OMPX] Add ballot_sync (#91297 ) This patch adds the support for `ballot_sync` in ompx.	2024-05-24 09:54:54 -04:00
Joseph Huber	21f3a6091f	[Offload] Only initialize a plugin if it is needed (#92765 ) Summary: Initializing the plugins requires initializing the runtime like CUDA or HSA. This has a considerable overhead on most platforms, so we should only actually initialize a plugin if it is needed by any image that is loaded.	2024-05-23 09:36:47 -05:00
Joseph Huber	300e5b9114	[Offload] Fix enabling plugins on unsupported platforms (#93186) Summary: Certain plugins can only be built on specific platforms. Previously this didn't cause issues because each one was handled independently. However, now that we link these all directly they need to be in a CMake list. Furthermore we use this list to generate a config file. For this reason these checks are moved to where we normalize the support. Fixes: https://github.com/llvm/llvm-project/issues/93183	2024-05-23 08:06:41 -05:00
Jan Patrick Lehr	27f53b266c	[Offload] Disable flaky test on host-offloading (#93174) While we investigate the issue, we disable the test on host-offloading so the buildbots are back to a more useful state. The issue is tracked at: https://github.com/llvm/llvm-project/issues/93173	2024-05-23 13:21:40 +02:00
Anchu Rajendran S	1a2a0c0dc9	Fixing the location attribute added to mapInfoOp (#90764) The named location attribute added to `tgt_offload_entry` is used by runtime calls like `ompx_dump_mapping_tables` to print information about variables that are mapped to the device. `ompx_dump_mapping_tables` was printing the wrong location information and this change fixes it. A sample execution of an example before the change: ``` omptarget device 0 info: OpenMP Host-Device pointer mappings after block at libomptarget:0:0: omptarget device 0 info: Host Ptr Target Ptr Size (B) DynRefCount HoldRefCount Declaration omptarget device 0 info: 0x0000000000206df0 0x00007f02cdc00000 20000000 1 0 <program-file-loc> at unknown:18:35 ``` The change replaces `unknown` with the mapped symbol and the location with the declaration location.	2024-05-23 13:46:35 +05:30
Joseph Huber	c618ae1734	[Offload] Rework handling for loading vendor runtimes (#93073 ) Summary: We previously had multiple options for this, this patch replaces them with `LIBOMPTARGET_DLOPEN_PLUGINS=` to be a list of plugins to dynamically use. It defaults to everything right now. This ignores the `host` plugin because the `libffi` dependency is going to be removed soon hopefully in https://github.com/llvm/llvm-project/pull/91264.	2024-05-22 13:04:52 -05:00
Joseph Huber	dbfedc6b27	[Offload] Use newer CUDA API functions when dynamically loaded (#93057) Summary: CUDA does its versioning by putting a redirection in the header so the API functions remain the same while the symbol changes. These weren't being used for some functions that required it in the dynamic cuda version. These functions have newer versions that should be used. These are fairly old as far as I'm aware so we should be able to sweep backward compatibility under the rug.	2024-05-22 10:59:56 -05:00
Ye Luo	831d143519	[Offload] libomptarget force dlopen vendor libraries by default. (#92788) Since #87009, libomptarget directly links all the plugins statically. All the dependencies of plugins got exposed to libomptarget. The CUDA plugin depends on libcuda and the amdgpu plugin depends on libhsa if not forced using dlopen. On a cluster with different compute node architectures, libomptarget can be built and run on different nodes. In the build stage, if cmake finds libcuda and `LIBOMPTARGET_FORCE_DLOPEN_LIBCUDA=OFF`, libomptarget links libcuda.so directly and the resulting libomptarget may not run on a node without an NVIDIA driver, for example a CPU-only or AMD-GPU-only machine, failing with a complaint that libcuda.so was not found. The solution is setting `LIBOMPTARGET_FORCE_DLOPEN_LIBCUDA` and `LIBOMPTARGET_FORCE_DLOPEN_LIBHSA` to `ON`. Preferably this should be the default to maximize the usability of libomptarget. If cmake detects NVIDIA or AMD software on an OS image build node, the resulting libomptarget may not be able to function on the user side because it requires the vendor runtime libraries to exist.	2024-05-22 09:40:43 -05:00
Joseph Huber	3df7cb9ab9	[Offload] Remove unused version script for plugins Summary: The plugins are no longer linked to a shared library, making this unused and useless.	2024-05-20 10:06:11 -05:00
Joseph Huber	770d928303	[Offload][NFC] Remove 'libomptarget' message helpers (#92581 ) Summary: This isn't `libomptarget` anymore, and these messages were always unnecessary because no other project uses these prefixed messages. The effect of this is that no longer will the logs have `LIBOMPTARGET --` in front of everything. We have a message stating when we start building the offload project so it'll still be trivial to find.	2024-05-17 13:24:32 -05:00
Joseph Huber	16bb7e89a9	[Offload][NFC] Remove all trailing whitespace from offload/ (#92578) Summary: This patch cleans up the trailing whitespace in a bunch of tests and CMake files. Most just in preparation for other cleanups.	2024-05-17 13:15:04 -05:00
Joseph Huber	c4017cda00	[Offload][NFC] Remove header license in CMake files (#92544 ) Summary: No other project has these in the CMake itself, and they're wildly inconsistent even within the project. These don't really add anything so I think they should be removed.	2024-05-17 09:05:03 -05:00
Joseph Huber	6d2219acab	[Libomptarget] Pass '-Werror=global-constructors' to the libomptarget build (#88531 ) Summary: A runtime library should not have global constructors. Everything is now expected to go through the init methods. This patch ensures that global constructors will not accidentally be introduced.	2024-05-16 16:06:20 -05:00
Joseph Huber	f42f57b52d	[Libomptarget] Rework Record & Replay to be a plugin member (#88928 ) (#89097 ) Summary: Previously, the R&R support was global state initialized by a global constructor. This is bad because it prevents us from adequately constraining the lifetime of the library. Additionally, we want to minimize the amount of global state floating around. This patch moves the R&R support into a plugin member like everything else. This means there will be multiple copies of the R&R implementation floating around, but this was already the case given the fact that we currently handle everything with dynamic libraries.	2024-05-16 14:58:46 -05:00
Joseph Huber	033fa81480	[Offload][NFC] Remove unused files following static plugins Summary: Forgot to remove these when I landed the initial patch, they are no longer used.	2024-05-16 11:48:34 -05:00
Joseph Huber	3abd3d6e59	[Libomptarget] Remove requires information from plugin (#80345) Summary: Currently this is only used for the zero-copy handling. However, this can easily be moved into `libomptarget` so that we do not need to bother setting the requires flags in the plugin. The advantage here is that we no longer need to do this for every device redundantly. Additionally, these requires flags are specifically OpenMP related, so they should live in `libomptarget`.	2024-05-16 11:13:50 -05:00
Joseph Huber	81d20d861e	[Offload][NFC] Fix warning messages in runtime Summary: These are lots of random warnings due to inconsistent initialization or signedness.	2024-05-15 15:30:38 -05:00
Joseph Huber	332de4b267	[Offload] Correctly reject building on unsupported architectures (#92276) Summary: Previously we had this `LIBOMPTARGET_ENABLED` variable which controlled including `libomptarget`. This is now redundant since it's controlled by `LLVM_ENABLE_RUNTIMES`. However, this had the extra effect of not building it when given unsupported targets. This was lost during the move to `offload`. This patch moves this logic back and makes the `offload` target just quit without doing anything if used on an unsupported architecture. https://github.com/llvm/llvm-project/issues/91881 https://github.com/llvm/llvm-project/issues/91819 --------- Co-authored-by: Sylvestre Ledru <sylvestre@debian.org>	2024-05-15 11:38:41 -05:00
jyu2-git	8e00703be9	[Clang][OpenMP] Fix runtime problem when explicit map both pointer and pointee (#92210) For a pointer int *p with the following maps, the test currently crashes: map(p, p[:100]) or map(p, p[1]) Currently the IR looks like // &p, &p, sizeof(int), TARGET_PARAM | TO | FROM // &p, p[0], 100*sizeof(float) TO | FROM The working IR is // map(p, p[0:100]) to map(p[0:100]) // &p, &p[0], 100*sizeof(float), TARGET_PARAM | TO | FROM | PTR_AND_OBJ The change adds a new argument AreBothBasePtrAndPteeMapped to generateInfoForComponentList. It is used to skip the map for map(p) and, when processing map(p[:100]), to generate the map with the right flag.	2024-05-15 08:20:25 -07:00

73 Commits