433 Commits

Author SHA1 Message Date
Yaxun (Sam) Liu
240f2269ff
Add clang atomic control options and attribute (#114841)
Add options and a statement attribute for controlling the emission of
target-specific metadata on atomicrmw instructions in IR.

The RFC for this attribute and these options is
https://discourse.llvm.org/t/rfc-add-clang-atomic-control-options-and-pragmas/80641.
Originally a pragma was proposed; it was later changed to a Clang
attribute.

This attribute allows users to specify one, two, or all three options and
must be applied to a compound statement. The attribute can also be nested,
with inner attributes overriding the options specified by outer attributes
or the target's default options. These options will then determine the
target-specific metadata added to atomic instructions in the IR.

In addition to the attribute, three new compiler options are introduced:
`-f[no-]atomic-remote-memory`, `-f[no-]atomic-fine-grained-memory`,
 `-f[no-]atomic-ignore-denormal-mode`.
These compiler options allow users to override the default options through
the Clang driver and front end. `-m[no-]unsafe-fp-atomics` is aliased to
`-f[no-]atomic-ignore-denormal-mode`.

In terms of implementation, the atomic attribute is represented in the
AST by the
existing AttributedStmt, with minimal changes to AST and Sema.

During code generation in Clang, the CodeGenModule maintains the current
atomic options,
which are used to emit the relevant metadata for atomic instructions.
RAII is used to save and restore the atomic options when entering and
exiting nested AttributedStmts.
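
Illustrative sketch of the nesting behavior, assuming the
`[[clang::atomic(...)]]` spelling and option names from the RFC
(hypothetical example, not taken from the patch):

```
// Option spellings assumed from the RFC; treat this as a sketch.
__global__ void add(float *p, float v) {
  [[clang::atomic(no_remote_memory, no_fine_grained_memory)]] {
    // Atomics here get metadata reflecting the outer options.
    __atomic_fetch_add(p, v, __ATOMIC_RELAXED);
    [[clang::atomic(remote_memory)]] {
      // The inner attribute overrides only the remote-memory option.
      __atomic_fetch_add(p, v, __ATOMIC_RELAXED);
    }
  }
}
```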
2025-02-27 10:41:04 -05:00
Alex MacLean
6c2e170d04
[NVPTX] Convert vector function nvvm.annotations to attributes (#127736)
Replace some more nvvm.annotations with function attributes,
auto-upgrading the annotations as needed. These new attributes will be
more idiomatic and compile-time efficient than the annotations.

- !"maxntid[xyz]" -> "nvvm.maxntid"
- !"reqntid[xyz]" -> "nvvm.reqntid"
- !"cluster_dim_[xyz]" -> "nvvm.cluster_dim"
2025-02-26 08:45:27 -08:00
Anshil Gandhi
95000fdb9e
[CUDA] Increment VTable index for device thunks (#124989)
Currently, the clang frontend incorrectly emits the callee instead of the
thunk for the callee in the VTable. This happens because the thunk index
is not incremented when a thunk's callee cannot be emitted. This patch
fixes the bug.
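
A minimal multiple-inheritance pattern whose vtable contains a
this-adjusting thunk (hypothetical illustration, not from the patch or its
tests):

```
struct A { __device__ virtual void f() {} };
struct B { __device__ virtual void g() {} };
struct C : A, B {
  // Called via a B*, this override is entered through a this-adjusting
  // thunk; the fix ensures the VTable index still advances when such a
  // callee cannot be emitted.
  __device__ void g() override {}
};
```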
2025-02-19 23:47:22 -05:00
Anshil Gandhi
bf9b72e134
[NFC][Clang] Precommit test for VTable codegen (#124983)
Associated PR: https://github.com/llvm/llvm-project/pull/124989
2025-02-19 21:14:02 -05:00
Fabian Ritter
029c8e783d
[AMDGPU][clang] Replace gfx940 and gfx941 with gfx942 in clang (#126762)
gfx940 and gfx941 are no longer supported. This is one of a series of
PRs to remove them from the code base.

This PR removes all occurrences of gfx940/gfx941 from clang that can be
removed without changes in the llvm directory. The
target-invalid-cpu-note/amdgcn.c test is not included here since it
tests a list of targets that is defined in
llvm/lib/TargetParser/TargetParser.cpp.

For SWDEV-512631
2025-02-19 10:11:48 +01:00
Srinivasa Ravi
bd860f9864
[NVPTX] Add intrinsics for redux.sync f32 instructions (#126664)
Adds NVVM intrinsics, NVPTX codegen and Clang builtins for `redux.sync`
f32 instructions introduced in PTX 8.6 for sm_100a.
Tests added in `CodeGen/NVPTX/redux-sync.ll` and
`CodeGenCUDA/redux-builtins.cu` and verified through ptxas 12.8.0.

PTX Spec Reference:

https://docs.nvidia.com/cuda/parallel-thread-execution/#parallel-synchronization-and-communication-instructions-redux-sync
2025-02-14 11:11:44 +05:30
Alex MacLean
a282b6c486
[NVPTX] Convert scalar function nvvm.annotations to attributes (#125908)
Replace some more nvvm.annotations with function attributes,
auto-upgrading the annotations as needed. These new attributes will be
more idiomatic and compile-time efficient than the annotations.

- !"maxclusterrank" / !"cluster_max_blocks" -> "nvvm.maxclusterrank"
- !"minctasm" -> "nvvm.minctasm"
- !"maxnreg" -> "nvvm.maxnreg"
2025-02-12 07:33:22 -08:00
Joseph Huber
f1e917d07b
[Offload] Unify offloading entries into a single section (#125731)
Summary:
This patch unifies the existing offloading entries into a single section
called `llvm_offload_entries`. This lets us use a more unified
offloading infrastructure so that all targets share the same handling.
The effect is that people in the runtimes now need to check if the kind
is what they expect, but the expectation is that you can combine
multiple potential providers into a compile job. Doesn't fully work
yet because of other runtime issues, but some day. Mostly this helps the
future of liboffload where we want to handle different languages than
OpenMP.
2025-02-06 08:24:01 -06:00
Nikita Popov
29441e4f5f
[IR] Convert from nocapture to captures(none) (#123181)
This PR removes the old `nocapture` attribute, replacing it with the new
`captures` attribute introduced in #116990. This change is
intended to be essentially NFC, replacing existing uses of `nocapture`
with `captures(none)` without adding any new analysis capabilities.
Making use of non-`none` values is left for a followup.

Some notes:
* `nocapture` will be upgraded to `captures(none)` by the bitcode
   reader.
* `nocapture` will also be upgraded by the textual IR reader. This is to
   make it easier to use old IR files and somewhat reduce the test churn in
   this PR.
* Helper APIs like `doesNotCapture()` will check for `captures(none)`.
* MLIR import will convert `captures(none)` into an `llvm.nocapture`
   attribute. The representation in the LLVM IR dialect should be updated
   separately.
2025-01-29 16:56:47 +01:00
Joseph Huber
13dcc95dcd
[Offload] Rework offloading entry type to be more generic (#124018)
Summary:
The previous offloading entry type did not fit the current use-cases
very well. This widens it and adds a version to prevent further
annoyances. It also includes the kind to better sort who's using it.

The first 64 bits are reserved as zero so the OpenMP runtime can detect
the old format, for binary compatibility.
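
A rough sketch of the kind of layout described, with hypothetical field
names and widths (not the exact upstream definition):

```
#include <cstdint>

struct OffloadEntry {
  uint64_t Reserved;   // zeroed; lets the OpenMP runtime detect the old format
  uint16_t Version;    // versioned to absorb future layout changes
  uint16_t Kind;       // records which offloading language produced the entry
  uint32_t Flags;
  void    *Address;    // the device symbol itself
  char    *SymbolName; // name the runtime looks up in the target image
  uint64_t Size;       // nonzero for variables, zero for functions
};
```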
2025-01-28 07:26:13 -06:00
Joseph Huber
70a16b90ff
[HIP] Support managed variables using the new driver (#123437)
Summary:
Previously, managed variables didn't work in rdc mode using the new
driver because we just didn't register them. This was previously ignored
because we didn't have enough space in the current struct format. This
patch amends that by just emitting a struct pair for the two variables
and using the single pointer.

In the future, a more extensible entry format would be nice, but that
can be done later.
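
For reference, the source-level construct involved (a standard HIP/CUDA
managed variable; illustrative only):

```
// In -fgpu-rdc mode the new driver previously skipped registering this
// variable with the runtime; it is now registered via a struct pair for
// the host and device halves.
__managed__ int counter = 0;

__global__ void bump() { counter += 1; }
```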
2025-01-22 09:13:14 -06:00
Alex MacLean
4583f6d344
[NVPTX] Switch front-ends and tests to ptx_kernel cc (#120806)
The `ptx_kernel` calling convention is a more idiomatic and standard way
of specifying a NVPTX kernel than using the metadata which is not
supposed to change the meaning of the program. Further, checking the
calling convention is significantly faster than traversing the metadata,
improving compile time.

This change updates the clang and mlir frontends as well as the
NVPTXCtorDtorLowering pass to emit kernels using the calling convention.
In addition, this updates all NVPTX unit tests to use the calling
convention as well.
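
A minimal sketch of the producer-side change using the LLVM C++ API
(assumed shape, not the actual patch):

```
#include "llvm/IR/CallingConv.h"
#include "llvm/IR/Function.h"

// Tag a kernel with the calling convention instead of attaching
// !nvvm.annotations metadata to the module.
void markAsNVPTXKernel(llvm::Function *F) {
  F->setCallingConv(llvm::CallingConv::PTX_Kernel);
}
```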
2025-01-07 18:24:50 -08:00
Alex Voicu
66acb26946
[clang][CodeGen][SPIRV] Translate amdgpu_flat_work_group_size into max_work_group_size. (#116820)
HIPAMD relies on the `amdgpu_flat_work_group_size` attribute to
implement key functionality such as the `__launch_bounds__` `__global__`
function annotation. This attribute is not available in / directly
translatable to SPIR-V; hence, as things stand, AMDGCN-flavoured SPIR-V
suffers from information loss.

This patch addresses that limitation by converting the unsupported
attribute into the `max_work_group_size` attribute which maps to
[`MaxWorkgroupSizeINTEL`](https://github.com/KhronosGroup/SPIRV-Registry/blob/main/extensions/INTEL/SPV_INTEL_kernel_attributes.asciidoc),
which is available in / handled by SPIR-V. When reverse translating from
SPIR-V to AMDGCN LLVM IR, we invert the mapping and restore the original
AMDGPU attribute.
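
The attribute in question at the HIP source level (illustrative):

```
// __launch_bounds__ lowers to amdgpu_flat_work_group_size on AMDGPU; for
// AMDGCN-flavoured SPIR-V it is now carried as max_work_group_size
// (MaxWorkgroupSizeINTEL) instead of being dropped.
__global__ void __launch_bounds__(256) kern(float *p) {
  p[threadIdx.x] = 0.0f;
}
```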
2025-01-07 12:01:31 +02:00
Joseph Huber
81fae0d5e3
[Clang][AMDGPU] Stop defaulting to one-as for all atomic scopes (#120095)
Summary:
The documentation at
https://llvm.org/docs/AMDGPUUsage.html#memory-scopes states that these
'one-as' modifiers are more specific versions of the scopes that only
apply to a specific address space. This doesn't make sense for fences,
which have no associated address space, and it's a more restrictive
version of the normal scope. This should not be the default behavior, but
it is currently emitted in all cases except sequentially consistent ones.
2025-01-06 08:11:08 -06:00
Jun Wang
41ed16c3b3
Reapply "[AMDGPU] Infer amdgpu-no-flat-scratch-init attribute in AMDGPUAttributor (#94647)" (#118907)
This reverts commit 1ef9410a96c1d9669a6feaf03fcab8d0a4a13bd5.

This fixes the test file attributor-flatscratchinit-globalisel.ll.
2024-12-09 16:44:48 -08:00
Nikita Popov
462cb3cd6c
[InstCombine] Infer nusw + nneg -> nuw for getelementptr (#111144)
If the gep is nusw (usually via inbounds) and the offset is
non-negative, we can infer nuw.

Proof: https://alive2.llvm.org/ce/z/ihztLy
2024-12-05 14:36:40 +01:00
Philip Reames
1ef9410a96 Revert "[AMDGPU] Infer amdgpu-no-flat-scratch-init attribute in AMDGPUAttributor (#94647)"
This reverts commit e6aec2c12095cc7debd1a8004c8535eef41f4c36.  Commit breaks "ninja check-llvm" on x86 host.
2024-12-04 15:37:25 -08:00
Jun Wang
e6aec2c120
[AMDGPU] Infer amdgpu-no-flat-scratch-init attribute in AMDGPUAttributor (#94647)
The AMDGPUAnnotateKernelFeatures pass infers the "amdgpu-calls" and
"amdgpu-stack-objects" attributes, which are used to infer whether we
need to initialize flat scratch. This is, however, not precise. Instead,
we should use AMDGPUAttributor and infer amdgpu-no-flat-scratch-init on
kernels. Refer to https://github.com/llvm/llvm-project/issues/63586 .
2024-12-04 14:10:15 -08:00
John Brawn
ecbe4d1e36
[IR] Allow fast math flags on fptrunc and fpext (#115894)
This consists of:
* Make these instructions part of FPMathOperator.
* Adjust bitcode/IR readers/writers to expect fast math flags on these
  instructions.
* Make IRBuilder set the fast math flags on these instructions.
* Update LangRef and release notes.
* Update a bunch of tests. Some of these are due to InstCombineCasts
  incorrectly adding fast math flags to fptrunc, which will be fixed in a
  later patch.
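
A minimal sketch with the LLVM C++ API of what the IRBuilder change
enables (assumed usage, not code from the patch):

```
#include "llvm/IR/IRBuilder.h"

// With fptrunc now part of FPMathOperator, the builder's current
// fast-math flags are attached to the truncation it creates.
llvm::Value *truncToFloat(llvm::IRBuilder<> &B, llvm::Value *Dbl) {
  llvm::FastMathFlags FMF;
  FMF.setFast();
  B.setFastMathFlags(FMF);
  return B.CreateFPTrunc(Dbl, B.getFloatTy()); // emits 'fptrunc fast ...'
}
```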
2024-12-04 10:53:04 +00:00
Shilei Tian
68bcba6d7a Revert "[AMDGPU] Use COV6 by default (#118515)"
This reverts commit 410cbe3cf28913cca2fc61b3437306b841d08172 because some
buildbots are not ready yet.
2024-12-03 20:17:06 -05:00
Shilei Tian
410cbe3cf2
[AMDGPU] Use COV6 by default (#118515) 2024-12-03 19:38:35 -05:00
Haopeng Liu
4d6e69143d
Add the initializes attribute inference (#117104)
Reland https://github.com/llvm/llvm-project/pull/97373 after fixing
clang tests.

Confirmed with "ninja check-llvm" and "ninja check-clang"
2024-11-20 19:15:23 -08:00
Artem Belevich
7c3fdcc276
[CUDA] Add support for __grid_constant__ attribute (#114589)
LLVM support for the attribute has been implemented already, so it just
plumbs it through to the CUDA front-end.

One notable difference from NVCC is that the attribute can be used
regardless of the targeted GPU. On the older GPUs it will just be
ignored. The attribute is a performance hint and does not warrant a hard
error if the compiler can't benefit from it on a particular GPU variant.
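
Usage at the CUDA source level (illustrative):

```
struct Params { float alpha; int n; };

// __grid_constant__ marks a const, by-value kernel parameter; the hint is
// simply ignored on GPUs that cannot take advantage of it.
__global__ void scale(__grid_constant__ const Params p, float *out) {
  out[threadIdx.x] *= p.alpha;
}
```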
2024-11-05 10:48:54 -08:00
Joseph Huber
42eb54b774
[Clang] Put offloading globals in the .llvm.rodata.offloading section (#111890)
Summary:
For our offloading entries, we currently store all the string names of
kernels that the runtime will need to load from the target executable.
These are available via pointer in the `__tgt_offload_entry` struct,
however this makes it difficult to obtain from the object itself. This
patch simply puts the strings in a named section so they can be easily
queried.

The motivation behind this is that when the linker wrapper is doing
linking, it wants to know which kernels the host executable is calling.
We *could* get this already via the `.relaomp_offloading_entries`
section and trawling through the string table, but that's quite annoying
and not portable. The follow-up to this should be to make the linker
wrapper get a list of all used symbols the device link job should count
as "needed" so we can handle static linking more directly.
2024-10-28 07:17:50 -07:00
Alex Voicu
2074de252b
[clang][HIP] Don't use the OpenCLKernel CC when targeting AMDGCNSPIRV (#110447)
When compiling HIP source for AMDGCN flavoured SPIR-V that is expected
to be consumed by the ROCm HIP RT, it's not desirable to set the OpenCL
Kernel CC on `__global__` functions. On one hand, this is not an OpenCL
RT, so it doesn't compose with e.g. OCL specific attributes. On the
other it is a "noisy" CC that carries semantics, and breaks overload
resolution when using [generic dispatchers such as those used by
RAJA](186d4194a5/src/common/HipDataUtils.hpp (L39)).
2024-10-22 17:16:46 +01:00
Youngsuk Kim
0f0a96b862
[llvm][NVPTX] Strip unneeded '+0' in PTX load/store (#113017)
Remove the extraneous '+0' immediate offset part in PTX load/stores, to
improve readability of output PTX code.
2024-10-19 10:05:36 -04:00
Matt Arsenault
51b4ada458
clang/AMDGPU: Set noalias.addrspace metadata on atomicrmw (#102462) 2024-10-17 17:10:45 +04:00
Matt Arsenault
d50302f31c
clang/AMDGPU: Stop emitting amdgpu-unsafe-fp-atomics attribute (#111579) 2024-10-09 08:52:32 +04:00
Alex Voicu
e203a67f4c
[cuda][HIP] __constant__ should imply constant (#110182)
Currently, `__constant__` variables do not get unconditionally marked as
`constant` in IR, which seems a bit odd given their definition. This is
generally inconsequential for NVPTX/AMDGPU, since said variables get
emitted in the constant address space for those BEs. However, it is
potentially significant for e.g. HIP-on-SPIR-V cases, as SPIR-V does not
allow casts to/from the constant AS (`UniformConstant`), which forces
`__constant__` variables to be emitted in the global AS, thus making IR
constness meaningful.
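
Source-level illustration (hypothetical example):

```
// With this change the IR global for `tbl` is marked `constant`
// unconditionally; this matters when it must be emitted in the global AS
// (e.g. HIP-on-SPIR-V), where IR constness is the only remaining signal.
__constant__ float tbl[4] = {1.0f, 2.0f, 4.0f, 8.0f};

__global__ void lookup(float *out, int i) { out[i] = tbl[i & 3]; }
```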
2024-09-29 01:22:52 +01:00
jofrn
b5fd9463a3
[HIP][Clang][CodeGen] Handle hip bin symbols properly. (#107458)
Remove the '_' in the fatbin and gpubin symbol suffixes when the TU hash
ID is missing. Internalize the gpubin symbol so that it is not unresolved
at link time when the symbol is not relocatable.
2024-09-11 18:46:46 -04:00
Alex Voicu
ad435bcc14
[clang][CodeGen][SPIR-V][AMDGPU] Tweak AMDGCNSPIRV ABI to allow for the correct handling of aggregates passed to kernels / functions. (#102776)
The AMDGPU kernel ABI is not directly representable in SPIR-V, since it
relies on passing aggregates `byref`, and SPIR-V only encodes `byval`
(which the AMDGPU BE disallows for kernel arguments). As a temporary
solution to this mismatch, we add special handling for AMDGCN flavoured
SPIR-V, whereby aggregates are passed as direct, both to kernels and to
normal functions. This is not ideal (there are pathological cases where
performance is heavily impacted), but empirically robust and guaranteed
to work as the AMDGPU BE retains handling of `direct` passing for legacy
reasons.

We will revisit this in the future, but as it stands it is enough to
pass a wide array of integration tests and generates correct SPIR-V and
correct reverse translation into LLVM IR. The
amdgpu-kernel-arg-pointer-type test is updated via the automated script,
and thus becomes quite noisy.
2024-08-21 13:16:59 +01:00
Johannes Doerfert
80525dfcde
[Offload][CUDA] Allow CUDA kernels to use LLVM/Offload (#94549)
Through the new `-foffload-via-llvm` flag, CUDA kernels can now be
lowered to the LLVM/Offload API. On the Clang side, this is simply done
by using the OpenMP offload toolchain and emitting calls to `llvm*`
functions to orchestrate the kernel launch rather than `cuda*`
functions. These `llvm*` functions are implemented on top of the
existing LLVM/Offload API.

As we are about to redefine the Offload API, this will help us in the
design process as a second offload language.

We do not support any CUDA APIs yet; however, we could:
  https://www.osti.gov/servlets/purl/1892137

For proper host execution we need to resurrect/rebase
  https://tianshilei.me/wp-content/uploads/2021/12/llpp-2021.pdf
(which was designed for debugging).

```
❯❯❯ cat test.cu
extern "C" {
void *llvm_omp_target_alloc_shared(size_t Size, int DeviceNum);
void llvm_omp_target_free_shared(void *DevicePtr, int DeviceNum);
}

__global__ void square(int *A) { *A = 42; }

int main(int argc, char **argv) {
  int DevNo = 0;
  int *Ptr = reinterpret_cast<int *>(llvm_omp_target_alloc_shared(4, DevNo));
  *Ptr = 7;
  printf("Ptr %p, *Ptr %i\n", Ptr, *Ptr);
  square<<<1, 1>>>(Ptr);
  printf("Ptr %p, *Ptr %i\n", Ptr, *Ptr);
  llvm_omp_target_free_shared(Ptr, DevNo);
}

❯❯❯ clang++ test.cu -O3 -o test123 -foffload-via-llvm --offload-arch=native

❯❯❯ llvm-objdump --offloading test123

test123:        file format elf64-x86-64

OFFLOADING IMAGE [0]:
kind            elf
arch            gfx90a
triple          amdgcn-amd-amdhsa
producer        openmp

❯❯❯ LIBOMPTARGET_INFO=16 ./test123
Ptr 0x155448ac8000, *Ptr 7
Ptr 0x155448ac8000, *Ptr 42
```
2024-08-12 17:44:58 -07:00
Artem Belevich
5629249575
[CUDA] Emit used function list in deterministic order. (#102661)
Fixes https://github.com/llvm/llvm-project/issues/101560
2024-08-12 10:21:23 -07:00
Hari Limaye
94473f4db6
[IRBuilder] Generate nuw GEPs for struct member accesses (#99538)
Generate nuw GEPs for struct member accesses, as inbounds + non-negative
implies nuw.

Regression tests are updated using update scripts where possible, and by
find + replace where not.
2024-08-09 13:25:04 +01:00
Eli Friedman
1762e01cca
Fix codegen of consteval functions returning an empty class, and related issues (#93115)
If a class is empty, don't store it to memory: the store might overwrite
useful data. Similarly, if a class has tail padding that might overlap
other fields, don't store the tail padding to memory.

The problem here turned out to be a bit more general than I initially thought:
basically all uses of EmitAggregateStore were broken. Call lowering had
a method that did mostly the right thing, though: CreateCoercedStore.
Adapt CreateCoercedStore so it always does the conservatively right
thing, and use it for both calls and ConstantExpr.

Also, along the way, fix the "overlap" bit in AggValueSlot: the bit was
set incorrectly for empty classes in some cases.

Fixes #93040.
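
A minimal hypothetical reproducer shape for the empty-class case
(illustrative, not the test from the patch):

```
struct Empty {};

// The consteval call is evaluated at compile time; codegen must not emit
// a store for the empty return object, since such a store could overwrite
// useful adjacent data.
consteval Empty make() { return {}; }

Empty use() { return make(); }
```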
2024-08-01 16:18:20 -07:00
Matt Arsenault
41439d5bb7
AMDGPU: Handle remote/fine-grained memory in atomicrmw fmin/fmax lowering (#96759)
Consider the new atomic metadata when choosing to expand as cmpxchg
instead.
2024-08-01 22:08:01 +04:00
darkbuck
fa84297002
[clang][CUDA] Add 'noconvergent' function and statement attribute
- For languages following the SPMD/SIMT programming model, functions and
  call sites are marked 'convergent' by default. 'noconvergent' is added
  in this patch to allow developers to remove that 'convergent' attribute
  when it's safe.
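
A sketch of intended usage, assuming the GNU
`__attribute__((noconvergent))` spelling (hypothetical example):

```
__device__ float fast_sum(float x);

// Calls inside this function no longer receive the default 'convergent'
// attribute, unblocking optimizations it would otherwise inhibit; only
// safe when no cross-thread communication is involved.
__device__ __attribute__((noconvergent)) float wrapper(float x) {
  return fast_sum(x);
}
```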

Reviewers:
nhaehnle, Sirraide, yxsamliu, Artem-B, ilovepi, jayfoad, ssahasra, arsenm

Reviewed By: arsenm

Pull Request: https://github.com/llvm/llvm-project/pull/100637
2024-07-31 11:30:48 -04:00
Matt Arsenault
e108853ac8
clang: Allow targets to set custom metadata on atomics (#96906)
Use this to replace the emission of the amdgpu-unsafe-fp-atomics
attribute in favor of per-instruction metadata. In the future,
new fine-grained controls should be introduced that also cover
the integer cases.

Add a wrapper around CreateAtomicRMW that appends the metadata,
and update a few use contexts to use it.
2024-07-26 09:57:28 +04:00
Yaxun (Sam) Liu
77fd30f7ce
[CUDA][HIP] Fix template static member (#98580)
Check host/device attributes before emitting static members of template
instantiations.

Fixes: https://github.com/llvm/llvm-project/issues/98151
2024-07-12 10:08:34 -04:00
Matt Arsenault
8f63d154ec
clang/AMDGPU: Use atomicrmw for ds fmin/fmax builtins (#96738) 2024-06-27 15:32:08 +02:00
Matt Arsenault
a440a96ec2
AMDGPU: Start selecting flat/global atomicrmw fmin/fmax. (#95592)
Define subtarget features for atomic fmin/fmax support.

The flat/global support is a real mess. We had float/double support at
the beginning in gfx6 and gfx7. gfx8 removed these. gfx10 reintroduced them.
gfx11 removed the f64 versions again.

gfx9 partially reintroduced them, in gfx90a and gfx940 but only for f64.
2024-06-23 10:10:41 +02:00
Matt Arsenault
76894c5e6e
clang/AMDGPU: Emit atomicrmw from ds_fadd builtins (#95395)
We should have done this for the f32/f64 case a long time ago. Now that
codegen handles atomicrmw selection for the v2f16/v2bf16 case, start emitting
it instead.

This also upgrades the behavior to respect a volatile-qualified pointer,
which was previously ignored (for the cases that don't have an explicit
volatile argument).
2024-06-18 20:51:14 +02:00
Stephen Tozer
094572701d
[RemoveDIs] Print IR with debug records by default (#91724)
This patch makes the final major change of the RemoveDIs project, changing the
default IR output from debug intrinsics to debug records. This is expected to
break a large number of tests: every single one that tests for uses or
declarations of debug intrinsics and does not explicitly disable writing
records. 

If this patch has broken your downstream tests (or upstream tests on a
configuration I wasn't able to run):
1. If you need to immediately unblock a build, pass
`--write-experimental-debuginfo=false` to LLVM's option processing for all
failing tests (remember to use `-mllvm` for clang/flang to forward arguments to
LLVM).
2. For most test failures, the changes are trivial and mechanical, enough that
they can be done by script; see the migration guide for how to do this:
https://llvm.org/docs/RemoveDIsDebugInfo.html#test-updates
3. If any tests fail for reasons other than FileCheck check lines that need
updating, such as assertion failures, that is most likely a real bug with this
patch and should be reported as such.

For more information, see the recent PSA:
https://discourse.llvm.org/t/psa-ir-output-changing-from-debug-intrinsics-to-debug-records/79578
2024-06-14 15:07:27 +01:00
Nikita Popov
cc2dc0916a Reapply [ConstantFold] Drop gep of gep fold entirely (#95126)
Reapplying without changes. The flang+openmp buildbot failure
should be addressed by https://github.com/llvm/llvm-project/pull/94541.

-----

This is a followup to https://github.com/llvm/llvm-project/pull/93823
and drops the DataLayout-unaware GEP of GEP fold entirely. All cases are
now left to the DataLayout-aware constant folder, which will fold
everything to a single i8 GEP.

We didn't have any test coverage for this fold in LLVM, but some Clang
tests change.
2024-06-13 17:03:35 +02:00
Nikita Popov
cece0a105b Revert "[ConstantFold] Drop gep of gep fold entirely (#95126)"
This reverts commit 3b3b839c66dc49674fd6646650525a2173030690.

This broke the flang+openmp+offload buildbot, as reported in
https://github.com/llvm/llvm-project/pull/95126#issuecomment-2162424019.
2024-06-12 11:52:12 +02:00
Nikita Popov
3b3b839c66
[ConstantFold] Drop gep of gep fold entirely (#95126)
This is a followup to https://github.com/llvm/llvm-project/pull/93823
and drops the DataLayout-unaware GEP of GEP fold entirely. All cases are
now left to the DataLayout-aware constant folder, which will fold
everything to a single i8 GEP.

We didn't have any test coverage for this fold in LLVM, but some Clang
tests change.
2024-06-12 09:50:14 +02:00
Alex Voicu
88e2bb4092
[clang][SPIR-V] Add support for AMDGCN flavoured SPIRV (#89796)
This change seeks to add support for vendor flavoured SPIRV - more
specifically, AMDGCN flavoured SPIRV. The aim is to generate SPIRV that
carries some extra bits of information that are only usable by AMDGCN
targets, forfeiting absolute genericity to obtain greater expressiveness
for target features:

- AMDGCN inline ASM is allowed/supported, under the assumption that the
[SPV_INTEL_inline_assembly](https://github.com/intel/llvm/blob/sycl/sycl/doc/design/spirv-extensions/SPV_INTEL_inline_assembly.asciidoc)
extension is enabled/used
- AMDGCN target specific builtins are allowed/supported, under the
assumption that e.g. the `--spirv-allow-unknown-intrinsics` option is
enabled when using the downstream translator
- the featureset matches the union of AMDGCN targets' features
- the datalayout string is overspecified to affix both the program
address space and the alloca address space, the latter under the
assumption that the
[SPV_INTEL_function_pointers](https://github.com/intel/llvm/blob/sycl/sycl/doc/design/spirv-extensions/SPV_INTEL_function_pointers.asciidoc)
extension is enabled/used, in which case the extant SPIRV datalayout
string would lead to function pointers pointing to the private
address space, which would be wrong.

Existing AMDGCN tests are extended to cover this new target. It is
currently dormant / will require some additional changes, but I thought
I'd rather put it up for review to get feedback as early as possible. I
will note that an alternative option is to place this under AMDGPU, but
that seems slightly less natural, since this is still SPIRV, albeit
relaxed in terms of preconditions & constrained in terms of
postconditions, and only guaranteed to be usable on AMDGCN targets (it
is still possible to obtain pristine portable SPIRV through usage of the
flavoured target, though).
2024-06-07 11:50:23 +01:00
Alex MacLean
e8500a7054
fixup cuda-builtin-vars.cu broken in IntrRange change (#94639) 2024-06-06 10:10:00 -07:00
Alex MacLean
435addbf50
[NVPTX] Revamp NVVMIntrRange pass (#94422)
Revamp the NVVMIntrRange pass making the following updates:
- Use range attributes over range metadata. This is what instcombine has
moved to for ranges on intrinsics in
https://github.com/llvm/llvm-project/pull/88776 and it seems a bit
cleaner.
- Consider the `!"maxntid{x,y,z}"` and `!"reqntid{x,y,z}"` function
metadata when adding ranges for `tid` srge instrinsics. This can allow
for smaller ranges and more optimization.
- When range attributes are already present, use the intersection of the
old and new range. This complements the metadata change by allowing
ranges to be shrunk when an intrinsic is in a function which is inlined
into a kernel with metadata. While we don't call this more than once
yet, we should consider adding a second call after inlining, once this
has had a chance to soak for a while and no issues have arisen.

I've also re-enabled this pass in the TM, it was disabled years ago due
to "numerical discrepancies" https://reviews.llvm.org/D96166. In our
testing we haven't seen any issues with adding ranges to intrinsics, and
I cannot find any further info about what issues were encountered.
2024-06-06 06:42:46 -07:00
Yaxun (Sam) Liu
be5075ab8d
[CUDA] make kernel stub ICF-proof (#90155)
The MSVC linker merges comdat functions that have identical sets of
instructions. CUDA uses the kernel stub function as the key to look up
kernels in device executables. If the kernel stub functions for different
kernels are merged by ICF, incorrect kernels will be launched.

To prevent ICF from merging kernel stub functions, a unique global
variable is created for each kernel stub function having comdat, and a
store is added to the kernel stub function. This makes the set of
instructions in each kernel stub function unique.

Fixes: https://github.com/llvm/llvm-project/issues/88883
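
Conceptual sketch of the emitted pattern, with hypothetical names (not the
actual generated code):

```
// Each stub gets its own global plus a store into it, so no two stubs are
// instruction-identical and the linker's ICF cannot merge them.
static char __hip_id_kernel_foo;

extern "C" void launch_kernel_foo(); // stand-in for the real launch path

void __device_stub_kernel_foo() {
  __hip_id_kernel_foo = 1; // unique side effect per stub; defeats ICF
  launch_kernel_foo();
}
```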
2024-05-01 10:24:23 -04:00