llvm-project

mirror of https://github.com/llvm/llvm-project.git synced 2025-04-19 03:16:41 +00:00

Author	SHA1	Message	Date
Jakub Kuderski	f359c0bde5	[mlir][arith] Trim trailing spaces in wide int emulation tests. NFC. (#133349 ) Followup cleanup after https://github.com/llvm/llvm-project/pull/132375 and https://github.com/llvm/llvm-project/pull/133248	2025-03-27 22:36:05 -04:00
Longsheng Mou	a6cb5cc0f0	[mlir] Add nullptr checks in SparseElementsAttr parser (#133222 ) This PR adds nullptr checks in the SparseElementsAttr parser to improve robustness and prevent crashes. Fixes #132891.	2025-03-28 10:11:14 +08:00
egebeysel	3a3732c252	[mlir][arith] wide integer emulation support for fptoi ops (#132375 ) Adding wide integer emulation support for `arith.fptoi` operations. As the other emulated operations, the upper and lower `N` bits of the `i2N` integer result are emitted separately. For the unsigned case we use the following emulation ```c // example is 64 -> 32 bit emulation, but the implementation is generalized to any 2N -> N case const double TWO_POW_N = (uint_64_t(1) << N); // 2^N, N is the bitwidth of the widest int supported // f is a floating-point value representing the input of the fptoui op. uint32_t hi = (uint32_t)(f / TWO_POW_N); // Truncates the division result uint32_t lo = (uint32_t)(f - hi * TWO_POW_N); // Subtracts to get the lower bits. ``` For the signed case, we defer the emulation of the absolute value to `fptoui` and handle the sign: ``` fptosi(fp) = sign(fp) * fptoui(abs(fp)) ``` The edge cases of `NaNs, +-inf` and overflows/underflows are undefined behaviour and the resulting numbers are the combination of the lower bitwidth UB values. These operations also propagate poison values. Signed-off-by: Ege Beysel <beysel@roofline.ai>	2025-03-27 20:58:56 -04:00
Krzysztof Drewniak	d7c53a91c2	[mlir] Decouple enum generation from attributes, adding EnumInfo and EnumCase (#132148 ) This commit pulls apart the inherent attribute dependence of classes like EnumAttrInfo and EnumAttrCase, factoring them out into simpler EnumCase and EnumInfo variants. This allows specifying the cases of an enum without needing to make the cases, or the EnumInfo itself, a subclass of SignlessIntegerAttrBase. The existing classes are retained as subclasses of the new ones, both for backwards compatibility and to allow attribute-specific information. In addition, the new BitEnum class changes its default printer/parser behavior: cases when multiple keywords appear, like having both nuw and nsw in overflow flags, will no longer be quoted by the operator<<, and the FieldParser instance will now expect multiple keywords. All instances of BitEnumAttr retain the old behavior.	2025-03-27 19:40:06 -05:00
AdityaK	d18faf6460	[MLIR] NFC: Remove unused includes (#133327 )	2025-03-27 16:50:47 -07:00
Michael Liao	52975d5c9f	[mlir][scf] Allow different forwarding ordering in uplift - Allow 'before' arguments are forwarded in different order to 'after' body when uplifting `scf.while` to `scf.for`.	2025-03-27 18:09:07 -04:00
AdityaK	f2849fe05f	Fix RemoveDeadValues: Bail out early when there are no terminators (#133316 ) Fixes: #131765	2025-03-27 15:04:31 -07:00
Alexander Weinrauch	1d9d4651df	[MLIR][ROCDL] Enable AliasAnalysis for GlobalLoadLds and LDS_Read_Tr (#133255 ) Enables AliasAnalysis for `GlobalLoadLds` and `LDS_Read_Tr`. All other memory related ROCDL Ops have this already enabled.	2025-03-27 15:02:40 -07:00
Bruno Cardoso Lopes	7c3ecffe9b	[MLIR][LLVMIR] Add support for the full form of global_{ctor,dtor} (#133176 ) Currently only ctor/dtor list and their priorities are supported. This PR adds support for the missing data field. Few implementation notes: - The assembly printer has a fixed form because previous `attr_dict` will sort the dict by key name, making global_dtor and global_ctor differ in the order of printed arguments. - LLVM's `ptr null` is being converted to `#llvm.zero` otherwise we'd have to create a region to use the default operation conversion from `ptr null`, which is silly given that the field only support null or a symbol.	2025-03-27 14:11:05 -07:00
egebeysel	a1a5594ad2	[mlir][arith] add wide integer emulation support for subi (#133248 ) Adds wide integer emulation support for the `arith.subi` op. `(i2N, i2N) -> (i2N)` ops are emulated as `(vector<2xiN>, vector<2xiN>) -> (vector<2xiN>)`, just as the other emulation patterns. The emulation uses the following scheme: ``` resLow = lhsLow - rhsLow; // carry = 1 if rhsLow > lhsLow resHigh = lhsLow - carry - rhsLow; ``` Signed-off-by: Ege Beysel <beysel@roofline.ai>	2025-03-27 15:01:04 -05:00
Fabian Mora	1a7af2a90f	[mlir][DataLayout] Add `IsolatedFromAbove` to `DataLayoutOpInterface` (#132742 ) This patch adds the `IsolatedFromAbove` trait as a dependent trait to the `DataLayoutOpInterface` op interface. The motivation behind this change comes from the implementation of the `ptr` dialect, specifically the `ptr.type_offset` op. This op produces an int-like value that equates to the size of a memory element. This is useful for ptr arithmetic and indexing arrays. For example: ```mlir %f32_off = ptr.type_offset f32 : index %addr = ptr.ptradd %ptr, %f32_off : !ptr, index %x = ptr.load %addr : !ptr -> f32 // Read ptr[1] ``` Without the `IsolatedFromAvobe` trait in the DL interface, the `ptr.type_offset` cannot be `ConstantLike`. Why? Take the example: ```mlir op {DL1} { %f32_off0 = ptr.type_offset f32 : index op {DL2} { %f32_off1 = ptr.type_offset f32 : index } } ``` If `ptr.type_offset` were to be `ConstantLike` then `canonicalize` would hoist and unique the value. However, that could be wrong as DL2 could have an entry to specify the size that's different from the size in DL1. The best solution to the above problem is to make `DataLayoutOpInterface` require the `IsolatedFromAbove` trait, as it preserves the constness of values in the DL with respect to the canonicalizer.	2025-03-27 14:37:37 -04:00
Bruno Cardoso Lopes	08aedf7201	[MLIR][LLVM] Lift alignstack attribute ptr type restriction (#133195 ) Current usage of alignstack is restricted to LLVM pointer types, whereas when it's used in parameters it's possible to use it for other types, see examples like `{i8, i8}, [2 x float], etc` in `llvm/test/CodeGen`. This PR lifts the restriction and add testcases.	2025-03-27 10:28:37 -07:00
Guray Ozen	38d9a44510	[MLIR][NVGPU] Add `tma.fence.descriptor` OP (#133218 ) When the TMA descriptor is transferred from host memory to global memory using cudaMemcpy, each thread block must insert a fence before any thread accesses the updated tensor map in global memory. Once the tensor map has been accessed, no additional fences are needed by that block unless the map is modified again. [Example from cuda programming guide](https://docs.nvidia.com/cuda/cuda-c-programming-guide/#using-tma-to-transfer-multi-dimensional-arrays). The `tma.fence.descriptor` basically implements `ptx::fence_proxy_tensormap_generic`. ``` #include <cuda.h> #include <cuda/ptx> namespace ptx = cuda::ptx; __device__ CUtensorMap global_tensor_map; __global__ void kernel(CUtensorMap *tensor_map) { // Fence acquire tensor map: ptx::n32_t<128> size_bytes; // Since the tensor map was modified from the host using cudaMemcpy, // the scope should be .sys. ptx::fence_proxy_tensormap_generic( ptx::sem_acquire, ptx::scope_sys, tensor_map, size_bytes ); // Safe to use tensor_map after fence inside this thread.. } int main() { CUtensorMap local_tensor_map; // [ ..Initialize map.. ] cudaMemcpy(&global_tensor_map, &local_tensor_map, sizeof(CUtensorMap), cudaMemcpyHostToDevice); kernel<<<1, 1>>>(global_tensor_map); } ```	2025-03-27 15:20:19 +01:00
Guray Ozen	bc7e3915e1	[MLIR][NVGPU] Add `mbarrier.get` Op (#133221 ) The `mbarrier.create` op can create multiple mbarrier objects, and other mbarrier-related ops can access an mbarrier using a dynamic SSA value. This is especially useful when using mbarriers in dynamic loops. This PR adds the `mbarrier.get` op, which returns a pointer to a specific mbarrier object from a group of barriers created by the nvgpu.mbarrier.create operation. It is useful when composing the NVGPU and NVVM dialects. Example: ``` %mbars = nvgpu.mbarrier.create -> !nvgpu.mbarrier.group<memorySpace = #gpu.address_space<workgroup>, num_barriers = 10> %mbar_pointer = nvgpu.mbarrier.get %mbars[%c2] : !nvgpu.mbarrier.group<memorySpace = #gpu.address_space<workgroup>> -> i32 ```	2025-03-27 15:20:07 +01:00
Longsheng Mou	ac09b789d8	[mlir][scf] Remove redundant ensureTerminator for `scf.forall` (#133081 ) The override function `ensureTerminator` ensures that the terminator `InParallelOp` has a region. However, if the terminator of `scf.forall` is not an `InParallelOp`, calling ensureTerminator causes a crash. Since the InParallelOp builder already guarantees the existence of a region, `ForallOp::ensureTerminator` is redundant and can be safely removed. Fixes #130019.	2025-03-27 20:07:20 +08:00
Letu Ren	9438694a54	[mlir][llvmir] Add llvm.intr.ldexp operation (#133070 ) https://llvm.org/docs/LangRef.html#llvm-ldexp-intrinsic	2025-03-27 07:50:39 +01:00
Letu Ren	ad51368881	[mlir][llvmir] add llvm.experimental.constrained.sitofp intrinsics (#133166 ) https://llvm.org/docs/LangRef.html#llvm-experimental-constrained-sitofp-intrinsic Signed-off-by: Letu Ren <fantasquex@gmail.com>	2025-03-27 07:50:04 +01:00
Krzysztof Drewniak	263ec7221e	[mlir][NFC] Move and rename EnumAttrCase, EnumAttr C++ classes (#132650 ) This moves the EnumAttrCase and EnumAttr classes from Attribute.h/.cpp to a new EnumInfo.h/cpp and renames them to EnumCase and EnumInfo, respectively. This doesn't change any of the tablegen files or any user-facing aspects of the enum attribute generation system, just reorganizes code in order to make main PR (#132148) shorter.	2025-03-26 20:26:14 -05:00
Maksim Levental	bfe85230e2	[mlir][IntegerRangeAnalysis] expose maybeReplaceWithConstant (#133151 ) This PR exposes `maybeReplaceWithConstant` in headers for downstream use.	2025-03-26 18:10:12 -04:00
Frank Schlimbach	9269aaecff	[mlir][mesh] fixes for 0d tensors (#132948 ) In some cases 0d tensors have no sharding. This PR provides a few minor fixes to account for such cases.	2025-03-26 18:13:41 +01:00
Guray Ozen	e8dfd70fe2	[MLIR][NVGPU] Use `gpu.dynamic_shared_memory` in tests (#133122 ) Reland #133051	2025-03-26 18:00:22 +01:00
Michael Liao	52f941adbc	[mlir][tosa] Fix '-Wreturn-type'. NFC	2025-03-26 11:43:24 -04:00
Karlo Basioli	3f82c3d5a8	Revert "[MLIR][NVGPU] Use `gpu.dynamic_shared_memory` in tests" (#133103 ) Reverts llvm/llvm-project#133051 due to failing integration tests	2025-03-26 15:39:14 +00:00
Kazu Hirata	1cc07a0865	[mlir] Use Set::insert_range (NFC) (#133043 ) We can use Set::insert_range to collapse: for (auto Elem : Range) Set.insert(E); down to: Set.insert_range(Range); In some cases, we can further fold that into the set declaration.	2025-03-26 07:47:02 -07:00
Guray Ozen	15f5a7a3ec	[MLIR][NVGPU] Use `gpu.dynamic_shared_memory` in tests (#133051 ) The `memref.subview` ops in the test case were incorrect: they extracted out-of-bounds.	2025-03-26 14:32:04 +01:00
Longsheng Mou	894b27a746	[mlir][MemRefToLLVM] Fix crash with unconvertable memory space (#132323 ) This PR adds handling when the `memref.alloca` with unconvertable memory space to prevent a crash. Fixes #131439.	2025-03-26 16:51:26 +08:00
Longsheng Mou	73f487d31e	[mlir][TosaToLinalg] Fix bugs in PointwiseConverter (#132526 )	2025-03-26 08:33:47 +00:00
Srinivasa Ravi	40815be30a	[MLIR][NVVM] Add support for st.bulk Op (#131727 ) This change adds the `st.bulk` NVVM Op for the `st.bulk` instruction introduced in ptx8.6 for sm_100. PTX Spec Reference: https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-st-bulk	2025-03-26 08:30:58 +05:30
Emilio Cota	2da4ce8624	Revert "[mlir] Fix DistinctAttributeUniquer deleting attribute storage when crash reproduction is enabled" (#133000 ) Reverts llvm/llvm-project#128566. See as well the discussion in llvm/llvm-project#132935.	2025-03-25 22:40:06 +00:00
Bruno Cardoso Lopes	e7e242e7ad	[MLIR][LLVM] Fix debug value/declare import in face of landing pads (#132871 ) Debug value/declare operations imported before landing pad operations at the bb start break invoke op verification: ``` error: first operation in unwind destination should be a llvm.landingpad operation ``` This this issue by making the placement slightly more smart.	2025-03-25 13:15:51 -07:00
Bruno Cardoso Lopes	74c2c049d1	[MLIR][LLVM] Add weak_odr to allowed linkage for alias (#132840 ) I missed this when originally introduced the feature (note the verifier message already contains it), this fixes a small bug.	2025-03-25 10:46:02 -07:00
Igor Wodiany	7b3885d47b	[mlir][spirv] Add definition for GL Fract (#132921 )	2025-03-25 16:59:09 +00:00
Ian Tayler Lessa	5f58f3dda8	[mlir][tosa] Avoid overflow in reduction folders (#132786 ) Avoid operations that can overflow in constant folders for `tosa.reduce_max` and `tosa.reduce_min` Includes tests to avoid regressions Signed-off-by: Ian Tayler Lessa <ian.taylerlessa@arm.com>	2025-03-25 16:43:46 +00:00
Karlo Basioli	f3a14217a9	Fix maybe unused errors caused by #131527 (#132944 )	2025-03-25 15:26:21 +00:00
Jerry-Ge	4fe2ad498e	[mlir][tosa] Align dialect summary and description sections to the TOSA v1.0 spec (#132835 ) Align dialect summary and description sections to the TOSA v1.0 spec * Updated the summary and description sections in the dialect to better align with the TOSA v1.0 specification. * Fixed the output variable names LogicalAndOp, LogicalRightShiftOp, LogicalXorOp * Removed some redundant comments Signed-off-by: Jerry Ge <jerry.ge@arm.com>	2025-03-25 08:20:43 -07:00
Karlo Basioli	f6823a0ae1	Revert "[mlir][memref] Verify out-of-bounds access for `memref.subview`" (#132940 ) Reverts llvm/llvm-project#131876 GPU integration tests get broken by this PR. E.x. `mlir/test/Integration/GPU/CUDA/sm90/gemm_f32_f16_f16_128x128x128.mlir`	2025-03-25 14:56:08 +00:00
Kazu Hirata	fedac3bdb8	[mlir] Fix warnings This patch fixes: mlir/lib/Dialect/Vector/Transforms/VectorEmulateNarrowType.cpp:331:8: error: unused variable 'srcVecTy' [-Werror,-Wunused-variable] mlir/lib/Dialect/Vector/Transforms/VectorEmulateNarrowType.cpp:332:8: error: unused variable 'destVecTy' [-Werror,-Wunused-variable]	2025-03-25 07:39:15 -07:00
Andrzej Warzyński	9768077de6	[mlir][vector] Update helpers in `VectorEmulateNarrowType.cpp` (nfc) (#131527 ) Refactors the following pairs of helper hooks: * `dynamicallyInsertSubVector` + `staticallyInsertSubVector` * `dynamicallyExtractSubVector` + `staticallyExtractSubVector` These hooks are very similar, so I have unified the variable names and various conditions to make the actual differences clearer.	2025-03-25 13:32:32 +00:00
Guray Ozen	9910d34d6c	[MLIR][NVVM] Print ptxas path in debug output for "serialize-to-binary" (#132373 )	2025-03-25 12:21:06 +01:00
Ivan Butygin	9b022220b7	[mlir][vector] Propagate `vector.extract` through elementwise ops (#131462 ) Propagate `Extract(Elementwise(...))` -> `Elemetwise(Extract...)`. Currenly limited to the case when extract is the single use of elementwise to avoid introducing additional elementwise ops.	2025-03-25 14:07:48 +03:00
Karlo Basioli	36b36060a1	[mlir][spirv] Fix cyclical dependency in bazel (#132785 )	2025-03-25 10:39:04 +00:00
Matthias Springer	d4304d85f2	[mlir][memref] Verify out-of-bounds access for `memref.subview` (#131876 ) * Improve the verifier of `memref.subview` to detect out-of-bounds extractions. * Improve the documentation of `memref.subview` to make clear that out-of-bounds extractions are not allowed. Rewrite examples to use the new `strided<>` notation instead of `affine_map` layout maps. Also remove all unrelated operations (`memref.alloc`) from the examples. * Fix various test cases where `memref.subview` ops ran out-of-bounds. * Update canonicalizations patterns to ensure that they do not fold IR if it would generate IR that no longer verifies. Related discussion on Discourse: https://discourse.llvm.org/t/out-of-bounds-semantics-of-memref-subview/85293	2025-03-25 11:25:11 +01:00
Luke Hutton	d4570ea813	[mlir][tosa] Disallow invalid datatype combinations in the validation pass (#131595 ) This commit checks if the operands/results of an operator can be found in the profile compliance mapping, if it isn't the operator is considered invalid. As a result, operator datatype combinations that are not listed under the "Supported Data Types" of the TOSA specification are disallowed and the validation pass results in failure. Signed-off-by: Luke Hutton <luke.hutton@arm.com>	2025-03-25 10:05:39 +00:00
Georgios Pinitas	3df92197bb	[mlir][tosa] Support `DenseResourceElementsAttr` in TOSA transpose folders (#124532 ) Handle dense resource attributes in the transpose TOSA folder. Currently their interface does not align with the rest of the `ElementsAttr` when it comes to data accessing hence the special handling. Signed-off-by: Georgios Pinitas <georgios.pinitas@arm.com>	2025-03-24 21:48:22 +00:00
Jerry-Ge	1b9d475453	[mlir][tosa] Align validation profiles and extensions to TOSA v1.0 spec (#132768 ) - Add missing int16 extension for concat operator - Remove int16 extension for cast operator - Add pro_int and pro_fp profiles for const_shape operator Signed-off-by: Jerry Ge <jerry.ge@arm.com>	2025-03-24 13:01:15 -07:00
MaheshRavishankar	e4172196a7	[mlir][TilingInterface] Make `tileAndFuseConsumerOfSlice` take surrounding loops as an argument. (#132082 ) This gets the consumer fusion method in sync with the corresponding producer fusion method `tileAndFuseProducerOfSlice`. Not taking this as input required use of complicated analysis to retrieve the surrounding loops which are very fragile. Just like the producer fusion method, the loops need to be taken in as an argument, with typically the loops being created by the tiling methods. Some utilities are added to check that the loops passed in are perfectly nested (in the case of an `scf.for` loop nest. This is change 1 of N to simplify the implementation of tile and fuse consumers. --------- Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com>	2025-03-24 11:41:26 -07:00
Bruno Cardoso Lopes	8a2a694438	[MLIR][LLVM] Support dso_local_equivalent constants (#132131 ) Create a new operation `DSOLocalEquivalentOp`, following the steps of other constants. This is similar in a way to `AddressOfOp` but with specific semantics: only support functions and function aliases (no globals) and extern_weak linkage is not allowed. An alternative approach is to use a new `UnitAttr` in `AddressOfOp` and check that attribute to enforce specific semantics in the verifiers. The drawback is going against what other constants do and having to add more attributes in the future when we introduce `no_cfi`, `blockaddress`, etc. While here, improve the error message for other missing constants.	2025-03-24 10:43:53 -07:00
Kareem Ergawy	5c02f1a5af	[OpenMP][IRBuilder] De-duplicate code that emit task dependencies (#132340 ) A small clean-up following up on #131795. Seems like we had 2 quite similar implementations for the same thing: emit task dependencies struct and filling it. This PR unifies the 2 versions into one. This is better since we had to fix a bug in one of them in #131795 so this applies the fix for both.	2025-03-24 16:04:20 +01:00
Igor Wodiany	3aa20c266c	[mlir][spirv] Add definition for selected sample operations (#129558 ) This commit adds following three operations: ImageSampleImplicitLodOp, ImageSampleExplicitLodOp and ImageSampleProjDrefImplicitLodOp	2025-03-24 13:56:36 +00:00
Matthias Springer	529ee3cf3b	[mlir][tensor] Fix slice canonicalizer for out-of-bounds cases (#132534 ) Since #130487, `tensor.extract_slice` and `tensor.insert_slice` ops that are statically detected to go out of bounds are rejected by the verifier. This commit fixes canonicalization patterns that currently fold dynamically out-of-bounds ops (valid IR) to statically out-of-bounds ops (invalid IR).	2025-03-24 14:39:37 +01:00

1 2 3 4 5 ...

22387 Commits