12819 Commits

Author SHA1 Message Date
Matthias Springer
4abff4d7b2
[mlir][Transforms] Improve replaceOpWithMultiple API (#132608)
This commit adds an additional overload to `replaceOpWithMultiple` that
accepts additional container types. This has been brought up by users of
the new `replaceOpWithMultiple` API.

In particular, one missing container type was
`SmallVector<SmallVector<Value>>`. The "default" `ArrayRef<ValueRange>`
container type can lead to use-after-scope errors in cases such as:
```c++
// Compute the replacement value ranges. Some replacements are single
// values, some are value ranges.
SmallVector<ValueRange> repl;
repl.push_back(someValueRange);  // OK
for (...) {
  // push_back(Value) triggers an implicit conversion to ValueRange,
  // which does not own the range.
  repl.push_back(someValue);  // triggers use-after-scope later
}
rewriter.replaceOpWithMultiple(op, repl);
```

In this example, users should use `SmallVector<SmallVector<Value>>
repl;`.
2025-03-28 14:18:54 +01:00
Letu Ren
29cb00331f
[mlir][llvmir] add llvm.experimental.constrained.uitofp intrinsics (#133300)
https://llvm.org/docs/LangRef.html#llvm-experimental-constrained-uitofp-intrinsic

Signed-off-by: Letu Ren <fantasquex@gmail.com>
2025-03-28 08:47:36 +01:00
Letu Ren
68f71aae3b
[mlir][llvmir] add llvm.sincos intrinsics (#133311)
https://llvm.org/docs/LangRef.html#llvm-frexp-intrinsic

Signed-off-by: Letu Ren <fantasquex@gmail.com>
2025-03-28 08:45:46 +01:00
Jakub Kuderski
f359c0bde5
[mlir][arith] Trim trailing spaces in wide int emulation tests. NFC. (#133349)
Followup cleanup after https://github.com/llvm/llvm-project/pull/132375
and https://github.com/llvm/llvm-project/pull/133248
2025-03-27 22:36:05 -04:00
Longsheng Mou
a6cb5cc0f0
[mlir] Add nullptr checks in SparseElementsAttr parser (#133222)
This PR adds nullptr checks in the SparseElementsAttr parser to improve
robustness and prevent crashes. Fixes #132891.
2025-03-28 10:11:14 +08:00
egebeysel
3a3732c252
[mlir][arith] wide integer emulation support for fpto*i ops (#132375)
Adding wide integer emulation support for `arith.fpto*i` operations. As
the other emulated operations, the upper and lower `N` bits of the `i2N`
integer result are emitted separately.

For the unsigned case we use the following emulation

```c
// example is 64 -> 32 bit emulation, but the implementation is generalized to any 2N -> N case
const double TWO_POW_N = (uint_64_t(1) << N); // 2^N, N is the bitwidth of the widest int supported

// f is a floating-point value representing the input of the fptoui op.
uint32_t hi = (uint32_t)(f / TWO_POW_N);         // Truncates the division result
uint32_t lo = (uint32_t)(f - hi * TWO_POW_N);       // Subtracts to get the lower bits.
```

For the signed case, we defer the emulation of the absolute value to
`fptoui` and handle the sign:

```
fptosi(fp) = sign(fp) * fptoui(abs(fp))
```

The edge cases of `NaNs, +-inf` and overflows/underflows are undefined
behaviour and the resulting numbers are the combination of the lower
bitwidth UB values. These operations also propagate poison values.

Signed-off-by: Ege Beysel <beysel@roofline.ai>
2025-03-27 20:58:56 -04:00
Krzysztof Drewniak
d7c53a91c2
[mlir] Decouple enum generation from attributes, adding EnumInfo and EnumCase (#132148)
This commit pulls apart the inherent attribute dependence of classes
like EnumAttrInfo and EnumAttrCase, factoring them out into simpler
EnumCase and EnumInfo variants. This allows specifying the cases of an
enum without needing to make the cases, or the EnumInfo itself, a
subclass of SignlessIntegerAttrBase.

The existing classes are retained as subclasses of the new ones, both
for backwards compatibility and to allow attribute-specific information.

In addition, the new BitEnum class changes its default printer/parser
behavior: cases when multiple keywords appear, like having both nuw and
nsw in overflow flags, will no longer be quoted by the operator<<, and
the FieldParser instance will now expect multiple keywords. All
instances of BitEnumAttr retain the old behavior.
2025-03-27 19:40:06 -05:00
Michael Liao
52975d5c9f [mlir][scf] Allow different forwarding ordering in uplift
- Allow 'before' arguments are forwarded in different order to 'after'
  body when uplifting `scf.while` to `scf.for`.
2025-03-27 18:09:07 -04:00
AdityaK
f2849fe05f
Fix RemoveDeadValues: Bail out early when there are no terminators (#133316)
Fixes: #131765
2025-03-27 15:04:31 -07:00
Bruno Cardoso Lopes
7c3ecffe9b
[MLIR][LLVMIR] Add support for the full form of global_{ctor,dtor} (#133176)
Currently only ctor/dtor list and their priorities are supported. This
PR adds support for the missing data field.

Few implementation notes:
- The assembly printer has a fixed form because previous `attr_dict`
will sort the dict by key name, making global_dtor and global_ctor
differ in the order of printed arguments.
- LLVM's `ptr null` is being converted to `#llvm.zero` otherwise we'd
have to create a region to use the default operation conversion from
`ptr null`, which is silly given that the field only support null or a
symbol.
2025-03-27 14:11:05 -07:00
egebeysel
a1a5594ad2
[mlir][arith] add wide integer emulation support for subi (#133248)
Adds wide integer emulation support for the `arith.subi` op. `(i2N, i2N)
-> (i2N)` ops are emulated as `(vector<2xiN>, vector<2xiN>) ->
(vector<2xiN>)`, just as the other emulation patterns.

The emulation uses the following scheme:

```
resLow = lhsLow - rhsLow;      // carry = 1 if rhsLow > lhsLow
resHigh = lhsLow - carry - rhsLow;
```

Signed-off-by: Ege Beysel <beysel@roofline.ai>
2025-03-27 15:01:04 -05:00
Fabian Mora
1a7af2a90f
[mlir][DataLayout] Add IsolatedFromAbove to DataLayoutOpInterface (#132742)
This patch adds the `IsolatedFromAbove` trait as a dependent trait to
the `DataLayoutOpInterface` op interface.

The motivation behind this change comes from the implementation of the
`ptr` dialect, specifically the `ptr.type_offset` op. This op produces
an int-like value that equates to the size of a memory element. This is
useful for ptr arithmetic and indexing arrays. For example:

```mlir
%f32_off = ptr.type_offset f32 : index
%addr = ptr.ptradd %ptr, %f32_off : !ptr, index
%x = ptr.load %addr : !ptr -> f32 // Read ptr[1]
```

Without the `IsolatedFromAvobe` trait in the DL interface, the
`ptr.type_offset` cannot be `ConstantLike`. Why?
Take the example:
```mlir
op {DL1} {
  %f32_off0 = ptr.type_offset f32 : index
  op {DL2} {
    %f32_off1 = ptr.type_offset f32 : index
  }
}
```
If `ptr.type_offset` were to be `ConstantLike` then `canonicalize` would
hoist and unique the value. However, that could be wrong as DL2 could
have an entry to specify the size that's different from the size in DL1.

The best solution to the above problem is to make
`DataLayoutOpInterface` require the `IsolatedFromAbove` trait, as it
preserves the constness of values in the DL with respect to the
canonicalizer.
2025-03-27 14:37:37 -04:00
Bruno Cardoso Lopes
08aedf7201
[MLIR][LLVM] Lift alignstack attribute ptr type restriction (#133195)
Current usage of alignstack is restricted to LLVM pointer types, whereas
when it's used in parameters it's possible to use it for other types,
see examples like `{i8, i8}, [2 x float], etc` in `llvm/test/CodeGen`.
This PR lifts the restriction and add testcases.
2025-03-27 10:28:37 -07:00
Guray Ozen
38d9a44510
[MLIR][NVGPU] Add tma.fence.descriptor OP (#133218)
When the TMA descriptor is transferred from host memory to global memory
using cudaMemcpy, each thread block must insert a fence before any
thread accesses the updated tensor map in global memory. Once the tensor
map has been accessed, no additional fences are needed by that block
unless the map is modified again.

[Example from cuda programming
guide](https://docs.nvidia.com/cuda/cuda-c-programming-guide/#using-tma-to-transfer-multi-dimensional-arrays).
The `tma.fence.descriptor` basically implements
`ptx::fence_proxy_tensormap_generic`.
```
#include <cuda.h>
#include <cuda/ptx>
namespace ptx = cuda::ptx;

__device__ CUtensorMap global_tensor_map;
__global__ void kernel(CUtensorMap *tensor_map)
{
  // Fence acquire tensor map:
  ptx::n32_t<128> size_bytes;
  // Since the tensor map was modified from the host using cudaMemcpy,
  // the scope should be .sys.
  ptx::fence_proxy_tensormap_generic(
     ptx::sem_acquire, ptx::scope_sys, tensor_map, size_bytes
 );
 // Safe to use tensor_map after fence inside this thread..
}
int main() {
  CUtensorMap local_tensor_map;
  // [ ..Initialize map.. ]
  cudaMemcpy(&global_tensor_map, &local_tensor_map, sizeof(CUtensorMap), cudaMemcpyHostToDevice);
  kernel<<<1, 1>>>(global_tensor_map);
}
```
2025-03-27 15:20:19 +01:00
Guray Ozen
bc7e3915e1
[MLIR][NVGPU] Add mbarrier.get Op (#133221)
The `mbarrier.create` op can create multiple mbarrier objects, and other
mbarrier-related ops can access an mbarrier using a dynamic SSA value.
This is especially useful when using mbarriers in dynamic loops.

This PR adds the `mbarrier.get` op, which returns a pointer to a
specific mbarrier object from a group of barriers created by the
nvgpu.mbarrier.create operation. It is useful when composing the NVGPU
and NVVM dialects.

Example:
```
%mbars = nvgpu.mbarrier.create 
   -> !nvgpu.mbarrier.group<memorySpace = #gpu.address_space<workgroup>, num_barriers = 10>

%mbar_pointer = nvgpu.mbarrier.get %mbars[%c2] 
  : !nvgpu.mbarrier.group<memorySpace = #gpu.address_space<workgroup>> 
  -> i32
 ```
2025-03-27 15:20:07 +01:00
Longsheng Mou
ac09b789d8
[mlir][scf] Remove redundant ensureTerminator for scf.forall (#133081)
The override function `ensureTerminator` ensures that the terminator
`InParallelOp` has a region. However, if the terminator of `scf.forall`
is not an `InParallelOp`, calling ensureTerminator causes a crash. Since
the InParallelOp builder already guarantees the existence of a region,
`ForallOp::ensureTerminator` is redundant and can be safely removed.
Fixes #130019.
2025-03-27 20:07:20 +08:00
Letu Ren
9438694a54
[mlir][llvmir] Add llvm.intr.ldexp operation (#133070)
https://llvm.org/docs/LangRef.html#llvm-ldexp-intrinsic
2025-03-27 07:50:39 +01:00
Letu Ren
ad51368881
[mlir][llvmir] add llvm.experimental.constrained.sitofp intrinsics (#133166)
https://llvm.org/docs/LangRef.html#llvm-experimental-constrained-sitofp-intrinsic

Signed-off-by: Letu Ren <fantasquex@gmail.com>
2025-03-27 07:50:04 +01:00
Frank Schlimbach
9269aaecff
[mlir][mesh] fixes for 0d tensors (#132948)
In some cases 0d tensors have no sharding. This PR provides a few minor
fixes to account for such cases.
2025-03-26 18:13:41 +01:00
Guray Ozen
e8dfd70fe2
[MLIR][NVGPU] Use gpu.dynamic_shared_memory in tests (#133122)
Reland #133051
2025-03-26 18:00:22 +01:00
Karlo Basioli
3f82c3d5a8
Revert "[MLIR][NVGPU] Use gpu.dynamic_shared_memory in tests" (#133103)
Reverts llvm/llvm-project#133051 due to failing integration tests
2025-03-26 15:39:14 +00:00
Guray Ozen
15f5a7a3ec
[MLIR][NVGPU] Use gpu.dynamic_shared_memory in tests (#133051)
The `memref.subview` ops in the test case were incorrect: they extracted
out-of-bounds.
2025-03-26 14:32:04 +01:00
Longsheng Mou
894b27a746
[mlir][MemRefToLLVM] Fix crash with unconvertable memory space (#132323)
This PR adds handling when the `memref.alloca` with unconvertable memory
space to prevent a crash. Fixes #131439.
2025-03-26 16:51:26 +08:00
Longsheng Mou
73f487d31e
[mlir][TosaToLinalg] Fix bugs in PointwiseConverter (#132526) 2025-03-26 08:33:47 +00:00
Srinivasa Ravi
40815be30a
[MLIR][NVVM] Add support for st.bulk Op (#131727)
This change adds the `st.bulk` NVVM Op for the `st.bulk` instruction
introduced in ptx8.6 for sm_100.

PTX Spec Reference:
https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-st-bulk
2025-03-26 08:30:58 +05:30
Emilio Cota
2da4ce8624
Revert "[mlir] Fix DistinctAttributeUniquer deleting attribute storage when crash reproduction is enabled" (#133000)
Reverts llvm/llvm-project#128566. See as well the discussion in
llvm/llvm-project#132935.
2025-03-25 22:40:06 +00:00
Bruno Cardoso Lopes
e7e242e7ad
[MLIR][LLVM] Fix debug value/declare import in face of landing pads (#132871)
Debug value/declare operations imported before landing pad operations at
the bb start break invoke op verification:

```
error: first operation in unwind destination should be a llvm.landingpad operation
```

This this issue by making the placement slightly more smart.
2025-03-25 13:15:51 -07:00
Bruno Cardoso Lopes
74c2c049d1
[MLIR][LLVM] Add weak_odr to allowed linkage for alias (#132840)
I missed this when originally introduced the feature (note the verifier
message already contains it), this fixes a small bug.
2025-03-25 10:46:02 -07:00
Igor Wodiany
7b3885d47b
[mlir][spirv] Add definition for GL Fract (#132921) 2025-03-25 16:59:09 +00:00
Ian Tayler Lessa
5f58f3dda8
[mlir][tosa] Avoid overflow in reduction folders (#132786)
Avoid operations that can overflow in constant folders for
`tosa.reduce_max` and `tosa.reduce_min`

Includes tests to avoid regressions

Signed-off-by: Ian Tayler Lessa <ian.taylerlessa@arm.com>
2025-03-25 16:43:46 +00:00
Karlo Basioli
f6823a0ae1
Revert "[mlir][memref] Verify out-of-bounds access for memref.subview" (#132940)
Reverts llvm/llvm-project#131876

GPU integration tests get broken by this PR. 
E.x.
`mlir/test/Integration/GPU/CUDA/sm90/gemm_f32_f16_f16_128x128x128.mlir`
2025-03-25 14:56:08 +00:00
Ivan Butygin
9b022220b7
[mlir][vector] Propagate vector.extract through elementwise ops (#131462)
Propagate `Extract(Elementwise(...))` -> `Elemetwise(Extract...)`.

Currenly limited to the case when extract is the single use of
elementwise to avoid introducing additional elementwise ops.
2025-03-25 14:07:48 +03:00
Matthias Springer
d4304d85f2
[mlir][memref] Verify out-of-bounds access for memref.subview (#131876)
* Improve the verifier of `memref.subview` to detect out-of-bounds
extractions.
* Improve the documentation of `memref.subview` to make clear that
out-of-bounds extractions are not allowed. Rewrite examples to use the
new `strided<>` notation instead of `affine_map` layout maps. Also
remove all unrelated operations (`memref.alloc`) from the examples.
* Fix various test cases where `memref.subview` ops ran out-of-bounds.
* Update canonicalizations patterns to ensure that they do not fold IR
if it would generate IR that no longer verifies.

Related discussion on Discourse:
https://discourse.llvm.org/t/out-of-bounds-semantics-of-memref-subview/85293
2025-03-25 11:25:11 +01:00
Luke Hutton
d4570ea813
[mlir][tosa] Disallow invalid datatype combinations in the validation pass (#131595)
This commit checks if the operands/results of an operator can be found
in the profile compliance mapping, if it isn't the operator is
considered invalid. As a result, operator datatype combinations that are
not listed under the "Supported Data Types" of the TOSA specification
are disallowed and the validation pass results in failure.

Signed-off-by: Luke Hutton <luke.hutton@arm.com>
2025-03-25 10:05:39 +00:00
Georgios Pinitas
3df92197bb
[mlir][tosa] Support DenseResourceElementsAttr in TOSA transpose folders (#124532)
Handle dense resource attributes in the transpose TOSA folder.
Currently their interface does not align with the rest of the
`ElementsAttr` when it comes to data accessing hence the special
handling.

Signed-off-by: Georgios Pinitas <georgios.pinitas@arm.com>
2025-03-24 21:48:22 +00:00
Jerry-Ge
1b9d475453
[mlir][tosa] Align validation profiles and extensions to TOSA v1.0 spec (#132768)
- Add missing int16 extension for concat operator
- Remove int16 extension for cast operator
- Add pro_int and pro_fp profiles for const_shape operator

Signed-off-by: Jerry Ge <jerry.ge@arm.com>
2025-03-24 13:01:15 -07:00
MaheshRavishankar
e4172196a7
[mlir][TilingInterface] Make tileAndFuseConsumerOfSlice take surrounding loops as an argument. (#132082)
This gets the consumer fusion method in sync with the corresponding
producer fusion method `tileAndFuseProducerOfSlice`. Not taking this as
input required use of complicated analysis to retrieve the surrounding
loops which are very fragile. Just like the producer fusion method, the
loops need to be taken in as an argument, with typically the loops being
created by the tiling methods.

Some utilities are added to check that the loops passed in are perfectly
nested (in the case of an `scf.for` loop nest.

This is change 1 of N to simplify the implementation of tile and fuse
consumers.

---------

Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com>
2025-03-24 11:41:26 -07:00
Bruno Cardoso Lopes
8a2a694438
[MLIR][LLVM] Support dso_local_equivalent constants (#132131)
Create a new operation `DSOLocalEquivalentOp`, following the steps of
other constants.

This is similar in a way to `AddressOfOp` but with specific semantics:
only support functions and function aliases (no globals) and extern_weak
linkage is not allowed.

An alternative approach is to use a new `UnitAttr` in `AddressOfOp` and
check that attribute to enforce specific semantics in the verifiers. The
drawback is going against what other constants do and having to add more
attributes in the future when we introduce `no_cfi`, `blockaddress`,
etc.

While here, improve the error message for other missing constants.
2025-03-24 10:43:53 -07:00
Kareem Ergawy
5c02f1a5af
[OpenMP][IRBuilder] De-duplicate code that emit task dependencies (#132340)
A small clean-up following up on #131795. Seems like we had 2 quite
similar implementations for the same thing: emit task dependencies
struct and filling it. This PR unifies the 2 versions into one. This is
better since we had to fix a bug in one of them in #131795 so this
applies the fix for both.
2025-03-24 16:04:20 +01:00
Igor Wodiany
3aa20c266c
[mlir][spirv] Add definition for selected sample operations (#129558)
This commit adds following three operations: ImageSampleImplicitLodOp,
ImageSampleExplicitLodOp and ImageSampleProjDrefImplicitLodOp
2025-03-24 13:56:36 +00:00
Matthias Springer
529ee3cf3b
[mlir][tensor] Fix slice canonicalizer for out-of-bounds cases (#132534)
Since #130487, `tensor.extract_slice` and `tensor.insert_slice` ops that
are statically detected to go out of bounds are rejected by the
verifier.

This commit fixes canonicalization patterns that currently fold
dynamically out-of-bounds ops (valid IR) to statically out-of-bounds ops
(invalid IR).
2025-03-24 14:39:37 +01:00
Kunwar Grover
f3fa54a191
[mlir][Vector] Handle 0-rank case in fold instead of RewriterPattern (#130168)
For vector.extract, the folder always canonicalizes to a vector.extract
operation, while the rewrite pattern canonicalizes to a vector.broadcast
except in the case of 0-rank vectors.

Remove this special casing, and instead handle the 0-rank vector case in
the folder.
2025-03-24 13:14:24 +00:00
Kunwar Grover
dc28e0d5d2
[mlir][Vector] Remove more special case uses for extractelement/insertelement (#130166)
A number of places in our codebase special case to use
extractelement/insertelement for 0D vectors, because extract/insert did
not support 0D vectors previously. Since insert/extract support 0D
vectors now, use them instead of special casing.
2025-03-24 13:04:16 +00:00
Kunwar Grover
24a8e18f5a
[mlir][vector] Allow multi dim vectors in vector.scatter (#132217)
This patch matches the definition of vector.scatter as a counter part of
vector.gather.

All of the changes done in this patch make vector.scatter match
vector.gather 's multi dimensional definition.

Unrolling for vector.scatter will be implemented in subsequent patches.

Discourse Discussion:
https://discourse.llvm.org/t/rfc-improving-gather-codegen-for-vector-dialect/85011/13
2025-03-24 12:52:46 +00:00
Kunwar Grover
cf0efb3188
[mlir][vector] Decouple unrolling gather and gather to llvm lowering (#132206)
This patch decouples unrolling vector.gather and lowering vector.gather
to llvm.masked.gather.

This is consistent with how vector.load, vector.store,
vector.maskedload, vector.maskedstore lower to LLVM.

Some interesting test changes from this patch:

- 2D vector.gather lowering to llvm tests are deleted. This is
consistent with other memory load/store ops.
- There are still tests for 2D vector.gather, but the constant mask for
these test is modified. This is because with the updated lowering, one
of the unrolled vector.gather disappears because it is masked off (also
demonstrating why this is a better lowering path)

Overall, this makes vector.gather take the same consistent path for
lowering to LLVM as other load/store ops.

Discourse Discussion:
https://discourse.llvm.org/t/rfc-improving-gather-codegen-for-vector-dialect/85011/13
2025-03-24 12:25:17 +00:00
Longsheng Mou
94783a8199
[mlir][mesh] Exit after signalPassFailure to fix a crash (#132662)
Fixes #131435.
2025-03-24 20:19:56 +08:00
Letu Ren
071643f339
[mlir][llvm] Add llvm.experimental.constrained.fpext operation (#129054)
Ref: https://github.com/llvm/llvm-project/pull/86260
2025-03-24 13:04:19 +01:00
Letu Ren
91140e6a51
[mlir][llvm] Add llvm.intr.exp10 operation (#129378) 2025-03-24 13:02:14 +01:00
Igor Wodiany
ef9c4f4f5c
[mlir][spirv] Update assembly format for Image operand types (#130758)
In the example below it is not clear that `(f32)` relates to `%arg2` and
not to `vector<2xf32>`:

```mlir
%0 = spirv.ImageSampleImplicitLod %arg0, %arg1 ["Lod"](%arg2) :
  !spirv.sampled_image<...>, vector<2xf32>(f32) -> vector<4xf32>
```

This change applies new format to image operations and image operands
that does not use parenthesis and is less ambiguous:

```mlir
%0 = spirv.ImageSampleImplicitLod %arg0, %arg1 ["Lod"], %arg2 :
  !spirv.sampled_image<...>, vector<2xf32>, f32 -> vector<4xf32>
```
2025-03-24 09:30:39 +00:00
Javed Absar
41f9a00818
[NFC][mlir][bufferization] (#132637) 2025-03-24 07:50:27 +00:00