12848 Commits

Nirvedh Meshram
42b3f91fd6
[mlir] Vectorize tensor.pad with low padding for unit dims (#133808)
We currently do not have masked vectorization support for tensor.pad with
low padding. However, we can allow this in the special case where the
result dimension after padding is a unit dim. The reason is that when we
actually have a low pad on a unit dim, the input size of that dimension
will be (or should be, for correct IR) dynamically zero, and hence we will
create a zero mask, which is correct. If the low pad is dynamically zero
then the lowering is correct as well.
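
For illustration, a case this now covers might look roughly like the
following (hypothetical shapes and values):

```mlir
// Sketch: the padded dimension is a unit dim in the result, so either the
// low pad is zero or the input extent of that dimension is zero, and the
// generated mask degenerates correctly.
%padded = tensor.pad %input low[%low, 0] high[%high, 0] {
^bb0(%i: index, %j: index):
  tensor.yield %pad_value : f32
} : tensor<?x8xf32> to tensor<1x8xf32>
```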

---------

Signed-off-by: Nirvedh <nirvedh@gmail.com>
2025-04-02 16:32:36 -05:00
ofri frishman
6f1347d57b
[MLIR] Bubble up tensor.extract_slice through tensor.collapse_shape (#131982)
Add a pattern that bubbles up tensor.extract_slice through
tensor.collapse_shape.
The pattern is registered in a pattern population function that is used
by the transform op
transform.apply_patterns.tensor.bubble_up_extract_slice and by the
transform op transform.structured.fuse as a cleanup pattern.
This pattern enables tiling and fusing op chains that contain
tensor.collapse_shape when added as a cleanup pattern of the tile-and-fuse
utility. Without this pattern, that would not be possible, as
tensor.collapse_shape does not implement the tiling interface. This is
an additional pattern to the one added in PR #126898.
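
A rough sketch of the kind of rewrite this enables (illustrative shapes, not
necessarily the exact output of the pattern):

```mlir
// Before: the slice is taken from the collapsed tensor.
%c = tensor.collapse_shape %src [[0, 1], [2]]
    : tensor<2x4x8xf32> into tensor<8x8xf32>
%s = tensor.extract_slice %c[0, 0] [4, 8] [1, 1]
    : tensor<8x8xf32> to tensor<4x8xf32>

// After: the slice is bubbled up above the collapse.
%s2 = tensor.extract_slice %src[0, 0, 0] [1, 4, 8] [1, 1, 1]
    : tensor<2x4x8xf32> to tensor<1x4x8xf32>
%c2 = tensor.collapse_shape %s2 [[0, 1], [2]]
    : tensor<1x4x8xf32> into tensor<4x8xf32>
```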
2025-04-02 21:06:43 +01:00
AdityaK
340f06a8d4
Fix: bail out when divisor is zero (#133518)
Fixes: #131279
2025-04-02 09:44:18 -07:00
Fehr Mathieu
8b67f36258
[mlir] [arith] Fix ceildivsi lowering in arith-expand (#133774)
This fixes the current lowering of `arith.ceildivsi` in the arith-expand
pass, which was previously incorrect. The new version is based on the
lowering of `arith.floordivsi`, and will not introduce new undefined
behavior or poison during the lowering. It also replaces one division
with a multiplication.

The previous lowering of `ceildivsi(n, m)` was the following:
```
x = (m > 0) ? -1 : 1
(n*m>0) ? ((n+x) / m) + 1 : - (-n / m)
```

This caused two problems:
* In the case where `n` is INT_MIN and `m` is positive, the result would
be poison instead of an actual value
* In the case where `n` is INT_MAX and `m` is `-1`, this would trigger
undefined behavior, while the original code wouldn't. This is because
`n+x` would be equal to `INT_MIN` (`INT_MAX + 1`), so the `(n+x) / m`
division would overflow and trigger UB.
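
For reference, one well-known identity that expresses ceildivsi via
floordivsi plus a multiplication (an illustration only, not necessarily the
exact expansion emitted by the updated pass) is
`ceildiv(n, m) = floordiv(n, m) + (n != floordiv(n, m) * m ? 1 : 0)`:

```mlir
// Sketch: bump the floor quotient by one whenever the division is inexact.
%q    = arith.floordivsi %n, %m : i32
%prod = arith.muli %q, %m : i32
%ne   = arith.cmpi ne, %prod, %n : i32
%one  = arith.constant 1 : i32
%zero = arith.constant 0 : i32
%adj  = arith.select %ne, %one, %zero : i32
%res  = arith.addi %q, %adj : i32
```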
2025-04-02 17:26:58 +01:00
Shilei Tian
84cb08e118
[MLIR][AMDGPU] Bump to COV6 (#133849)
We already bump to COV6 by default in the front-end and backend. This PR
is for MLIR.

Note that COV6 requires ROCm 6.3+.
2025-04-02 12:14:24 -04:00
Igor Wodiany
2a90631841
[mlir][spirv] Allow yielding values from selection regions (#133702)
There are cases in SPIR-V shaders where values need to be yielded from
the selection region to make valid MLIR. For example (part of the SPIR-V
shader decompiled to GLSL):

```
bool _115;
if (_107)
{
    // ...
    float _200 = fma(...);
    // ...
    _115 = _200 < _174;
}
else
{
    _115 = _107;
}
bool _123;
if (_115)
{
    // ...
    float _213 = fma(...);
    // ...
    _123 = _213 < _174;
}
else
{
    _123 = _115;
}
```

This patch extends `mlir.selection` so it can return values.
`mlir.merge` is used as a "yield" operation. This maintains
compatibility with code that does not yield any values, as well as the
assumption that `mlir.merge` is the only operation in the
merge block of the selection region.
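
A rough sketch of what the extended form allows (hedged; the exact assembly
may differ slightly from this sketch):

```mlir
// Sketch: a selection region yielding the value merged from both branches.
%yielded = spirv.mlir.selection -> i32 {
  spirv.BranchConditional %cond, ^then, ^else
^then:
  spirv.Branch ^merge(%a : i32)
^else:
  spirv.Branch ^merge(%b : i32)
^merge(%merged: i32):
  spirv.mlir.merge %merged : i32
}
```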
2025-04-02 14:35:22 +01:00
Longsheng Mou
7d441d9892
[mlir] Use dyn_cast instead of cast in MathToVCIX conversion (#134047)
Fixes #131093.
2025-04-02 18:30:42 +08:00
donald chen
d40bab359c
[mlir][liveness] fix bugs in liveness analysis (#133416)
This patch fixes the following bugs:
- In SparseBackwardAnalysis, the setToExitState function should
propagate changes if it modifies the lattice. Previously, this issue was
masked because multi-block scenarios were not tested, and the traversal
order of backward data flow analysis starts from the end of the program.
- The method in liveness analysis for determining whether the
non-forwarded operand in branch/region branch operations is live is
incorrect, which may cause originally live variables to be marked as not
live.
2025-04-02 11:56:13 +08:00
AdityaK
96d60c00e5
[mlir][spirv] Verify matching of entry block arguments and function signature (#133167)
Fixes: #132894
2025-04-01 14:10:17 -07:00
Max191
1407f5bee9
[mlir] Canonicalize extract_slice(unpack) (#133777)
Canonicalizes a chain of `linalg.unpack -> tensor.extract_slice` into a
`linalg.unpack` with reduced dest sizes. This will only happen when the
unpack op's only user is a non-rank-reducing slice with zero offsets and
unit strides.
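
For illustration (hypothetical shapes; the actual pattern derives the smaller
destination from the original one rather than materializing a new tensor):

```mlir
// Before: unpack into the full destination, then slice off the tail.
%full = linalg.unpack %src inner_dims_pos = [0, 1] inner_tiles = [32, 32]
    into %dest : tensor<4x4x32x32xf32> -> tensor<128x128xf32>
%slice = tensor.extract_slice %full[0, 0] [125, 128] [1, 1]
    : tensor<128x128xf32> to tensor<125x128xf32>

// After: a single unpack with a reduced destination size.
%small = tensor.empty() : tensor<125x128xf32>
%res = linalg.unpack %src inner_dims_pos = [0, 1] inner_tiles = [32, 32]
    into %small : tensor<4x4x32x32xf32> -> tensor<125x128xf32>
```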

---------

Signed-off-by: Max Dawkins <max.dawkins@gmail.com>
Signed-off-by: Max Dawkins <maxdawkins19@gmail.com>
Co-authored-by: Max Dawkins <maxdawkins19@gmail.com>
2025-04-01 14:51:58 -04:00
Krzysztof Drewniak
25622aa745
[mlir][AMDGPU] Add gfx950 MFMAs to the amdgpu.mfma op (#133553)
This commit extends the lowering of amdgpu.mfma to handle the new
double-rate MFMAs in gfx950 and adds tests for these operations.

It also adds support for MFMAs on small floats (f6 and f4), which are
implemented using the "scaled" MFMA intrinsic with a scale value of 0 in
order to have an unscaled MFMA.

This commit does not add a `amdgpu.scaled_mfma` operation, as that is
future work.

---------

Co-authored-by: Jakub Kuderski <kubakuderski@gmail.com>
2025-04-01 11:59:09 -05:00
Ivan Butygin
1f194ff34e
[mlir] Expose simplifyAffineExpr through python api (#133926) 2025-04-01 19:28:53 +03:00
Jean-Didier PAILLEUX
15cfe4a774
[MLIR] Adding 'no_inline' and 'always_inline' attributes on LLVM::CallOp (#133726)
Adds `no_inline` and `always_inline` attributes for CallOps in
MLIR, making it possible to control inlining of a specific call
directly, without having the attribute on the `FuncOp`.
These attributes will be used in a future PR in Flang
(`[NO]INLINE` directive).
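
A hedged sketch of the intended usage (the attribute names follow the PR
title; the exact printed placement may differ):

```mlir
// Sketch: per-call-site inlining control, independent of attributes on the callee.
llvm.func @callee()

llvm.func @caller() {
  llvm.call @callee() {no_inline} : () -> ()
  llvm.call @callee() {always_inline} : () -> ()
  llvm.return
}
```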
2025-04-01 15:48:25 +02:00
Jeremy Morse
1ebc308bba
[DebugInfo][RemoveDIs] Remove debug-intrinsic printing cmdline options (#131855)
During the transition from debug intrinsics to debug records, we used
several different command line options to customise handling: the
printing of debug records to bitcode and textual IR could be made independent
of how the debug-info was represented inside a module, and whether the
autoupgrader ran could be customised. This was all valuable during
development, but now that totally removing debug intrinsics is coming
up, this patch removes those options in favour of a single flag
(experimental-debuginfo-iterators), which enables autoupgrade, in-memory
debug records, and debug record printing to bitcode and textual IR.

We need to do this ahead of removing the
experimental-debuginfo-iterators flag, to reduce the amount of
test-juggling that happens at that time.

There are quite a number of weird test behaviours related to this --
some of which I simply delete in this commit. Things like
print-non-instruction-debug-info.ll , the test suite now checks for
debug records in all tests, and we don't want to check we can print as
intrinsics. Or the update_test_checks tests -- these are duplicated with
write-experimental-debuginfo=false to ensure file writing for intrinsics
is correct, but that's something we're imminently going to delete.

A short survey of curious test changes:
* free-intrinsics.ll: we don't need to test that debug-info is a zero
cost intrinsic, because we won't be using intrinsics in the future.
* undef-dbg-val.ll: apparently we pinned this to non-RemoveDIs in-memory
mode while we sorted something out; it works now either way.
* salvage-cast-debug-info.ll: was testing intrinsics-in-memory get
salvaged, isn't necessary now
* localize-constexpr-debuginfo.ll: was producing "dead metadata"
intrinsics for optimised-out variable values, dbg-records takes the
(correct) representation of poison/undef as an operand. Looks like we
didn't update this in the past to avoid spurious test differences.
* Transforms/Scalarizer/dbginfo.ll: this test was explicitly testing
that debug-info affected codegen, and we deferred updating the tests
until now. This is just one of those silent codegen-change issues that get
fixed by RemoveDIs.

Finally: I've added a bitcode test, dbg-intrinsics-autoupgrade.ll.bc,
that checks we can autoupgrade debug intrinsics that are in bitcode into
the new debug records.
2025-04-01 14:27:11 +01:00
Pablo Antonio Martinez
a338f80ddc
[mlir][Linalg] Add transform to convert linalg.copy into memref.copy (#132422)
Targeted rewrite of a linalg.copy on memrefs to a memref.copy.

This is useful when bufferizing copies to a linalg.copy, applying some
transformations, and then rewriting the copy into a memref.copy.
If the element types of the source and destination differ, or if the
source is a scalar, the transform produces a silenceable failure.
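
The underlying rewrite is essentially the following (illustrative types):

```mlir
// Before: an elementwise copy expressed as a linalg op on memrefs.
linalg.copy ins(%src : memref<16x32xf32>) outs(%dst : memref<16x32xf32>)

// After: a direct memref-to-memref copy.
memref.copy %src, %dst : memref<16x32xf32> to memref<16x32xf32>
```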
2025-04-01 13:39:33 +01:00
Longsheng Mou
1cf6786e32
[mlir] Improve error handling for dense attribute parsing in complex types (#133220)
- For splat dense attributes, the number of parsed elements must be 2.
- For non-splat dense attributes, the number of parsed elements must be
twice the number of elements in the type.
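
Illustrative attribute literals (hedged) showing the element counts the
parser now checks:

```mlir
// Splat: exactly two parsed values (real, imaginary) for the whole tensor.
dense<(1.0, 2.0)> : tensor<4xcomplex<f32>>
// Non-splat: 2 * 4 = 8 parsed values for a tensor of 4 complex elements.
dense<[(1.0, 2.0), (3.0, 4.0), (5.0, 6.0), (7.0, 8.0)]> : tensor<4xcomplex<f32>>
```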

Fixes #132859.
2025-04-01 16:27:43 +08:00
Frank Schlimbach
49f080afc4
[mlir][mpi] Mandatory Communicator (#133280)
This replaces #125361
- communicator is mandatory
- new mpi.comm_world
- new mpi.comm_split
- lowering and test

---------

Co-authored-by: Sergio Sánchez Ramírez <sergio.sanchez.ramirez+git@bsc.es>
2025-04-01 08:58:55 +02:00
Sandeep Dasgupta
baacd1287b
Fix printing of mlirUniformQuantizedSubChannelTypeGetNumBlockSizes on 32-bit machines. (#133763)
Fixes the issue reported in
https://github.com/llvm/llvm-project/pull/120172#issuecomment-2763212827

cc @mgorny
2025-03-31 16:45:43 -04:00
Matthias Springer
5edf127384
[mlir][memref] Verify out-of-bounds access for memref.subview (#133086)
* Improve the verifier of `memref.subview` to detect out-of-bounds
extractions.
* Improve the documentation of `memref.subview` to make clear that
out-of-bounds extractions are not allowed. Rewrite examples to use the
new `strided<>` notation instead of `affine_map` layout maps. Also
remove all unrelated operations (`memref.alloc`) from the examples.
* Fix various test cases where `memref.subview` ops ran out-of-bounds.
* Update canonicalization patterns to ensure that they do not fold IR
if it would generate IR that no longer verifies.
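
A hedged example of an extraction the improved verifier now rejects (offset 4
plus size 6 exceeds the source extent of 8):

```mlir
// Out of bounds: reads elements [4, 10) from a memref with only 8 elements.
%oob = memref.subview %m[4] [6] [1]
    : memref<8xf32> to memref<6xf32, strided<[1], offset: 4>>
```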

Related discussion on Discourse:
https://discourse.llvm.org/t/out-of-bounds-semantics-of-memref-subview/85293

This is a re-upload of #131876, which was reverted due to failing GPU
tests. These tests were faulty and fixed in #133051.
2025-03-31 10:28:55 -07:00
Matthias Springer
8b06da1682
[mlir][memref] Improve runtime verification for memref.subview (#132545)
This commit addresses a TODO in the runtime verification of
`memref.subview`. Each dimension is now verified: the offset must be
in-bounds and the slice must not run out-of-bounds.

This commit aligns runtime verification with static op verification
(which was improved in #133086).
2025-03-31 10:24:30 -07:00
Han-Chung Wang
66b0b0466b
[MLIR][NFC] Fix incomplete boundary comments. (#133516)
I observed that we have the boundary comments in the codebase like:

```
//===----------------------------------------------------------------------===//
// ...
//===----------------------------------------------------------------------===//
```

I also observed that there are incomplete boundary comments. The
revision is generated by a script that completes the boundary comments.

```
//===----------------------------------------------------------------------===//
// ...

...
```

Signed-off-by: hanhanW <hanhan0912@gmail.com>
2025-03-31 09:29:54 -07:00
Frank Schlimbach
1dee12531d
[mlir][mpi] Lowering MPI_Allreduce (#133133)
Lowering of mpi.all_reduce to LLVM function call
2025-03-31 12:51:45 +02:00
Longsheng Mou
1878259c39
[mlir][spirv] Update verifier for spirv.mlir.merge (#133427)
- Moves the verification logic to the `verifyRegions` method of the
parent operation.
- Fixes a crash during verification when the last block lacks a
terminator.

Fixes #132850.
2025-03-29 15:35:00 +08:00
Longsheng Mou
22a11be8ab
[mlir][vector] Fix parser of vector.contract (#133434)
This PR adds a check in the parser to prevent a crash when
`vector.contract` lacks the `iterator_types` attribute.
Fixes #132886.
2025-03-29 09:24:42 +08:00
Mogball
e29f228a07 actually fix the test 2025-03-28 16:24:52 -07:00
Mogball
934f2557ba [mlir][analysis] Commit missing check lines 2025-03-28 16:14:06 -07:00
Jeff Niu
5252bb1d74
[mlir] IntegerRangeAnalysis: return initialized state for noninteger values (#133541)
Otherwise, the state for noninteger values remains uninitialized,
causing the analysis to return bogus results.
2025-03-28 16:13:10 -07:00
Qinkun Bao
91d2ecf0d5
[NFC] Fix some typos in libc and mlir comments (#133374) 2025-03-28 15:52:37 -04:00
Ian Wood
77ba6918a1
[mlir][linalg] Fix FoldReshapeWithGenericOpByCollapsing insertion point (#133476)
Fixes dominance verifier error with
`FoldReshapeWithGenericOpByCollapsing` by setting the insertion point
after `producer`. The `tensor.collapse_shape` op only has a single
operand (`producer`) so it is safe to insert after the producer.

Signed-off-by: Ian Wood <ianwood2024@u.northwestern.edu>
2025-03-28 12:38:39 -07:00
Matthias Springer
4abff4d7b2
[mlir][Transforms] Improve replaceOpWithMultiple API (#132608)
This commit adds an additional overload to `replaceOpWithMultiple` that
accepts additional container types. This has been brought up by users of
the new `replaceOpWithMultiple` API.

In particular, one missing container type was
`SmallVector<SmallVector<Value>>`. The "default" `ArrayRef<ValueRange>`
container type can lead to use-after-scope errors in cases such as:
```c++
// Compute the replacement value ranges. Some replacements are single
// values, some are value ranges.
SmallVector<ValueRange> repl;
repl.push_back(someValueRange);  // OK
for (...) {
  // push_back(Value) triggers an implicit conversion to ValueRange,
  // which does not own the range.
  repl.push_back(someValue);  // triggers use-after-scope later
}
rewriter.replaceOpWithMultiple(op, repl);
```

In this example, users should use `SmallVector<SmallVector<Value>>
repl;`.
2025-03-28 14:18:54 +01:00
Letu Ren
29cb00331f
[mlir][llvmir] add llvm.experimental.constrained.uitofp intrinsics (#133300)
https://llvm.org/docs/LangRef.html#llvm-experimental-constrained-uitofp-intrinsic

Signed-off-by: Letu Ren <fantasquex@gmail.com>
2025-03-28 08:47:36 +01:00
Letu Ren
68f71aae3b
[mlir][llvmir] add llvm.sincos intrinsics (#133311)
https://llvm.org/docs/LangRef.html#llvm-sincos-intrinsic

Signed-off-by: Letu Ren <fantasquex@gmail.com>
2025-03-28 08:45:46 +01:00
Jakub Kuderski
f359c0bde5
[mlir][arith] Trim trailing spaces in wide int emulation tests. NFC. (#133349)
Followup cleanup after https://github.com/llvm/llvm-project/pull/132375
and https://github.com/llvm/llvm-project/pull/133248
2025-03-27 22:36:05 -04:00
Longsheng Mou
a6cb5cc0f0
[mlir] Add nullptr checks in SparseElementsAttr parser (#133222)
This PR adds nullptr checks in the SparseElementsAttr parser to improve
robustness and prevent crashes. Fixes #132891.
2025-03-28 10:11:14 +08:00
egebeysel
3a3732c252
[mlir][arith] wide integer emulation support for fpto*i ops (#132375)
Adding wide integer emulation support for `arith.fpto*i` operations. As
with the other emulated operations, the upper and lower `N` bits of the
`i2N` integer result are emitted separately.

For the unsigned case we use the following emulation

```c
// example is 64 -> 32 bit emulation, but the implementation is generalized to any 2N -> N case
const double TWO_POW_N = (uint64_t(1) << N); // 2^N, N is the bitwidth of the widest int supported

// f is a floating-point value representing the input of the fptoui op.
uint32_t hi = (uint32_t)(f / TWO_POW_N);         // Truncates the division result
uint32_t lo = (uint32_t)(f - hi * TWO_POW_N);       // Subtracts to get the lower bits.
```

For the signed case, we defer the emulation of the absolute value to
`fptoui` and handle the sign:

```
fptosi(fp) = sign(fp) * fptoui(abs(fp))
```

The edge cases of `NaNs, +-inf` and overflows/underflows are undefined
behaviour and the resulting numbers are the combination of the lower
bitwidth UB values. These operations also propagate poison values.

Signed-off-by: Ege Beysel <beysel@roofline.ai>
2025-03-27 20:58:56 -04:00
Krzysztof Drewniak
d7c53a91c2
[mlir] Decouple enum generation from attributes, adding EnumInfo and EnumCase (#132148)
This commit pulls apart the inherent attribute dependence of classes
like EnumAttrInfo and EnumAttrCase, factoring them out into simpler
EnumCase and EnumInfo variants. This allows specifying the cases of an
enum without needing to make the cases, or the EnumInfo itself, a
subclass of SignlessIntegerAttrBase.

The existing classes are retained as subclasses of the new ones, both
for backwards compatibility and to allow attribute-specific information.

In addition, the new BitEnum class changes its default printer/parser
behavior: when multiple keywords appear, such as both nuw and nsw in
overflow flags, the output of operator<< will no longer be quoted, and
the FieldParser instance will now expect multiple keywords. All
instances of BitEnumAttr retain the old behavior.
2025-03-27 19:40:06 -05:00
Michael Liao
52975d5c9f [mlir][scf] Allow different forwarding ordering in uplift
- Allow 'before' arguments to be forwarded in a different order to the
  'after' body when uplifting `scf.while` to `scf.for`.
2025-03-27 18:09:07 -04:00
AdityaK
f2849fe05f
Fix RemoveDeadValues: Bail out early when there are no terminators (#133316)
Fixes: #131765
2025-03-27 15:04:31 -07:00
Bruno Cardoso Lopes
7c3ecffe9b
[MLIR][LLVMIR] Add support for the full form of global_{ctor,dtor} (#133176)
Currently only the ctor/dtor lists and their priorities are supported. This
PR adds support for the missing data field.

A few implementation notes:
- The assembly printer has a fixed form because the previous `attr_dict`
would sort the dict by key name, making global_dtor and global_ctor
differ in the order of printed arguments.
- LLVM's `ptr null` is being converted to `#llvm.zero`; otherwise we'd
have to create a region to use the default operation conversion from
`ptr null`, which is silly given that the field only supports null or a
symbol.
2025-03-27 14:11:05 -07:00
egebeysel
a1a5594ad2
[mlir][arith] add wide integer emulation support for subi (#133248)
Adds wide integer emulation support for the `arith.subi` op. `(i2N, i2N)
-> (i2N)` ops are emulated as `(vector<2xiN>, vector<2xiN>) ->
(vector<2xiN>)`, just like the other emulation patterns.

The emulation uses the following scheme:

```
resLow = lhsLow - rhsLow;      // carry = 1 if rhsLow > lhsLow
resHigh = lhsHigh - carry - rhsHigh;
```

Signed-off-by: Ege Beysel <beysel@roofline.ai>
2025-03-27 15:01:04 -05:00
Fabian Mora
1a7af2a90f
[mlir][DataLayout] Add IsolatedFromAbove to DataLayoutOpInterface (#132742)
This patch adds the `IsolatedFromAbove` trait as a dependent trait to
the `DataLayoutOpInterface` op interface.

The motivation behind this change comes from the implementation of the
`ptr` dialect, specifically the `ptr.type_offset` op. This op produces
an int-like value that equates to the size of a memory element. This is
useful for ptr arithmetic and indexing arrays. For example:

```mlir
%f32_off = ptr.type_offset f32 : index
%addr = ptr.ptradd %ptr, %f32_off : !ptr, index
%x = ptr.load %addr : !ptr -> f32 // Read ptr[1]
```

Without the `IsolatedFromAbove` trait in the DL interface, the
`ptr.type_offset` cannot be `ConstantLike`. Why?
Take the example:
```mlir
op {DL1} {
  %f32_off0 = ptr.type_offset f32 : index
  op {DL2} {
    %f32_off1 = ptr.type_offset f32 : index
  }
}
```
If `ptr.type_offset` were to be `ConstantLike`, then `canonicalize` would
hoist and unique the value. However, that could be wrong, as DL2 could
have an entry specifying a size that differs from the size in DL1.

The best solution to the above problem is to make
`DataLayoutOpInterface` require the `IsolatedFromAbove` trait, as it
preserves the constness of values in the DL with respect to the
canonicalizer.
2025-03-27 14:37:37 -04:00
Bruno Cardoso Lopes
08aedf7201
[MLIR][LLVM] Lift alignstack attribute ptr type restriction (#133195)
Current usage of alignstack is restricted to LLVM pointer types, whereas
when it is used on parameters it can apply to other types as well; see
examples like `{i8, i8}`, `[2 x float]`, etc. in `llvm/test/CodeGen`.
This PR lifts the restriction and adds test cases.
2025-03-27 10:28:37 -07:00
Guray Ozen
38d9a44510
[MLIR][NVGPU] Add tma.fence.descriptor OP (#133218)
When the TMA descriptor is transferred from host memory to global memory
using cudaMemcpy, each thread block must insert a fence before any
thread accesses the updated tensor map in global memory. Once the tensor
map has been accessed, no additional fences are needed by that block
unless the map is modified again.

[Example from cuda programming
guide](https://docs.nvidia.com/cuda/cuda-c-programming-guide/#using-tma-to-transfer-multi-dimensional-arrays).
The `tma.fence.descriptor` basically implements
`ptx::fence_proxy_tensormap_generic`.
```
#include <cuda.h>
#include <cuda/ptx>
namespace ptx = cuda::ptx;

__device__ CUtensorMap global_tensor_map;
__global__ void kernel(CUtensorMap *tensor_map)
{
  // Fence acquire tensor map:
  ptx::n32_t<128> size_bytes;
  // Since the tensor map was modified from the host using cudaMemcpy,
  // the scope should be .sys.
  ptx::fence_proxy_tensormap_generic(
     ptx::sem_acquire, ptx::scope_sys, tensor_map, size_bytes
 );
 // Safe to use tensor_map after fence inside this thread..
}
int main() {
  CUtensorMap local_tensor_map;
  // [ ..Initialize map.. ]
  cudaMemcpy(&global_tensor_map, &local_tensor_map, sizeof(CUtensorMap), cudaMemcpyHostToDevice);
  kernel<<<1, 1>>>(global_tensor_map);
}
```
2025-03-27 15:20:19 +01:00
Guray Ozen
bc7e3915e1
[MLIR][NVGPU] Add mbarrier.get Op (#133221)
The `mbarrier.create` op can create multiple mbarrier objects, and other
mbarrier-related ops can access an mbarrier using a dynamic SSA value.
This is especially useful when using mbarriers in dynamic loops.

This PR adds the `mbarrier.get` op, which returns a pointer to a
specific mbarrier object from a group of barriers created by the
nvgpu.mbarrier.create operation. It is useful when composing the NVGPU
and NVVM dialects.

Example:
```
%mbars = nvgpu.mbarrier.create 
   -> !nvgpu.mbarrier.group<memorySpace = #gpu.address_space<workgroup>, num_barriers = 10>

%mbar_pointer = nvgpu.mbarrier.get %mbars[%c2] 
  : !nvgpu.mbarrier.group<memorySpace = #gpu.address_space<workgroup>> 
  -> i32
 ```
2025-03-27 15:20:07 +01:00
Longsheng Mou
ac09b789d8
[mlir][scf] Remove redundant ensureTerminator for scf.forall (#133081)
The override function `ensureTerminator` ensures that the terminator
`InParallelOp` has a region. However, if the terminator of `scf.forall`
is not an `InParallelOp`, calling ensureTerminator causes a crash. Since
the InParallelOp builder already guarantees the existence of a region,
`ForallOp::ensureTerminator` is redundant and can be safely removed.
Fixes #130019.
2025-03-27 20:07:20 +08:00
Letu Ren
9438694a54
[mlir][llvmir] Add llvm.intr.ldexp operation (#133070)
https://llvm.org/docs/LangRef.html#llvm-ldexp-intrinsic
2025-03-27 07:50:39 +01:00
Letu Ren
ad51368881
[mlir][llvmir] add llvm.experimental.constrained.sitofp intrinsics (#133166)
https://llvm.org/docs/LangRef.html#llvm-experimental-constrained-sitofp-intrinsic

Signed-off-by: Letu Ren <fantasquex@gmail.com>
2025-03-27 07:50:04 +01:00
Frank Schlimbach
9269aaecff
[mlir][mesh] fixes for 0d tensors (#132948)
In some cases 0d tensors have no sharding. This PR provides a few minor
fixes to account for such cases.
2025-03-26 18:13:41 +01:00
Guray Ozen
e8dfd70fe2
[MLIR][NVGPU] Use gpu.dynamic_shared_memory in tests (#133122)
Reland #133051
2025-03-26 18:00:22 +01:00
Karlo Basioli
3f82c3d5a8
Revert "[MLIR][NVGPU] Use gpu.dynamic_shared_memory in tests" (#133103)
Reverts llvm/llvm-project#133051 due to failing integration tests
2025-03-26 15:39:14 +00:00