12771 Commits

Author SHA1 Message Date
Igor Wodiany
ef9c4f4f5c
[mlir][spirv] Update assembly format for Image operand types (#130758)
In the example below it is not clear that `(f32)` relates to `%arg2` and
not to `vector<2xf32>`:

```mlir
%0 = spirv.ImageSampleImplicitLod %arg0, %arg1 ["Lod"](%arg2) :
  !spirv.sampled_image<...>, vector<2xf32>(f32) -> vector<4xf32>
```

This change applies a new format to image operations and image operands
that does not use parentheses and is less ambiguous:

```mlir
%0 = spirv.ImageSampleImplicitLod %arg0, %arg1 ["Lod"], %arg2 :
  !spirv.sampled_image<...>, vector<2xf32>, f32 -> vector<4xf32>
```
2025-03-24 09:30:39 +00:00
Javed Absar
41f9a00818
[NFC][mlir][bufferization] (#132637) 2025-03-24 07:50:27 +00:00
Austin
7a056895c5
[mlir] Fix typo of tests (NFC) (#132396) 2025-03-24 09:29:00 +08:00
Sandeep Dasgupta
81d7eef134
Sub-channel quantized type implementation (#120172)
This is an implementation for [RFC: Supporting Sub-Channel Quantization
in
MLIR](https://discourse.llvm.org/t/rfc-supporting-sub-channel-quantization-in-mlir/82694).

In order to make the review process easier, the PR has been divided into
the following commit labels:

1. **Add implementation for sub-channel type:** Includes the class
design for `UniformQuantizedSubChannelType`, printer/parser and bytecode
read/write support. The existing types (per-tensor and per-axis) are
unaltered.
2. **Add lowering for sub-channel type:** Lowering of
`quant.qcast` and `quant.dcast` operations to Linalg operations.
3. **Adding C/Python APIs:** We first define the C APIs and build the
Python APIs on top of those.
4. **Add pass to normalize generic ....:** This pass normalizes
sub-channel quantized types to per-tensor per-axis types, if possible.


A design note:
- **Explicitly storing the `quantized_dimensions`, even when they can be
derived for ranked tensors.**
While it's possible to infer quantized dimensions from the static shape
of the scales (or zero-points) tensor for ranked
data tensors
([ref](https://discourse.llvm.org/t/rfc-supporting-sub-channel-quantization-in-mlir/82694/3)
for background), there are cases where this can lead to ambiguity and
issues with round-tripping.

```
Consider the example: tensor<2x4x!quant.uniform<i8:f32:{0:2, 0:2}, {{s00:z00, s01:z01}}>>
```

The shape of the scales tensor is [1, 2], which might suggest that only
axis 1 is quantized. While this inference is technically correct, as the
block size for axis 0 is a degenerate case (equal to the dimension
size), it can cause problems with round-tripping. Therefore, even for
ranked tensors, we are explicitly storing the quantized dimensions.
Suggestions welcome!


PS: I understand that the upcoming holidays may impact your schedule, so
please take your time with the review. There's no rush.
2025-03-23 07:37:55 -05:00
Han-Chung Wang
900be712ce
[mlir][Linalg] Preserve encodings in static shape inference. (#132311)
Previously, the encodings were unconditionally dropped during shape
inference. This revision adds support for preserving the encodings in
the linalg ops.

---------

Signed-off-by: hanhanW <hanhan0912@gmail.com>
2025-03-21 13:36:44 -07:00
Jerry-Ge
4dd7feab2b
[mlir][tosa] Add more error_if checks for Resize Op (#132290)
Some of the error_if checks were missed in this PR:
https://github.com/llvm/llvm-project/pull/124956

Add back those tests to check suitable sizes for Resize

Signed-off-by: Luke Hutton <luke.hutton@arm.com>
Co-authored-by: Luke Hutton <luke.hutton@arm.com>
2025-03-21 09:16:05 -07:00
Yi Qian
0ea4fb9264
[AMD][ROCDL] Add packed conversions fp8/bf8->bf16 and fp8/bf8->fp32 in ROCDL dialect (#131850)
- Add packed conversions fp8/bf8->bf16 for gfx950 and fp8/bf8->fp32 for
gfx942 in ROCDL dialect
- Update amdgpu.ext_packed_fp8 lowering to use ROCDL packed fp8/bf8->f32
conversions for vector target types and ROCDL scalar fp8/bf8->fp32
conversions for the scalar target type.

---------

Co-authored-by: Jungwook Park <jungwook.park@amd.com>
2025-03-21 14:49:50 +00:00
Vadim Curcă
6b59b33358
[MLIR] Handle call site locations when inlining (#132247)
When inlining a `callee` with a call site debug location, the inlining
infrastructure was trivially combining the `callee` and the `caller`
locations, forming a "tree" of call stacks. Because of this, the remarks
were printing an incomplete inlining stack.

This commit handles this case and appends the `caller` location at the
end of the `callee`'s stack, extending the chain.
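
A hypothetical illustration of the difference (the names are made up, not taken
from the patch):

```
// Inlining a body whose location is already callsite("f" at "g")
// at a call located at "h":
//
//   before: loc(callsite(callsite("f" at "g") at "h"))   // nested "tree"
//   after:  loc(callsite("f" at callsite("g" at "h")))   // extended chain
```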
2025-03-21 14:06:53 +01:00
Zhuoran Yin
ea03bdee70
[MLIR][AMDGPU] Adding Vector transfer_read to load rewrite pattern (#131803)
This PR adds the Vector transfer_read to load rewrite pattern. The
pattern creates a transfer read op lowering. A vector transfer read op
will be lowered to a combination of `vector.load`, `arith.select` and
`vector.broadcast` if:
 - The transfer op is masked.
 - The memref is in the buffer address space.
 - Other conditions introduced from `TransferReadToVectorLoadLowering`

The motivation for this PR is the lack of support for masked loads in
the amdgpu backend. `llvm.intr.masked.load` lowers to a series of
conditional scalar loads (see the `scalarize-masked-mem-intrin` pass).
This PR makes it possible for a masked transfer_read to be lowered
towards a buffer load with bounds check, allowing a more optimized global
load access pattern compared with the existing lowering of
`llvm.intr.masked.load` on vectors.
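
A rough, hand-written sketch of the case being targeted (assuming the amdgpu
fat_raw_buffer address space; names, shapes, and the exact output are
illustrative, not taken from the patch's tests):

```mlir
func.func @masked_read(%mem: memref<64xf32, #amdgpu.address_space<fat_raw_buffer>>,
                       %i: index, %pad: f32, %mask: vector<4xi1>) -> vector<4xf32> {
  // Input: a masked transfer_read from a buffer memref.
  %v = vector.transfer_read %mem[%i], %pad, %mask {in_bounds = [true]}
      : memref<64xf32, #amdgpu.address_space<fat_raw_buffer>>, vector<4xf32>
  return %v : vector<4xf32>
}

// After the rewrite, the read is expressed roughly as:
//   %loaded  = vector.load %mem[%i] : memref<64xf32, ...>, vector<4xf32>
//   %pad_vec = vector.broadcast %pad : f32 to vector<4xf32>
//   %v       = arith.select %mask, %loaded, %pad_vec : vector<4xi1>, vector<4xf32>
```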
2025-03-21 08:42:04 -04:00
Uday Bondhugula
2170d77e5d
[MLIR][Affine] Fix getAccessRelation for 0-d memrefs (#132347)
Fix getAccessRelation for 0-d memref accesses in certain cases. Fixes
corner-case crashes when using scalrep, affine dependence analysis, etc.

Fixes: https://github.com/llvm/llvm-project/issues/132163
2025-03-21 14:35:20 +05:30
Kazu Hirata
3041fa6c7a
[mlir] Use *Set::insert_range (NFC) (#132326)
DenseSet, SmallPtrSet, SmallSet, SetVector, and StringSet recently
gained C++23-style insert_range.  This patch replaces:

  Dest.insert(Src.begin(), Src.end());

with:

  Dest.insert_range(Src);

This patch does not touch custom begin like succ_begin for now.
2025-03-20 22:24:17 -07:00
Sergei Lebedev
c8a9a4109a
[MLIR] [python] A few improvements to the Python bindings (#131686)
* `PyRegionList` is now sliceable. The dialect bindings generator seems
to assume it is sliceable already (!), yet accessing e.g. `cases` on
`scf.IndexedSwitchOp` raises a `TypeError` at runtime.
* `PyBlockList` and `PyOperationList` support negative indexing. It is
common for containers to do that in Python, and most containers in the
MLIR Python bindings already allow the index to be negative.
2025-03-21 00:13:13 -04:00
Kareem Ergawy
f23a6ef54c
[flang][OpenMP] Process omp.atomic.update while translating scopes for target device (#132165)
Fixes a bug introduced by
https://github.com/llvm/llvm-project/pull/130078.

For non-BlockArgOpenMPOpInterface ops, we also want to map their entry
block arguments to their operands, if any. For the current support in
the OpenMP dialect, the table below lists all ops that have arguments
(SSA operands and/or attributes) and are not target-related. Of all these
ops, we only need to process `omp.atomic.update`, since it is the only op
that has SSA operands and an attached region. Therefore, the region's
entry block arguments must be mapped to the op's operands in case they
are referenced inside the region (see the sketch after the table).


| op                | operands? | region(s)? | parent is func? | processed? |
|-------------------|-----------|------------|-----------------|------------|
| atomic.read       | yes       | no         | yes             | no         |
| atomic.write      | yes       | no         | yes             | no         |
| atomic.update     | yes       | yes        | yes             | yes        |
| critical          | no        | no         | yes             | no         |
| declare_mapper    | no        | yes        | no              | no         |
| declare_reduction | no        | yes        | no              | no         |
| flush             | yes       | no         | yes             | no         |
| private           | no        | yes        | yes             | no         |
| threadprivate     | yes       | no         | yes             | no         |
| yield             | yes       | no         | yes             | no         |
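
A minimal sketch (hypothetical, not taken from the patch) of the shape of
`omp.atomic.update` being handled here: an SSA operand plus a region whose
entry block argument must be mapped to that operand's value:

```mlir
llvm.func @update(%x: !llvm.ptr) {
  %one = llvm.mlir.constant(1 : i32) : i32
  // The entry block argument %xval refers to the current value of %x.
  omp.atomic.update %x : !llvm.ptr {
  ^bb0(%xval: i32):
    %newval = llvm.add %xval, %one : i32
    omp.yield(%newval : i32)
  }
  llvm.return
}
```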
2025-03-20 16:21:09 -05:00
Longsheng Mou
7c11d053f6
[mlir][TosaToLinalg] Exit after notifyMatchFailure (#132012)
This PR adds `return nullptr` when the shift value of `tosa.mul` is not
constant to prevent a crash. Fixes #131766.
2025-03-20 19:17:04 +00:00
Sergio Afonso
b231f6f862
[MLIR][OpenMP] Improve omp.map.info verification (#132066)
This patch makes the `map_type` and `map_capture_type` arguments of the
`omp.map.info` operation required, which was already an invariant being
verified by its users via `verifyMapClause()`. This makes it clearer, as
getters no longer return misleading `std::optional` values.

Checks for the `mapper_id` argument are moved to a verifier for the
operation, rather than being checked by users.

Functionally NFC, but not marked as such due to a reordering of
arguments in the assembly format of `omp.map.info`.
2025-03-20 15:48:45 +00:00
Longsheng Mou
ead2724600
[mlir][scf] Fix a div-by-zero bug when step of scf.for is zero (#131079)
Fixes #130095.
2025-03-20 09:23:59 +08:00
Matthias Springer
a810141281
[mlir][memref] Add runtime verification for memref.assume_alignment (#130412)
Implement runtime verification for `memref.assume_alignment`.
2025-03-19 21:23:40 +01:00
Joel Wee
0260bcb5f4
Fix after f3dcc0f 2025-03-19 17:37:51 +00:00
TatWai Chong
1456eabca8
[mlir][tosa] Finalize profile-based validation for TOSA v1.0 (#131208)
- When the operand type of an operation changes to a profile-dependent
type, the compliance metadata must be updated. Update compliance check
for the following:
- CONV2D, CONV3D, DEPTHWISE_CONV2D, and TRANSPOSE_CONV2D, as zero points
have changed to variable inputs.
  - PAD, because pad_const has been changed to a variable input.
  - GATHER and SCATTER, as indices has changed to index_t.
- Add an int16 extension check for CONCAT.
- Add a compliance check for COND_IF, WHILE_LOOP, VARIABLE,
VARIABLE_READ, and VARIABLE_WRITE.
- Correct the profile requirements for IDENTITY, TABLE, MATMUL and
LOGICAL-like operations.
- Remove unnecessary checks for non-v1.0 operations.
- Add condition requirements (anyOf and allOf) to the type mode of
metadata for modes that have multiple profile/extension considerations.
2025-03-19 08:57:22 -07:00
Fabian Mora
2b8f887915
[mlir][Ptr] Add the MemorySpaceAttrInterface interface and dependencies. (#86870)
This patch introduces the `MemorySpaceAttrInterface` interface. This
interface is responsible for handling the semantics of `ptr` operations.

For example, this interface can be used to create read-only memory
spaces, making any operation other than a load a verification error; see
`TestConstMemorySpaceAttr` for a possible implementation of this concept.

This patch also introduces the enum dependencies `AtomicOrdering` and
`AtomicBinOp`; both enumerations are clones of the enums with the same
names in the LLVM dialect.

Also, see:
- [[RFC] `ptr` dialect & modularizing ptr ops in the LLVM
dialect](https://discourse.llvm.org/t/rfc-ptr-dialect-modularizing-ptr-ops-in-the-llvm-dialect/75142)
for rationale.
- https://github.com/llvm/llvm-project/pull/73057 for a prototype
implementation of the full change.

**Note: Ignore the first commit, that's being reviewed in
https://github.com/llvm/llvm-project/pull/86860 .**
2025-03-19 10:55:24 -04:00
Kareem Ergawy
e737b846b4
[flang][OpenMP] Translate OpenMP scopes when compiling for target device (#130078)
If a `target` directive is nested in a host OpenMP directive (e.g.
parallel, task, or a worksharing loop), flang currently crashes if the
target directive-related MLIR ops (e.g. `omp.map.bounds` and
`omp.map.info`) depend on SSA values defined inside the parent host
OpenMP directives/ops.

This PR tries to solve this problem by treating these parent OpenMP ops
as "SSA scopes". Whenever we are translating for the device, instead of
completely translating host ops, we just translate their MLIR ops as pure
SSA values.
2025-03-19 08:26:19 +01:00
Kareem Ergawy
e295f5dd11
[OpenMP][IRBuilder] Don't initialize kmp_dep_info instances in alloc regions (#131795)
Fixes #121289

Given the following MLIR, where a variable `x` is `private` for the
`omp.parallel` op and a `depend` operand for the `omp.task` op:
```mlir
omp.private {type = private} @_QFEx_private_i32 : i32
llvm.func @nested_task_with_deps() {
  %0 = llvm.mlir.constant(1 : i64) : i64
  %1 = llvm.alloca %0 x i32 {bindc_name = "x"} : (i64) -> !llvm.ptr
  omp.parallel private(@_QFEx_private_i32 %1 -> %arg0 : !llvm.ptr) {
    omp.task depend(taskdependout -> %arg0 : !llvm.ptr) {
      omp.terminator
    }
    omp.terminator
  }
  llvm.return
}
```

Before the fix proposed by this PR, the IR builder would emit the
allocation and the initialization logic for the task's dependency info
struct in the parent function's alloc region:
```llvm
define void @nested_task_with_deps() {
  ....
  %.dep.arr.addr = alloca [1 x %struct.kmp_dep_info], align 8
  %2 = getelementptr inbounds [1 x %struct.kmp_dep_info], ptr %.dep.arr.addr, i64 0, i64 0
  %3 = getelementptr inbounds nuw %struct.kmp_dep_info, ptr %2, i32 0, i32 0
  %4 = ptrtoint ptr %omp.private.alloc to i64
  store i64 %4, ptr %3, align 4
  ....
  br label %entry

omp.par.entry:                                    ; preds = %entry
  ....
  %omp.private.alloc = alloca i32, align 4
```

Note the following:
- The private value `x` is allocated where it should be, in the parallel
op's entry region.
- However, since the private value is also a dependency of the task, and
since allocation and initialization of the task dependency info struct
both happen in the alloc region,
- this results in the private value being referenced before it is
actually defined.

This PR fixes the issue by only allocating the task dependency info in
the alloc region, while initialization happens at the current insertion
point of the function together with the rest of the logic that depends on it.
2025-03-19 08:19:58 +01:00
Longsheng Mou
c34dc9a0cf
[mlir][SCFToEmitC] Don't convert unsupported types in EmitC (#131786)
This PR adds a check for unsupported types in EmitC, which fixes a crash.
Fixes #131442.
2025-03-19 14:18:51 +08:00
Longsheng Mou
efc31ecd27
[mlir][LICM] Restrict LICM to pure tensor semantics (#129673)
This PR fixes a bug where LICM incorrectly allowed buffer semantics,
which could lead to a crash. Fixes #129416.
2025-03-19 14:17:49 +08:00
Longsheng Mou
fbc1038f0c
[mlir][TosaToLinalg] Only support ranked tensor for reduce and gather (#131805)
This PR adds checks for ranked tensors in the converters of reduce and
gather to prevent a crash. Fixes #131087.
2025-03-19 09:18:52 +08:00
Ian Wood
fbbb33f400
[mlir] Fix crash when verifying linalg.transpose (#131733)
Adds checks in `isPermutationVector` for indices that are out of
bounds and removes the assert.

Signed-off-by: Ian Wood <ianwood2024@u.northwestern.edu>
2025-03-18 12:33:27 -07:00
Srinivasa Ravi
c42952a782
[MLIR][NVVM] Add support for match.sync Op (#130718)
This change adds the `match.sync` Op to the MLIR NVVM dialect to
generate the `match.sync` PTX instruction.

PTX Spec Reference:

https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-match-sync
2025-03-18 14:54:24 +05:30
Kareem Ergawy
49b8d8472f
[OpenMP][MLIR] Support LLVM translation for distribute with delayed privatization (#131564)
Adds support for translating delayed privatization (`private` and
`firstprivate`) for `omp.distribute` ops.
2025-03-18 10:14:42 +01:00
Matthias Springer
e614e840bc
[mlir][memref] Add runtime verification for memref.dim (#130410)
Add runtime verification for `memref.dim`: check that the index is in
bounds.
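
For illustration, a hypothetical input (not from the patch) that the new check
guards, where the dim index is only known at runtime:

```mlir
func.func @dyn_dim(%m: memref<?x?xf32>, %idx: index) -> index {
  // The runtime verification pass inserts a check that %idx < rank(%m) = 2.
  %d = memref.dim %m, %idx : memref<?x?xf32>
  return %d : index
}
```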

Also simplify the pass pipeline for all memref runtime verification
checks.
2025-03-18 09:10:49 +01:00
Longsheng Mou
4cb1430c1c
[mlir][spirv] Fix a crash in spirv::ISubOp::fold (#131570)
This PR fixes a crash when the type of `spirv.ISub` is not an integer
type. Fixes #131283.
2025-03-18 09:18:49 +08:00
William Moses
d9c65af626
[MLIR][GPUToNVVM] Support 32-bit isfinite (#131699)
Co-authored-by: Ivan Radanov Ivanov <ivanov.i.aa@m.titech.ac.jp>
2025-03-18 02:11:38 +01:00
Johannes de Fine Licht
c3f750250a
[MLIR][LLVM] Handle floats in Mem2Reg of memset intrinsics (#131621)
This was lacking a bitcast from the shifted integer type into a float.
Non-struct types other than integers and floats will still not be
Mem2Reg'ed.

Also adds special handling for constants so they are emitted directly as
constants rather than relying on follow-up canonicalization patterns
(`memset` of zero is a case that can appear in the wild).
2025-03-17 22:31:28 +01:00
Christian Ulmann
800593a014
[MLIR][LLVM] Avoid duplicated module flags in the export (#131627)
This commit resolves an issue in the LLVMIR export that caused the
duplication of the "Debug Info Version" module flag, when it was already
in MLIR.
2025-03-17 17:43:15 +01:00
Zhuoran Yin
1e89a76a04
[MLIR] Refactor to create vectorization convOp precondition check (#130181)
In corner cases, the vectorization pass may attempt to lower a conv2d
op and assert in a completely unrelated location in the
vectorizeConvolution() subroutine.

~~This PR rejects the conv2d op early and makes the asserting routine
return failure as a defensive workaround.~~

In addressing this, the PR moved all condition checks away from
`Conv1dGenerator` into the `convOpPreconditionCheck()` function. This
makes unsupported ops such as conv2d be rejected early and leaves
a cleaner `Conv1dGenerator` constructor.
2025-03-17 09:32:45 -04:00
Luke Hutton
0c34d7a9e7
[mlir][tosa] Require operand/result tensors of at least rank 1 for some operations (#131335)
This commit updates the following operations (operands/results) to be of
at least rank 1 so that they align with the expectations of the
specification:
- ARGMAX (input)
- REDUCE_ALL (input/output)
- REDUCE_ANY (input/output)
- REDUCE_MAX (input/output)
- REDUCE_MIN (input/output)
- REDUCE_PRODUCT (input/output)
- REDUCE_SUM (input/output)
- CONCAT (each input in input1/output)
- PAD (input1/output)
- REVERSE (input1/output)
- SLICE (input1/output)
- TILE (input1/output)
- TRANSPOSE (input1/output)

In addition to this change, PAD has been updated to allow unranked
tensors for input1/output, in line with other operations.
2025-03-17 10:22:52 +00:00
Matthias Springer
6c867e27a7
[mlir] Use getSingleElement/hasSingleElement in various places (#131460)
This is a code cleanup. Update a few places in MLIR that should use
`hasSingleElement`/`getSingleElement`.

Note: `hasSingleElement` is faster than `.getSize() == 1` when it is
used with linked lists etc.

Depends on #131508.
2025-03-17 07:43:18 +01:00
Ivan Butygin
7c98cddc5a
[mlir] Expose AffineExpr.shift_dims/shift_symbols through C and Python bindings (#131521) 2025-03-16 19:57:56 +03:00
Matthias Springer
6c2f8476e7
[mlir][Transforms] Dialect Conversion: Add 1:N support to remapInput (#131454)
This commit adds 1:N support to `SignatureConversion::remapInputs`. This
API allows users to replace a block argument with multiple replacement
values. (And the block argument is dropped.) The API already supported
"bbarg --> multiple bbargs" mappings, but "bbarg --> multiple SSA
values" was missing.

---------

Co-authored-by: Markus Böck <markus.boeck02@gmail.com>
2025-03-15 18:33:06 +01:00
Bruno Cardoso Lopes
5265412c13
[MLIR][LLVMIR] Import: add flag to prefer using unregistered intrinsics (#130685)
Currently, there is no common mechanism for supported intrinsics to be
generically annotated with arg and ret attributes. Since there are many
supported intrinsics across different dialects, the amount of work to
teach all of them about these attributes is not trivial (though it would
be nice in the long term).

This PR adds a new flag `-prefer-unregistered-intrinsics` that can be
used alongside `--import-llvm` to always use `llvm.intrinsic_call`
during import time (ignoring dialect hooks for custom intrinsic
support).
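
For example (the file name here is hypothetical), the flag is meant to be
combined with the LLVM importer along these lines:

```
mlir-translate --import-llvm -prefer-unregistered-intrinsics input.ll
```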

Using this flag allows us to round-trip the LLVM IR while eliminating a
whole set of differences coming from the lack of arg/ret attributes on
supported intrinsics.

Note that `convertIntrinsic` has to be moved to an implementation file
because it queries `moduleImport` state, which is only a forward
declaration in `LLVMImportInterface.h`.
2025-03-14 18:04:32 -07:00
Bruno Cardoso Lopes
29a000023c
[MLIR][LLVMIR] Add module flags support (#130679)
Import and translation support.

Note that existing support (prior to this PR) already covers enough in
translation to emit "Debug Info Version" specifically. Also, the debug
info version metadata is being emitted even though the imported IR has
no such information, and it shows up in some tests (will fix that in
another PR).

---------

Co-authored-by: Tobias Gysi <tobias.gysi@nextsilicon.com>
Co-authored-by: Henrich Lauko <xlauko@mail.muni.cz>
2025-03-14 18:03:36 -07:00
Luke Hutton
955c02dc9c
[mlir][tosa] Check for compile time constants in the validation pass (#131123)
This commit adds the concept of the 'dynamic' extension to the dialect and
checks that compile-time constant (CTC) operands of each operator are
constant if the dynamic extension is not loaded.

Operands labeled as CTC in the specification that are of tosa.shape
(shape_t in the specification) type are not checked as they are always
expected to be constant. This requirement is checked elsewhere in the
dialect.

Signed-off-by: Luke Hutton <luke.hutton@arm.com>
2025-03-14 12:45:01 -07:00
MaheshRavishankar
2490f7f076
[mlir][Linalg] Allow expand shape propagation across linalg ops with dynamic shapes. (#127943)
With `tensor.expand_shape` allowing a dynamic dimension to be expanded into
multiple dynamic dimensions, adapt the reshape propagation through
expansion to handle cases where one dynamic dimension is expanded into
multiple dynamic dimensions (see the sketch below).
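
A small sketch of the newly handled case (hypothetical shapes and names, not
taken from the patch): one dynamic source dimension expanded into two dynamic
result dimensions:

```mlir
func.func @expand(%t: tensor<?x8xf32>, %d0: index, %d1: index) -> tensor<?x?x8xf32> {
  // Source dim 0 (dynamic) is expanded into two dynamic dims of sizes %d0 and %d1.
  %e = tensor.expand_shape %t [[0, 1], [2]] output_shape [%d0, %d1, 8]
      : tensor<?x8xf32> into tensor<?x?x8xf32>
  return %e : tensor<?x?x8xf32>
}
```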

---------

Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com>
2025-03-14 12:42:42 -07:00
Charitha Saumya
fd24805c8e
Reapply [mlir][xegpu] Add XeGPU subgroup map propagation analysis for XeGPU SIMT distribution. (#131380)
Originally introduced in #130240 and reverted in #131364 

Reproduced the issue locally on Linux by doing a shared-library build. The
fixes include adding the missing LINK_LIBS.

**Original commit message:**

This PR adds the SG map propagation step of the XeGPU SIMT distribution.
SG map propagation is a sparse backward dataflow analysis that propagates
the sg_map backward, starting from the operands of certain operations
(DPAS, store, etc.).

This is the first step of XeGPU subgroup distribution. This analysis
result is used to attach layout information to each XeGPU SIMD subgroup
op. The lowering patterns in XeGPUSubgroupDistribute will consume these
layout info to distribute SIMD ops into SIMT ops that work on work-item
level data fragments.

Summary of Lowering XeGPU SIMD -> SIMT
1. Subgroup map propagation (this PR)
2. Attach `sg_map` to each op and move all ops inside the
`gpu.warp_execute_on_lane0` region.
3. Distribute each op using `sg_map`
4. Additional legalization steps to align more with Xe HW.
2025-03-14 12:38:36 -07:00
Charitha Saumya
3fcd921aa4
Revert "[mlir][xegpu] Add XeGPU subgroup map propagation analysis for XeGPU SIMT distribution." (#131364)
Reverts llvm/llvm-project#130240
2025-03-14 10:36:58 -07:00
Charitha Saumya
5eb557774d
[mlir][xegpu] Add XeGPU subgroup map propagation analysis for XeGPU SIMT distribution. (#130240)
This PR adds the SG map propagation step of the XeGPU SIMT distribution.
SG map propagation is a sparse backward dataflow analysis that propagates
the sg_map backward, starting from the operands of certain operations
(DPAS, store, etc.).

This is the first step of XeGPU subgroup distribution. This analysis
result is used to attach layout information to each XeGPU SIMD subgroup
op. The lowering patterns in XeGPUSubgroupDistribute will consume these
layout info to distribute SIMD ops into SIMT ops that work on work-item
level data fragments.

### Summary of Lowering XeGPU SIMD -> SIMT

1. Subgroup map propagation (This PR)
2. Attach `sg_map` to each op and move all ops inside the
`gpu.warp_execute_on_lane0` region.
3. Distribute each op using `sg_map`
4. Additional legalization steps to align more with Xe HW.
2025-03-14 10:21:22 -07:00
mihailo-stojanovic
fc8b2bf2f8
[MLIR][LLVM] Import dereferenceable metadata from LLVM IR (#130974)
Add support for importing `dereferenceable` and `dereferenceable_or_null` metadata into LLVM dialect. Add a new attribute which models these two metadata nodes and a new OpInterface.
2025-03-14 09:30:47 +01:00
Kai Sasaki
befa037c13
[mlir][affine] Guard invalid dim attribute in the test-reify-bound pass (#129013)
Computing the bound of an affine op
(ValueBoundsConstraintSet::computeBound) crashes due to an invalid dim
value given to the op. The pass needs to check that the dim attribute is
not greater than the rank of the input type.

Fixes https://github.com/llvm/llvm-project/issues/128807
2025-03-14 08:09:01 +09:00
Luke Hutton
1c45514748
[mlir][tosa] Fix bug causing quantized pad const creation crash (#131125)
This commit ensures the storage type is retrieved correctly which fixes
a crash when creating a quantized pad const tensor.

Testing is completed via the `tosa-optional-decompositions` pass which
makes use of the `createPadConstTensor` function.

Also includes some cleanup.
2025-03-13 13:17:47 -07:00
Daniel Hernandez-Juarez
64f67f870d
[mlir][AMDGPU] Enable emulating vector buffer_atomic_fadd for bf16 on gfx942 (#129029)
- Change to make sure architectures < gfx950 emulate bf16
buffer_atomic_fadd
- Add tests for bf16 buffer_atomic_fadd and architectures: gfx12, gfx942
and gfx950

---------

Co-authored-by: Jakub Kuderski <kubakuderski@gmail.com>
2025-03-13 14:30:45 -05:00
Nirvedh Meshram
ca4399036f
[mlir][linalg] Add FoldReshapeWithGenericOpByCollapsing pattern (#131029)
This pattern to bubble up collapse shapes was missing from
`populateFoldReshapeOpsByCollapsingPatterns`.

Signed-off-by: Nirvedh Meshram <nirvedh@gmail.com>
2025-03-13 14:22:50 -05:00