22391 Commits

Author SHA1 Message Date
lorenzo chelini
57dc71352c
[MLIR][Bufferization] Retire enforce-aliasing-invariants (#130929)
Why? This option can lead to incorrect IR if used in isolation, for
example, consider the IR below:

```mlir
func.func @loop_with_aliasing(%arg0: tensor<5xf32>, %arg1: index, %arg2: index) -> tensor<5xf32> {
  %c1 = arith.constant 1 : index
  %cst = arith.constant 1.000000e+00 : f32
  %0 = tensor.empty() : tensor<5xf32>
  %1 = linalg.fill ins(%cst : f32) outs(%0 : tensor<5xf32>) -> tensor<5xf32>
  // The BufferizableOpInterface says that %2 must alias with %arg0 or be a newly
  // allocated buffer
  %2 = scf.for %arg3 = %arg1 to %arg2 step %c1 iter_args(%arg4 = %arg0) -> (tensor<5xf32>) {
    scf.yield %1 : tensor<5xf32>
  }
  %cst_0 = arith.constant 1.000000e+00 : f32
  %inserted = tensor.insert %cst_0 into %1[%c1] : tensor<5xf32>
  return %2 : tensor<5xf32>
}
```

If we bufferize with `enforce-aliasing-invariants=false`, we get:

```mlir
func.func @loop_with_aliasing(%arg0: memref<5xf32, strided<[?], offset: ?>>, %arg1: index, %arg2: index) -> memref<5xf32, strided<[?], offset: ?>> {
  %c1 = arith.constant 1 : index
  %cst = arith.constant 1.000000e+00 : f32
  %alloc = memref.alloc() {alignment = 64 : i64} : memref<5xf32>
  linalg.fill ins(%cst : f32) outs(%alloc : memref<5xf32>)
  %0 = scf.for %arg3 = %arg1 to %arg2 step %c1 iter_args(%arg4 = %arg0) -> (memref<5xf32, strided<[?], offset: ?>>) {
    %cast = memref.cast %alloc : memref<5xf32> to memref<5xf32, strided<[?], offset: ?>>
    scf.yield %cast : memref<5xf32, strided<[?], offset: ?>>
  }
  %cst_0 = arith.constant 1.000000e+00 : f32
  memref.store %cst_0, %alloc[%c1] : memref<5xf32>
  return %0 : memref<5xf32, strided<[?], offset: ?>>
}
```
This is not correct IR, since the loop yields the allocation.

I am using this option. What do I need to do now?

If you are using this option in isolation, you are possibly generating
incorrect IR, so you need to revisit your bufferization strategy. If you
are using it together with `copyBeforeWrite`, you simply need to retire
the `enforceAliasingInvariants` option.
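
For reference, a hand-written sketch (not actual pass output) of IR that respects
the invariant: the loop copies into the buffer carried by `iter_args` and yields
that buffer, so the local allocation never escapes through the loop:

```mlir
%0 = scf.for %arg3 = %arg1 to %arg2 step %c1 iter_args(%arg4 = %arg0) -> (memref<5xf32, strided<[?], offset: ?>>) {
  // copy into the iter_args buffer instead of yielding %alloc directly
  memref.copy %alloc, %arg4 : memref<5xf32> to memref<5xf32, strided<[?], offset: ?>>
  scf.yield %arg4 : memref<5xf32, strided<[?], offset: ?>>
}
```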

Co-authored-by: Matthias Springer <mspringer@nvidia.com>
2025-03-18 08:42:43 +01:00
Longsheng Mou
4cb1430c1c
[mlir][spirv] Fix a crash in spirv::ISubOp::fold (#131570)
This PR fixes a crash when `spirv.ISub` is not of integer type. Fixes
#131283.
2025-03-18 09:18:49 +08:00
William Moses
d9c65af626
[MLIR][GPUToNVVM] Support 32-bit isfinite (#131699)
Co-authored-by: Ivan Radanov Ivanov <ivanov.i.aa@m.titech.ac.jp>
2025-03-18 02:11:38 +01:00
Johannes de Fine Licht
c3f750250a
[MLIR][LLVM] Handle floats in Mem2Reg of memset intrinsics (#131621)
This was lacking a bitcast from the shifted integer type into a float.
Non-struct types other than integers and floats will still not be
Mem2Reg'ed.

Also adds special handling for constants to be emitted as a constant
directly rather than relying on followup canonicalization patterns
(`memset` of zero is a case that can appear in the wild).
2025-03-17 22:31:28 +01:00
Christian Ulmann
800593a014
[MLIR][LLVM] Avoid duplicated module flags in the export (#131627)
This commit resolves an issue in the LLVMIR export that caused the
duplication of the "Debug Info Version" module flag when it was already
present in MLIR.
2025-03-17 17:43:15 +01:00
Zhuoran Yin
1e89a76a04
[MLIR] Refactor to create vectorization convOp precondition check (#130181)
In corner cases, the vectorization pass may be asked to lower a conv2d
op and end up asserting in a completely unrelated location in the
vectorizeConvolution() subroutine.

~~This PR rejects the conv2d op early and makes the asserting routine
return failure as a defensive workaround.~~

In addressing this, the PR moved all condition checks away from the
`Conv1dGenerator` into the `convOpPreconditionCheck()` function. This
makes unsupported ops such as conv2d be rejected early and leaves
a cleaner `Conv1dGenerator` constructor.
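
For illustration (hand-written, hypothetical shapes), this is the kind of op
that now fails the precondition instead of asserting deep inside the 1-D path:

```mlir
// a 2-D convolution is not supported by the Conv1dGenerator, so
// vectorization now bails out with a match failure up front
%0 = linalg.conv_2d ins(%img, %ker : tensor<8x8xf32>, tensor<3x3xf32>)
                    outs(%init : tensor<6x6xf32>) -> tensor<6x6xf32>
```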
2025-03-17 09:32:45 -04:00
Luke Hutton
0c34d7a9e7
[mlir][tosa] Require operand/result tensors of at least rank 1 for some operations (#131335)
This commit updates the following operations (operands/results) to be of
at least rank 1 so that they align with the expectations of the
specification:
- ARGMAX (input)
- REDUCE_ALL (input/output)
- REDUCE_ANY (input/output)
- REDUCE_MAX (input/output)
- REDUCE_MIN (input/output)
- REDUCE_PRODUCT (input/output)
- REDUCE_SUM (input/output)
- CONCAT (each input in input1/output)
- PAD (input1/output)
- REVERSE (input1/output)
- SLICE (input1/output)
- TILE (input1/output)
- TRANSPOSE (input1/output)

In addition to this change, PAD has been updated to allow unranked
tensors for input1/output, in line with other operations.
2025-03-17 10:22:52 +00:00
Matthias Springer
6c867e27a7
[mlir] Use getSingleElement/hasSingleElement in various places (#131460)
This is a code cleanup. Update a few places in MLIR that should use
`hasSingleElement`/`getSingleElement`.

Note: `hasSingleElement` is faster than `.getSize() == 1` when it is
used with linked lists etc.

Depends on #131508.
2025-03-17 07:43:18 +01:00
Ivan Butygin
7c98cddc5a
[mlir] Expose AffineExpr.shift_dims/shift_symbols through C and Python bindings (#131521) 2025-03-16 19:57:56 +03:00
Andrzej Warzyński
d928a671b8
[mlir][Vector] Refactor VectorEmulateNarrowType.cpp (#123529)
This PR refactors `alignedConversionPrecondition` from
VectorEmulateNarrowType.cpp and adds new helper hooks.

**Update `alignedConversionPrecondition` (1)**

This method doesn't require the vector type for the "container" argument. The
underlying element type is sufficient. The corresponding argument has been
renamed as `containerTy` - this is meant as the multi-byte container element
type (`i8`, `i16`, `i32`, etc). With this change, the updated invocations of
`alignedConversionPrecondition` (in e.g. `RewriteAlignedSubByteIntExt`) make it
clear that the container element type is assumed to be `i8`.

**Update alignedConversionPrecondition (2):**

The final check in `alignedConversionPrecondition` has been replaced with a new
helper method, `isSubByteVecFittable`. This helper hook is now also re-used in
`ConvertVectorTransferRead` (to improve code re-use).

**Other updates**

Extended + unified comments.

**Implements**: https://github.com/llvm/llvm-project/issues/123630
2025-03-16 12:22:46 +00:00
Matthias Springer
6c2f8476e7
[mlir][Transforms] Dialect Conversion: Add 1:N support to remapInput (#131454)
This commit adds 1:N support to `SignatureConversion::remapInputs`. This
API allows users to replace a block argument with multiple replacement
values. (And the block argument is dropped.) The API already supported
"bbarg --> multiple bbargs" mappings, but "bbarg --> multiple SSA
values" was missing.

---------

Co-authored-by: Markus Böck <markus.boeck02@gmail.com>
2025-03-15 18:33:06 +01:00
Bangtian Liu
d52ec1e9dd
[MLIR][NFC] fix msvc debug build errors (#131393)
We found the build broken using msvc debug build as below:
```
C:\Users\bangtliu\iree\third_party\llvm-project\llvm\include\llvm/ADT/SmallVector.h(1162): error C2338: static_assert failed: 'You are trying to use a default number of inlined elements for `SmallVector<T>` but `sizeof(T)` is really big! Please use an explicit number of inlined elements with `SmallVector<T, N>` to make sure you really want that much inline storage.'
C:\Users\bangtliu\iree\third_party\llvm-project\llvm\include\llvm/ADT/SmallVector.h(1162): note: the template instantiation context (the oldest one first) is
C:\Users\bangtliu\iree\third_party\llvm-project\llvm\include\llvm/ADT/SmallVector.h(1194): note: see reference to class template instantiation 'llvm::CalculateSmallVectorDefaultInlinedElements<T>' being compiled
        with
        [
            T=`anonymous-namespace'::LinalgOperandDef
        ]
C:\Users\bangtliu\iree\third_party\llvm-project\mlir\tools\mlir-linalg-ods-gen\mlir-linalg-ods-yaml-gen.cpp(120): error C2976: 'llvm::SmallVector': too few template arguments
C:\Users\bangtliu\iree\third_party\llvm-project\llvm\include\llvm/ADT/SmallVector.h(1195): note: see declaration of 'llvm::SmallVector'
[862/7776] Building CXX object llvm-project\lib\DebugInfo\DWARF\CMakeFiles\LLVMDebugInfoDWARF.dir\DWARFDebugLine.cpp.obj
ninja: build stopped: subcommand failed.
```

This PR is added to address this error.
2025-03-14 21:36:53 -04:00
Bruno Cardoso Lopes
5265412c13
[MLIR][LLVMIR] Import: add flag to prefer using unregistered intrinsics (#130685)
Currently, there is no common mechanism for supported intrinsics to be
generically annotated with arg and ret attributes. Since there are many
supported intrinsics around different dialects, the amount of work to
teach them all about these attributes is not trivial (though it would be
nice in the long term).

This PR adds a new flag `-prefer-unregistered-intrinsics` that can be
used alongside `--import-llvm` to always use `llvm.intrinsic_call`
during import time (ignoring dialect hooks for custom intrinsic
support).

Using this flag allows us to roundtrip the LLVM IR while eliminating a
whole set of differences coming from lack of arg/ret attributes on
supported intrinsics.

Note that `convertIntrinsic` has to be moved to an implementation file
because it queries `moduleImport` state, whose type is only a forward
declaration in `LLVMImportInterface.h`.
2025-03-14 18:04:32 -07:00
Bruno Cardoso Lopes
29a000023c
[MLIR][LLVMIR] Add module flags support (#130679)
Import and translation support.

Note that the existing support (prior to this PR) already covers enough of the
translation to emit "Debug Info Version". Also, the debug info version
metadata is emitted even though the imported IR has no such information, and
it shows up in some tests (will fix that in another PR).

---------

Co-authored-by: Tobias Gysi <tobias.gysi@nextsilicon.com>
Co-authored-by: Henrich Lauko <xlauko@mail.muni.cz>
2025-03-14 18:03:36 -07:00
Kazu Hirata
456963de96 [mlir] Fix warnings
This patch fixes:

  mlir/lib/Dialect/XeGPU/Transforms/XeGPUSubgroupDistribute.cpp:54:3:
  error: definition of implicit copy assignment operator for 'Layout'
  is deprecated because it has a user-declared copy constructor
  [-Werror,-Wdeprecated-copy]

  mlir/lib/Dialect/XeGPU/Transforms/XeGPUSubgroupDistribute.cpp:103:3:
  error: definition of implicit copy assignment operator for 'SGMap'
  is deprecated because it has a user-declared copy constructor
  [-Werror,-Wdeprecated-copy]
2025-03-14 13:28:58 -07:00
Luke Hutton
955c02dc9c
[mlir][tosa] Check for compile time constants in the validation pass (#131123)
This commit adds the concept of the 'dynamic' extension to the dialect and
checks that compile-time constant (CTC) operands for each operator are
constant if the dynamic extension is not loaded.

Operands labeled as CTC in the specification that are of tosa.shape
(shape_t in the specification) type are not checked as they are always
expected to be constant. This requirement is checked elsewhere in the
dialect.

Signed-off-by: Luke Hutton <luke.hutton@arm.com>
2025-03-14 12:45:01 -07:00
MaheshRavishankar
2490f7f076
[mlir][Linalg] Allow expand shape propagation across linalg ops with dynamic shapes. (#127943)
With `tensor.expand_shape` allowing a dynamic dimension to be expanded into
multiple dynamic dimensions, adapt the reshape propagation through
expansion to handle cases where one dynamic dimension is expanded into
multiple dynamic dimensions.
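
A small hand-written sketch (hypothetical shapes) of one dynamic dimension
being expanded into several dynamic dimensions, which the propagation now
handles:

```mlir
// %sz0 and %sz1 carry the runtime extents of the two result dimensions
%e = tensor.expand_shape %t [[0, 1]] output_shape [%sz0, %sz1]
    : tensor<?xf32> into tensor<?x?xf32>
```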

---------

Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com>
2025-03-14 12:42:42 -07:00
Charitha Saumya
fd24805c8e
Reapply [mlir][xegpu] Add XeGPU subgroup map propagation analysis for XeGPU SIMT distribution. (#131380)
Originally introduced in #130240 and reverted in #131364 

Reproduced the issue locally on Linux with a shared-library build. The fixes
include adding the missing LINK_LIBS.

**Original commit message:**

This PR adds the SG map propagation step of the XeGPU SIMT distribution.
SG map propagation is a sparse backward dataflow analysis that propagates
the sg_map backward starting from the operands of certain operations
(DPAS, store etc.).

This is the first step of XeGPU subgroup distribution. This analysis
result is used to attach layout information to each XeGPU SIMD subgroup
op. The lowering patterns in XeGPUSubgroupDistribute will consume these
layout info to distribute SIMD ops into SIMT ops that work on work-item
level data fragments.

### Summary of Lowering XeGPU SIMD -> SIMT

1. Subgroup map propagation (This PR)
2. Attach `sg_map` to each op and move all ops inside the
`gpu.warp_execute_on_lane0` region.
3. Distribute each op using `sg_map`
4. Additional legalization steps to align more with Xe HW.
2025-03-14 12:38:36 -07:00
Charitha Saumya
3fcd921aa4
Revert "[mlir][xegpu] Add XeGPU subgroup map propagation analysis for XeGPU SIMT distribution." (#131364)
Reverts llvm/llvm-project#130240
2025-03-14 10:36:58 -07:00
Charitha Saumya
5eb557774d
[mlir][xegpu] Add XeGPU subgroup map propagation analysis for XeGPU SIMT distribution. (#130240)
This PR adds the SG map propagation step of the XeGPU SIMT distribution.
SG map propagation is a sparse backward dataflow analysis that propagates
the sg_map backward starting from the operands of certain operations
(DPAS, store etc.).

This is the first step of XeGPU subgroup distribution. This analysis
result is used to attach layout information to each XeGPU SIMD subgroup
op. The lowering patterns in XeGPUSubgroupDistribute will consume these
layout info to distribute SIMD ops into SIMT ops that work on work-item
level data fragments.

### Summary of Lowering XeGPU SIMD -> SIMT

1. Subgroup map propagation (This PR)
2. Attach `sg_map` to each op and move all ops inside the
`gpu.warp_execute_on_lane0` region.
3. Distribute each op using `sg_map`
4. Additional legalization steps to align more with Xe HW.
2025-03-14 10:21:22 -07:00
Nikolay Panchenko
3ac5d8da61
[mlir-lsp] Abstract input and output of the JSONTransport (#129320)
The patch abstracts sending and receiving JSON messages in
`JSONTransport` to allow custom implementations of them. For example, one
concrete implementation can use pipes without a need to convert file
descriptor to a `FILE` object.
2025-03-14 12:31:28 -04:00
Sergio Afonso
72b8744aa5
[MLIR][OpenMP] Reduce overhead of target compilation (#130945)
This patch avoids calling `TargetOp::getInnermostCapturedOmpOp` multiple
times during initialization of default and runtime target attributes in
MLIR to LLVM IR translation of `omp.target` operations. This is a
potentially expensive operation, so this change should help keep compile
times lower.
2025-03-14 15:18:32 +00:00
Peter Hawkins
244cf89f14
[mlir][python] Small optimization to mlirApiObjectToCapsule. (#131160)
Call nb::getattr(...) rather than using nb::hasattr() and .attr(). Saves
a Python string allocation and a dictionary lookup when using a recent
nanobind.

Optimization only, no changes in behavior expected.
2025-03-14 08:10:42 -07:00
mihailo-stojanovic
fc8b2bf2f8
[MLIR][LLVM] Import dereferenceable metadata from LLVM IR (#130974)
Add support for importing `dereferenceable` and `dereferenceable_or_null` metadata into LLVM dialect. Add a new attribute which models these two metadata nodes and a new OpInterface.
2025-03-14 09:30:47 +01:00
Kai Sasaki
befa037c13
[mlir][affine] Guard invalid dim attribute in the test-reify-bound pass (#129013)
Computing the bound of an affine op
(ValueBoundsConstraintSet::computeBound) crashes due to an invalid dim
value given to the op. The pass needs to check that the dim attribute is
not greater than the rank of the input type.

Fixes https://github.com/llvm/llvm-project/issues/128807
2025-03-14 08:09:01 +09:00
Luke Hutton
1c45514748
[mlir][tosa] Fix bug causing quantized pad const creation crash (#131125)
This commit ensures the storage type is retrieved correctly, which fixes
a crash when creating a quantized pad const tensor.

Testing is completed via the `tosa-optional-decompositions` pass which
makes use of the `createPadConstTensor` function.

Also includes some cleanup.
2025-03-13 13:17:47 -07:00
Daniel Hernandez-Juarez
64f67f870d
[mlir][AMDGPU] Enable emulating vector buffer_atomic_fadd for bf16 on gfx942 (#129029)
- Change to make sure architectures < gfx950 emulate bf16
buffer_atomic_fadd
- Add tests for bf16 buffer_atomic_fadd and architectures: gfx12, gfx942
and gfx950

---------

Co-authored-by: Jakub Kuderski <kubakuderski@gmail.com>
2025-03-13 14:30:45 -05:00
Nirvedh Meshram
ca4399036f
[mlir][linalg] Add FoldReshapeWithGenericOpByCollapsing pattern (#131029)
This pattern to bubble up collapse shapes was missing from
`populateFoldReshapeOpsByCollapsingPatterns`.

Signed-off-by: Nirvedh Meshram <nirvedh@gmail.com>
2025-03-13 14:22:50 -05:00
Christopher Di Bella
933ecf5f30
[mlir] adds [[maybe_unused]] to variables that might not be used (#131184)
This should suppress an unused variable warning that was seemingly
pervasive.
2025-03-13 11:44:53 -07:00
Adam Siemieniuk
a16c225b40
[mlir][xegpu] Convert Vector contraction to XeGPU (#122115)
Adds pattern to lower vector.contract to XeGPU operation.
2025-03-13 19:41:53 +01:00
Uday Bondhugula
0ed5f9b22b [MLIR] NFC. Fix unused warning in affine loop utils 2025-03-13 20:56:35 +05:30
Kareem Ergawy
b003face11
[flang][OpenMP] Add OutlineableOpenMPOpInterface to omp.teams (#131109)
Given the following input:
```fortran
program rep_loopbind
  implicit none
  integer :: i
  real :: priv_val

  !$omp teams private(priv_val)
    !$omp distribute
    do i=1,1000
    end do
  !$omp end teams
end program
```
the `AllocaOpConversion` pattern in `FIRToLLVMLowering` would **move**
the private allocations that belong to the `teams` directive (i.e. the
allocations needed for the private copies of `priv_val` and the loop's
iteration variable) out of the `omp.teams` op to the outside scope.

This is not correct since these allocations should be eventually emitted
inside the outlined region for the `teams` directive. Without this fix,
these allocations would be emitted in the parent function (or whatever the
parent scope happens to be).
2025-03-13 16:03:19 +01:00
Michael Klemm
28ffa7f6a4
[flang][OpenMP] Fix missing missing inode issue (#130798)
When outlining an offload region, Flang creates a unique name by
querying an inode ID. However, when the name of the actual source file
does not match the logical file in a `#line` preprocessor directive,
code-gen was failing as it could not determine the inode ID. This PR
checks for this condition and if the logical file name does not exist,
the inode is replaced with a hash value created from the source code
itself.
2025-03-13 15:50:37 +01:00
Sergio Afonso
237a910819
[MLIR][OpenMP] Remove the ReductionClauseInterface, NFC (#130978)
This patch removes the `ReductionClauseInterface` and all definitions of
its associated `getAllReductionVars` method.

The method mandated by this interface is not used anywhere, and the
conflicts its definition produces when multiple reduction clauses are
present in an operation lead to a more convoluted operation
definition, so it seems better to remove it and only add something like
it back if there is a clear advantage to it.
2025-03-13 14:50:23 +00:00
Sergio Afonso
6ff33edf4d
[MLIR][OpenMP] Minor improvements to BlockArgOpenMPOpInterface, NFC (#130789)
This patch introduces a use for the new `getBlockArgsPairs` to avoid
having to manually list each applicable clause.

Also, the `numClauseBlockArgs()` function is introduced, which
simplifies the implementation of the interface's verifier and enables
better memory handling within `getBlockArgsPairs`.
2025-03-13 14:48:19 +00:00
Artemiy Bulavin
0aa5ba43a0
[mlir] Fix DistinctAttributeUniquer deleting attribute storage when crash reproduction is enabled (#128566)
Currently, `DistinctAttr` uses an allocator wrapped in a
`ThreadLocalCache` to manage attribute storage allocations. This ensures
all allocations are freed when the allocator is destroyed.

However, this setup can cause use-after-free errors when
`mlir::PassManager` runs its passes on a separate thread as a result of
crash reproduction being enabled. Distinct attribute storages are
created in the child thread's local storage and freed once the thread
joins. Attempting to access these attributes after this can result in
segmentation faults, such as during printing or alias analysis.

Example: This invocation of `mlir-opt` demonstrates the segfault issue
due to distinct attributes being created in a child thread and their
storage being freed once the thread joins:
```
mlir-opt --mlir-pass-pipeline-crash-reproducer=. --test-distinct-attrs mlir/test/IR/test-builtin-distinct-attrs.mlir
```

This pull request changes the distinct attribute allocator to use
different allocators depending on whether or not threading is enabled
and whether or not the pass manager is running its passes in a separate
thread. If multithreading is disabled, a non thread-local allocator is
used. If threading remains enabled and the pass manager invokes its pass
pipelines in a child thread, then a non-thread local but synchronised
allocator is used. This ensures that the lifetime of allocated storage
persists beyond the lifetime of the child thread.

I have added two tests, for the `-test-distinct-attrs` and
`-enable-debug-info-on-llvm-scope` passes, that run them with crash
reproduction enabled.
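
For context, distinct attributes are the builtin `distinct` attributes
exercised by `--test-distinct-attrs`; a minimal hand-written sketch (alias
names and payloads are hypothetical):

```mlir
// each `distinct[...]` wrapper has its own identity, even around equal payloads
func.func private @f() attributes {id = distinct[0]<42 : i32>}
func.func private @g() attributes {id = distinct[1]<42 : i32>}
```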
2025-03-13 15:00:39 +01:00
Ivan Butygin
02fae68a45
[mlir][vector] VectorLinearize: ub.poison support (#128612)
Unify `arith.constant` and `ub.poison` using
`OpTraitConversionPattern<OpTrait::ConstantLike>`.
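
A rough hand-written sketch of the rewrite this enables (assuming the pass
simply rebuilds the constant-like op on the flattened type and casts back for
remaining users):

```mlir
// before linearization
%p = ub.poison : vector<2x4xf32>
// after linearization
%q = ub.poison : vector<8xf32>
%r = vector.shape_cast %q : vector<8xf32> to vector<2x4xf32>
```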
2025-03-13 14:18:21 +03:00
Ivan Butygin
786e70ff10
[mlir] Change TypeOrValueSemanticsContainer base from TypeConstraint to Type (#129433)
`Type` is derived from `TypeConstraint`. Using `Type` as the base allows
using `SignlessIntegerLike` and friends in `Variadic<>`.
2025-03-13 14:17:42 +03:00
Uday Bondhugula
55b806c2af
[MLIR][Affine] Fix affine data copy generation copy placement for missing memref definition check (#130750)
This was exposed by the previously added test case when performing copy
generation with limited memory capacity.
2025-03-13 16:08:53 +05:30
Matthias Springer
59fd2878fc
[mlir][memref] Clean up load/store documentation (#130569)
Remove references to the Affine dialect. The documentation is outdated.
Separate `affine.load/store` ops have been added.

Also add documentation for `nontemporal`.
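
For reference, a small hand-written sketch of the ops involved;
`affine.load`/`affine.store` are separate ops in the Affine dialect:

```mlir
%v = memref.load %m[%i] : memref<16xf32>
memref.store %v, %m[%j] : memref<16xf32>
// affine counterparts take affine index expressions and live in the Affine dialect
%w = affine.load %m[%i] : memref<16xf32>
affine.store %w, %m[%j] : memref<16xf32>
```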
2025-03-13 09:42:14 +01:00
Pradeep Kumar
21cef8aa1c
[MLIR][NVVM] Add support for tcgen05.{ld, st} (#130728)
This commit adds support for tcgen05.{ld, st} to the NVVM Dialect with
tests under tcgen05-ld.mlir and tcgen05-st.mlir, respectively.
2025-03-13 12:07:04 +05:30
Christopher Bate
3438dfc7ff
[mlir][tensor] Fix bufferization interface for 'tensor.reshape' (#128590)
Previously, the BufferizableOpInterface implementation for 'tensor.reshape'
listed the 'shape' operand as an alias for the result tensor, causing
unnecessary conflicts with ops that "write" to the shape operand.
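
For illustration (hypothetical shapes), the `shape` operand only supplies the
target extents and should not be treated as aliasing the result:

```mlir
%r = tensor.reshape %src(%shape)
    : (tensor<?xf32>, tensor<2xindex>) -> tensor<?x?xf32>
```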
2025-03-12 22:19:01 -06:00
Javed Absar
ecf4d995f6
[mlir][linalg][elementwise] Fold transpose into new elementwise (#130207)
Fold transpose into the new elementwise op, which has an affine map attached.
Broadcast folding will be added in the next diff.
2025-03-12 23:04:44 +00:00
Luke Hutton
c44c905174
[mlir][tosa] Add error if verification to pooling operators (#130052)
This commit adds the following checks to avg_pool2d and max_pool2d TOSA
operations:
- check kernel values are >= 1
- check stride values are >= 1
- check padding values are >= 0
- check padding values are less than kernel sizes
- check output shape matches the expected output shape

Signed-off-by: Luke Hutton <luke.hutton@arm.com>
2025-03-12 10:48:33 -07:00
Artemiy Bulavin
fc127ff53d
[mlir] Extract RHS rows once when lowering vector.contract to dot (#130130)
The `vector.contract` op on two matrices A and B will be lowered to
individual dot products of each row of A with each column of B.
The existing lowering will extract each column of B for each row of A,
which leads to multiple values in the IR representing the same columns
of B.

This PR makes changes to the `ContractOpToDotLowering` to make sure that
the columns of B are only ever extracted once, so then the SSA values
representing the extracted columns are then re-used in the IR for later
dot products.

I have updated the existing vector-contract-to-dot-transforms test.
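
For context, a minimal hand-written sketch (hypothetical shapes) of the kind of
`vector.contract` this lowering decomposes into per-row/per-column dot products:

```mlir
// C[m, n] += sum_k A[m, k] * B[k, n]; the lowering extracts rows of A and
// columns of B and reduces each pair with a dot product
%c = vector.contract {
       indexing_maps = [affine_map<(d0, d1, d2) -> (d0, d2)>,
                        affine_map<(d0, d1, d2) -> (d2, d1)>,
                        affine_map<(d0, d1, d2) -> (d0, d1)>],
       iterator_types = ["parallel", "parallel", "reduction"],
       kind = #vector.kind<add>}
     %a, %b, %acc : vector<4x8xf32>, vector<8x4xf32> into vector<4x4xf32>
```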
2025-03-12 17:16:49 +00:00
Nikita Popov
f137c3d592
[TargetRegistry] Accept Triple in createTargetMachine() (NFC) (#130940)
This avoids doing a Triple -> std::string -> Triple round trip in lots
of places, now that the Module stores a Triple.
2025-03-12 17:35:09 +01:00
MaheshRavishankar
665299eb3e
[mlir][Transforms] Add a utility method to move value definitions. (#130874)
Commit 205c5325b3 added a transform utility that moves all SSA dependences of
an operation before an insertion point. Similar to that, this PR adds a
transform utility function, `moveValueDefinitions`, to move the slice of operations
that define all values in a `ValueRange` before the insertion point.
While very similar to `moveOperationDependencies`, this method differs
in a few ways

1. When computing the backward slice, since the start of the slice is a
value, the computed slice needs to be inclusive.
2. The combined backward slice needs to be sorted topologically before
moving them to avoid SSA use-def violations while moving individual ops.

The PR also adds a new transform op to test this new utility function.

---------

Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com>
2025-03-12 08:30:43 -07:00
Sergio Afonso
032f83b743
[MLIR][OpenMP] Enable BlockArgOpenMPOpInterface accessing operands (#130769)
This patch makes additions to the `BlockArgOpenMPOpInterface` to
simplify its use by letting it handle the matching between operands and
their associated entry block arguments. Most significantly, the
following is now possible:

```c++
SmallVector<std::pair<Value, BlockArgument>> pairs;
cast<BlockArgOpenMPOpInterface>(op).getBlockArgsPairs(pairs);
for (auto [var, arg] : pairs) {
  // var points to the operand (outside value) and arg points to the entry
  // block argument associated to that value.
}
```

This is achieved by making the interface define and use `getXyzVars()`
methods, which by default return empty `OperandRange`s and are overridden
by getters automatically produced for the `Variadic<...> $xyz_vars`
tablegen argument of the corresponding clause. These definitions can
then be simplified, since they no longer need to manually define
`numXyzBlockArgs` functions as a result.

A side-effect of this is that all ops implementing this interface will
now publicly define `getXyzVars()` functions for all entry block
argument-generating clauses, even if they don't actually accept all
clauses. However, these would just return empty ranges, so it shouldn't
cause issues.

This change uncovered some incorrect definitions of class declarations
related to the `ReductionClauseInterface`, and the `OpenMP_DetachClause`
incorrectly implementing the `BlockArgOpenMPOpInterface`, so these
issues are also addressed.
2025-03-12 11:50:12 +00:00
Matthias Springer
418e07b7e6
[mlir][Tensor] Check for out-of-bounds slice in insert/extract_slice verifier (#130487)
Also fix test cases that had invalid ops.
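
For example (hand-written, not from the patch), a slice like this is now
rejected because offset + size exceeds the source extent:

```mlir
// offset 6 + size 4 > dim size 8: caught by the new verifier check
%s = tensor.extract_slice %t[6] [4] [1] : tensor<8xf32> to tensor<4xf32>
```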
2025-03-12 08:34:21 +01:00
Oleksandr "Alex" Zinenko
6981f7e92a
[mlir] account for explicit affine.parallel in parallelization (#130812)
Affine parallelization should take explicitly parallel loops into
account when computing loop depth for dependency analysis purposes. This
was previously not the case, potentially leading to loops incorrectly
being marked as parallel due to depth mismatch.
2025-03-11 20:53:50 -05:00