PR #126091 adds intrinsics for tcgen05
wait/fence/commit operations. This patch
adds NVVM Dialect Ops for them.
Signed-off-by: Durgadoss R <durgadossr@nvidia.com>
getInput1Zp() returns an unsigned value, which means that for a negative
zero point value the max intermediate value computation currently goes
wrong. Use getInput1ZpAttr() instead, which returns an APInt and allows
easy sign extension to int64_t.
Fix a private memref creation bug in affine fusion exposed when the same
memref is both loaded from and stored to in the producer nest. Make the
private memref replacement sound.
Change the affine fusion debug string to `affine-fusion`, which is more compact.
Fixes: https://github.com/llvm/llvm-project/issues/48703
Add support for lowering single-element vector.{load|store} to SPIR-V.
Since the SPIR-V conversion turns single-element vectors into scalars,
vector.{load|store} lowering to spirv.{Load|Store} needs special handling
for this case.
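As a hypothetical illustration (names and shapes are made up, not taken from the patch), ops of this form now lower to scalar SPIR-V loads/stores because the type conversion turns `vector<1xf32>` into `f32`:
```mlir
func.func @single_elem_vector(%buf: memref<8xf32>, %i: index, %v: vector<1xf32>) {
  // Lowers to a scalar spirv.Load of f32 under the SPIR-V conversion.
  %0 = vector.load %buf[%i] : memref<8xf32>, vector<1xf32>
  // Lowers to a scalar spirv.Store of f32.
  vector.store %v, %buf[%i] : memref<8xf32>, vector<1xf32>
  return
}
```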
Adds XeGPU tensor descriptor type verifier.
The type verifier covers general tensor descriptor invariants w.r.t. Xe
ISA semantics.
Related operation verifiers are updated to account for the new
descriptor checks and avoid duplication.
I'm seeing build errors in a downstream project using torch-mlir that
are fixed by this change. See
https://github.com/iree-org/iree/pull/19903#discussion_r1946899561 for
more context. The build error on MSVC is:
```
C:\home\runner\_work\iree\iree\third_party\llvm-project\mlir\include\mlir/Dialect/Tosa/Utils/ConversionUtils.h(148): error C2872: 'OpTrait': ambiguous symbol
C:\home\runner\_work\iree\iree\third_party\llvm-project\mlir\include\mlir/Dialect/Tosa/IR/TosaOps.h(49): note: could be 'mlir::OpTrait'
C:\home\runner\_work\iree\iree\third_party\torch-mlir\include\torch-mlir/Dialect/Torch/IR/TorchTraits.h(23): note: or 'mlir::torch::Torch::OpTrait'
C:\home\runner\_work\iree\iree\third_party\llvm-project\mlir\include\mlir/Dialect/Tosa/Utils/ConversionUtils.h(148): note: the template instantiation context (the oldest one first) is
C:\home\runner\_work\iree\iree\third_party\torch-mlir\lib\Conversion\TorchToTosa\TosaLegalizeCommon.cpp(126): note: see reference to function template instantiation 'TosaOp mlir::tosa::CreateOpAndInfer<mlir::tosa::MulOp,mlir::Value&,mlir::Value&,mlir::Value&>(mlir::PatternRewriter &,mlir::Location,mlir::Type,mlir::Value &,mlir::Value &,mlir::Value &)' being compiled
with
[
TosaOp=mlir::tosa::MulOp
]
C:\home\runner\_work\iree\iree\third_party\torch-mlir\include\torch-mlir/Conversion/TorchToTosa/TosaLegalizeUtils.h(83): note: see reference to function template instantiation 'TosaOp mlir::tosa::CreateOpAndInfer<TosaOp,mlir::Value&,mlir::Value&,mlir::Value&>(mlir::ImplicitLocOpBuilder &,mlir::Type,mlir::Value &,mlir::Value &,mlir::Value &)' being compiled
with
[
TosaOp=mlir::tosa::MulOp
]
C:\home\runner\_work\iree\iree\third_party\torch-mlir\include\torch-mlir/Conversion/TorchToTosa/TosaLegalizeUtils.h(76): note: see reference to function template instantiation 'TosaOp mlir::tosa::CreateOpAndInferShape<TosaOp,mlir::Value&,mlir::Value&,mlir::Value&>(mlir::ImplicitLocOpBuilder &,mlir::Type,mlir::Value &,mlir::Value &,mlir::Value &)' being compiled
with
[
TosaOp=mlir::tosa::MulOp
]
```
I think the torch-mlir code here:
https://github.com/llvm/torch-mlir/blob/main/include/torch-mlir/Dialect/Torch/IR/TorchTraits.h
is causing the issue, but I'm not sure why builds only started failing now.
Given that `mlir::OpTrait` already exists, torch-mlir should not be
creating an ambiguous symbol `mlir::torch::Torch::OpTrait`. So while a
better fix would be to the downstream project, being explicit here
doesn't seem that unreasonable to me.
This PR continues the introduction of poison as the initialization
vector, in this particular case in LowerVectorBitCast,
LowerVectorBroadcast and LowerVectorTranspose.
This is the first PR that introduces `ub.poison` vectors as part of a
rewrite/conversion pattern in the Vector dialect. It replaces the
`arith.constant dense<0>` vector initialization for
`vector.insert_slice` ops with a poison vector.
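A minimal sketch of the kind of rewrite involved (illustrative shapes, not taken verbatim from the patch):
```mlir
func.func @init_vector_example() -> (vector<4x8xf32>, vector<4x8xf32>) {
  // Before: the lowering created a zero-filled init vector.
  %zero = arith.constant dense<0.0> : vector<4x8xf32>
  // After: a poison vector is used instead, since every element is
  // overwritten by the subsequent inserts anyway.
  %poison = ub.poison : vector<4x8xf32>
  return %zero, %poison : vector<4x8xf32>, vector<4x8xf32>
}
```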
This PR depends on all the previous PRs that introduced support for
poison in Vector operations such as `vector.shuffle`, `vector.extract`,
`vector.insert`, including ODS, canonicalization and lowering support.
This PR may improve end-to-end compilation time through LLVM, depending
on the workloads.
The shape operand is changed to input shape type since V1.0
Change-Id: I508cc1d67e9b017048b3f29fecf202cb7d707110
Co-authored-by: Won Jeon <won.jeon@arm.com>
This PR adds a folder for `vector.extract(ub.poison) -> ub.poison`. It
also replaces `create` with `createOrFold` for insert/extract ops in vector
unroll and transpose lowering patterns to trigger the poison foldings
introduced recently.
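A hedged sketch of the new fold (function and value names are made up):
```mlir
func.func @extract_from_poison() -> f32 {
  %p = ub.poison : vector<4xf32>
  // With the new folder, this extract folds to `ub.poison : f32`.
  %e = vector.extract %p[1] : f32 from vector<4xf32>
  return %e : f32
}
```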
This patch improves the GraphViz output of ViewOpGraph
(--view-op-graph).
- Switch to rectangular record-based nodes, inspired by a similar
visualization in [Glow](https://github.com/pytorch/glow). Rectangles
make more efficient use of space when printing text.
- Add input and output ports for each operand and result, and remove
edge labels.
- Switch to a muted color palette to reduce eye strain.
A new constraint is also added to restrict attribute values for SPIR-V
attributes. Ideally this should use `ConfinedAttr` with a custom
constraint directly on the operand; however, it seems TableGen does not
allow using that with SPIR-V attributes. I suspect this is because SPIR-V
attributes do not derive from the generic MLIR attribute class -
TableGen complains about a missing enum field.
* Remove duplicate `TypeOrContainer`. There is an identical class:
`TypeOrValueSemanticsContainer`.
* Remove `TypeOrContainerOfAnyRank` and use
`TypeOrValueSemanticsContainer` instead. `TypeOrContainerOfAnyRank` is
inconsistent with the other classes because it explicitly checks for
`VectorType` and `TensorType` instead of utilizing the value semantics
type trait.
* Remove `SignlessIntegerOrIndexLikeOfAnyRank` etc. and use
`SignlessIntegerOrIndexLike` instead. `SignlessIntegerOrIndexLike` etc.
already allow 0-d vectors, so there is no difference with
`SignlessIntegerOrIndexLikeOfAnyRank`.
Following https://github.com/llvm/llvm-project/issues/124308, this patch
reorganizes the `vector-to-llvm.mlir` tests by splitting them into two
categories:
- **Basic conversion tests**: Tests that only require
`populateVectorToLLVMConversionPatterns`, focusing on the minimal
conversion from Vector to LLVM. These have been moved to
`vector-to-llvm-interface.mlir`.
- **Full pass tests**: Tests that require the complete
`ConvertVectorToLLVMPass`, which includes
`populateVectorToLLVMConversionPatterns` along with additional
patterns. These remain in `vector-to-llvm.mlir`.
This reorganization clarifies test coverage and helps avoid unnecessary
duplication.
NOTE: This merely moves tests around between the two files and adds
some comments.
For context, recall that `tensor.insert_slice` is vectorised using the
`vector.transfer_read` + `vector.transfer_write` pair. An unmasked
example is shown below:
```mlir
// BEFORE VECTORIZATION
%res = tensor.insert_slice
%slice into %dest[0, %c2]
[5, 1] [1, 1] : tensor<5x1xi32> into tensor<5x3xi32>
// AFTER VECTORIZATION
%read = vector.transfer_read %source[%c0, %c0], %pad
: tensor<5x1xi32>, vector<8x1xi32>
%res = vector.transfer_write %read, %dest[%c0, %c2]
: vector<8x1xi32>, tensor<5x3xi32>
```
This PR extends `vectorizeAsInsertSliceOp` to add masking support for
the `vector.transfer_write` operation. This complements the changes
in #122927, which introduced masking for the `vector.transfer_read`.
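A hedged sketch of the masked form produced for the example above (the exact mask computation may differ; the sizes %c5 and %c1 are assumed to come from the static slice shape tensor<5x1xi32>):
```mlir
// AFTER VECTORIZATION (masked write) - illustrative only
%mask = vector.create_mask %c5, %c1 : vector<8x1xi1>
%res = vector.mask %mask {
  vector.transfer_write %read, %dest[%c0, %c2]
    : vector<8x1xi32>, tensor<5x3xi32>
} : vector<8x1xi1> -> tensor<5x3xi32>
```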
The config is currently not movable: because there are constructors, the
default move won't be generated, which prevents it from being moved.
Also, it is not copyable because of the unique_ptr. This PR adds a move
constructor to allow moving it.
Goals:
1. To add syntax and semantics to 'batch_matmul' without changing any of
the existing syntax expectations for current usage. batch_matmul is
still just batch_matmul.
2. Move the definition of batch_matmul from linalg OpDSL to the tablegen
ODS infra.
Scope of this patch:
To expose broadcast and transpose semantics on 'batch_matmul'.
The broadcast and transpose semantics are as follows:
By default, 'linalg.batch_matmul' behavior will remain as is. Broadcast
and Transpose semantics can be applied by specifying the explicit
attribute 'indexing_maps' as shown below. This is a list attribute, so
the list must include all the maps if specified.
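For reference, a sketch (not part of the patch's own examples) of the indexing maps corresponding to the default behaviour:
```
linalg.batch_matmul indexing_maps = [
affine_map< (d0, d1, d2, d3) -> (d0, d1, d3)>, // default A
affine_map< (d0, d1, d2, d3) -> (d0, d3, d2)>, // default B
affine_map< (d0, d1, d2, d3) -> (d0, d1, d2)>  // default C
]
ins (%arg0, %arg1: memref<2x3x5xf32>, memref<2x5x7xf32>)
outs (%arg2: memref<2x3x7xf32>)
```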
Example Transpose:
```
linalg.batch_matmul indexing_maps = [
affine_map< (d0, d1, d2, d3) -> (d0, d3, d1)>, //transpose
affine_map< (d0, d1, d2, d3) -> (d0, d3, d2)>,
affine_map< (d0, d1, d2, d3) -> (d0, d1, d2)>
]
ins (%arg0, %arg1: memref<2x5x3xf32>,memref<2x5x7xf32>)
outs (%arg2: memref<2x3x7xf32>)
```
Example Broadcast:
```
linalg.batch_matmul indexing_maps = [
affine_map< (d0, d1, d2, d3) -> (d3)>, //broadcast
affine_map< (d0, d1, d2, d3) -> (d0, d3, d2)>,
affine_map< (d0, d1, d2, d3) -> (d0, d1, d2)>
]
ins (%arg0, %arg1: memref<5xf32>,memref<2x5x7xf32>)
outs (%arg2: memref<2x3x7xf32>)
```
Example Broadcast and transpose:
```
linalg.batch_matmul indexing_maps = [
affine_map< (d0, d1, d2, d3) -> (d1, d3)>, //broadcast
affine_map< (d0, d1, d2, d3) -> (d0, d2, d3)>, //transpose
affine_map< (d0, d1, d2, d3) -> (d0, d1, d2)>
]
ins (%arg0, %arg1: memref<3x5xf32>, memref<2x7x5xf32>)
outs (%arg2: memref<2x3x7xf32>)
```
RFCs and related PR:
https://discourse.llvm.org/t/rfc-linalg-opdsl-constant-list-attribute-definition/80149
https://discourse.llvm.org/t/rfc-op-explosion-in-linalg/82863
https://discourse.llvm.org/t/rfc-mlir-linalg-operation-tree/83586
https://github.com/llvm/llvm-project/pull/115319
LLVM itself is generally moving away from using `undef` and towards
using `poison`, to the point of having a lint that catches new uses of
`undef` in tests.
In order to not trip the lint on new patterns and to conform to the
evolution of LLVM:
- Rename various ::undef() methods on StructBuilder subclasses to
::poison()
- Audit the uses of UndefOp in the MLIR libraries and replace almost all
of them with PoisonOp (see the sketch below)
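A minimal sketch of the typical effect at the IR level (illustrative types, not taken from the patch):
```mlir
// Before: conversions materialized an undef struct as the placeholder
// value to insert fields into.
%u = llvm.mlir.undef : !llvm.struct<(i64, ptr)>
// After: a poison value serves as the placeholder instead.
%p = llvm.mlir.poison : !llvm.struct<(i64, ptr)>
```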
The remaining uses of `undef` are initializing `uninitialized` memrefs,
explicit conversions to undef from SPIR-V, and a few cases in
AMDGPUToROCDL where usage like
%v = insertelement <M x iN> undef, iN %v, i32 0
%arg = bitcast <M x iN> %v to i(M * N)
is used to handle "i32" arguments that are really packed vectors of
smaller types that won't always be fully initialized.
This is a follow-up to 5df62bdc9be9c258c5ac45c8093b71e23777fa0e. That
commit should not have needed to make the vector.insert and
vector.extract conversions to SPIR-V directly handle the static poison
index case, as there is a fold from those to ub.poison, and a conversion
pattern from ub.poison to spirv.Undef, however:
- The ub.poison fold result could not be materialized by the vector
dialect (fixed as of d13940ee263ff50b7a71e21424913cc0266bf9d4).
- The conversion pattern wasn't being populated in VectorToSPIRVPass,
which is used by the tests. This commit changes this.
- The ub.poison to spirv.Undef pattern rejected non-scalar types, which
prevented its use for vector results. It is unclear why this restriction
existed; a remark in D156163 said this was to avoid converting "user
types", but it is not obvious why these shouldn't be permitted (the
SPIR-V specification allows OpUndef for all types except OpTypeVoid).
This commit removes this restriction.
With these fixed, this commit removes the redundant static poison index
handling, and updates the tests.
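For illustration, a hedged sketch of the conversion that is now exercised for vector results:
```mlir
// A poison vector, e.g. produced by folding a vector.extract/insert
// with a static poison index.
%p = ub.poison : vector<4xf32>
// After conversion to SPIR-V (previously rejected because the result
// type is not a scalar):
%u = spirv.Undef : vector<4xf32>
```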
There were a bunch of spots in ROCDL.td where we were defining our own
llvmBuilder call which could have been generated using the default
built-in one on LLVM_IntrOpBase.
This commit cleans up such usages in the interest of potentially
enabling ROCDL import in the future and of making best practices more
obvious.
The one breaking change is renaming WaitcntOp to SWaitcntOp, which
should have minimal impact.
This Pull Request adds OpImageWrite as defined in section 3.52.10
(Image Instructions). The tests in
`mlir/test/Target/SPIRV/image-ops.mlir` are also updated (and extended
with the new op), so they now pass validation with `spirv-val` after
serialization into SPIR-V. The test was missing the `ImageQuery`
capability and entry points; for the entry points, dummy `main`
functions were added.
Summary:
This patch unifies the existing offloading entries into a single section
called `llvm_offload_entries`. This lets us use a more unified
offloading infrastructure so that all targets share the same handling.
The effect is that people in the runtimes now need to check if the kind
is what they expect, but the expectation is that you can combine
multiple potential providers into a single compile job. This doesn't
fully work yet because of other runtime issues, but it will some day.
Mostly this helps the future of liboffload, where we want to handle
languages other than OpenMP.
Reverts llvm/llvm-project#124402
It breaks an integration test in a downstream project (i.e., IREE), which
produces NaNs. Talked to the author @ita9naiwa, and we agreed to reland
the PR after we find the issue.
This PR changes the emitted block structure of alloc, init, and copy
regions for `omp.private` and `omp.declare_reduction` ops a little bit.
In particular, this decouples the init and copy regions from the alloca
insertion point. The main motivation is to fix "Instruction does not
dominate all uses!" errors that happen especially when an init region
uses a value from the OpenMP region it is being inlined into. The issue
happens because, prior to this PR, we inlined the init region right
after the latest alloc block (since we used the alloca IP), which in
some cases (see example below) is too early and causes the use
dominance issue.
Example that would break without this PR (when delayed privatization is
enabled for `omp.wsloop`s):
```fortran
subroutine test2 (xyz)
integer :: i
integer :: xyz(:)
!$omp target map(from:xyz)
!$omp do private(xyz)
do i = 1, 10
xyz(i) = i
end do
!$omp end target
end subroutine
```
This is PR 2 in a series of N patches aimed at improving
"VectorEmulateNarrowType.cpp". This is mainly minor refactoring, no
major functional changes are made/added.
**CHANGE 1**
Renames the variable "scale". Note, "scale" could mean either:
* "container-elements-per-emulated-type", or
* "emulated-elements-per-container-type".
While from the context it is clear that it's always the former (original
type is always a sub-byte type and the emulated type is usually `i8`),
this PR reduces the cognitive load by making this clear.
**CHANGE 2**
Replaces `isUnalignedEmulation` with `isFullyAligned`
Note, `isUnalignedEmulation` is always computed following a
"per-element-alignment" condition:
```cpp
// Check per-element alignment.
if (containerBits % emulatedBits != 0) {
return rewriter.notifyMatchFailure(
op, "impossible to pack emulated elements into container elements "
"(bit-wise misalignment)");
}
// (...)
bool isUnalignedEmulation = origElements % emulatedPerContainerElem != 0;
```
Given that `isUnalignedEmulation` captures only one of two conditions
required for "full alignment", it should be re-named as
`isPartiallyUnalignedEmulation`. Instead, I've flipped the condition and
renamed it as `isFullyAligned`:
```cpp
bool isFullyAligned = origElements % emulatedPerContainerElem == 0;
```
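As a concrete, hypothetical instance of these quantities: emulating i4 in an i8 container gives `emulatedPerContainerElem = 8 / 4 = 2`, so a load like the one below has `origElements = 8`, `8 % 2 == 0`, and hence `isFullyAligned == true`:
```mlir
func.func @aligned_i4_load(%m: memref<8xi4>, %i: index) -> vector<8xi4> {
  // 8 i4 elements pack exactly into 4 i8 container elements.
  %v = vector.load %m[%i] : memref<8xi4>, vector<8xi4>
  return %v : vector<8xi4>
}
```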
**CHANGE 3**
* Unifies various comments throughout the file (for consistency).
* Adds new comments throughout the file and adds TODOs where high-level
comments are missing.
**GitHub issue to track this work**:
https://github.com/llvm/llvm-project/issues/123630
The newly introduced `TensorRelayoutOpInterface` is created specifically
for `tensor.pack` + `tensor.unpack`. Although the interface is
currently empty, it enables us to refactor the logic in
`FoldTensorCastProducerOp` within the Tensor dialect as follows:
```cpp
// OLD
// Reject tensor::PackOp - there's dedicated pattern for that instead.
if (!foldTensorCastPrecondition(op) ||
isa<tensor::PackOp, tensor::UnPackOp>(*op))
return failure();
```
is replaced with:
```cpp
// NEW
// Reject tensor::PackOp - there's dedicated pattern for that instead.
if (!foldTensorCastPrecondition(op) ||
isa<tensor::RelayoutOpInterface>(*op))
return failure();
```
This will be crucial once `tensor.pack` + `tensor.unpack` are replaced
with `linalg.pack` + `linalg.unpack` (i.e. moved to Linalg):
* https://github.com/llvm/llvm-project/pull/123902,
* https://discourse.llvm.org/t/rfc-move-tensor-pack-and-tensor-unpack-into-linalg/.
Note that the interface itself will later be moved to the Linalg
dialect. This decoupling ensures that the Tensor dialect does not
require an understanding of Linalg ops, thus keeping the dependency
lightweight.
This PR is effectively a preparatory step for moving PackOp and UnpackOp
to Linalg. Once that's completed, most CMake changes from this PR will
be effectively reverted.
When creating `EnumDecl`s from DWARF for Objective-C `NS_ENUM`s, the
Swift compiler tries to figure out if it should perform "swiftification"
of that enum (which involves renaming the enumerator cases, etc.). The
heuristic by which it determines whether we want to swiftify an enum is
checking the `enum_extensibility` attribute (because that's essentially
what `NS_ENUM`s are). Currently LLDB fails to attach the
`EnumExtensibilityAttr` to `EnumDecl`s it creates (because there's not
enough info in DWARF to derive it), which means we have to fall back to
re-building Swift modules on-the-fly, slowing down expression evaluation
substantially. This happens around
4b3931c8ce/lib/ClangImporter/ImportEnumInfo.cpp (L37-L59)
To speed up Swift expression evaluation, this patch proposes encoding the
C/C++/Objective-C `enum_extensibility` attribute in DWARF via a new
`DW_AT_APPLE_enum_kind` attribute. This would currently only be used by
the LLDB Swift plugin, but it may be of interest to other language
plugins as well (though I haven't come up with a concrete use-case for
it outside of Swift).
I'm open to naming suggestions of the various new attributes/attribute
constants proposed here. I tried to be as generic as possible if we
wanted to extend it to other kinds of enum properties (e.g., flag
enums).
The new attribute would look as follows:
```
DW_TAG_enumeration_type
  DW_AT_type            (0x0000003a "unsigned int")
  DW_AT_APPLE_enum_kind (DW_APPLE_ENUM_KIND_Closed)
  DW_AT_name            ("ClosedEnum")
  DW_AT_byte_size       (0x04)
  DW_AT_decl_file       ("enum.c")
  DW_AT_decl_line       (23)

DW_TAG_enumeration_type
  DW_AT_type            (0x0000003a "unsigned int")
  DW_AT_APPLE_enum_kind (DW_APPLE_ENUM_KIND_Open)
  DW_AT_name            ("OpenEnum")
  DW_AT_byte_size       (0x04)
  DW_AT_decl_file       ("enum.c")
  DW_AT_decl_line       (27)
```
Absence of the attribute means the extensibility of the enum is unknown
and abides by whatever the language rules of that CU dictate.
This does feel like a big hammer for quite a specific use-case, so I'm
happy to discuss alternatives.
Alternatives considered:
* Re-using an existing DWARF attribute to express extensibility. E.g., a
`DW_TAG_enumeration_type` could have a `DW_AT_count` or
`DW_AT_upper_bound` indicating the number of enumerators, which could
imply closed-ness. I felt like a dedicated attribute (which could be
generalized further) seemed more applicable. But I'm open to re-using
existing attributes.
* Encoding the entire attribute string (i.e., `DW_TAG_LLVM_annotation
("enum_extensibility((open))")`) on the `DW_TAG_enumeration_type`. Then
in LLDB somehow parse that out into a `EnumExtensibilityAttr`. I haven't
found a great API in Clang to parse arbitrary strings into AST nodes
(the ones I've found required fully formed C++ constructs). Though if
someone knows of a good way to do this, happy to consider that too.
This includes support for module translation and module import, and adds tests for both.
Fix https://github.com/llvm/llvm-project/issues/115390
ClangIR cannot currently lower global aliases to LLVM because of missing support for this.
Fixes #125088.
When splitBB is called with createBranch=true, it creates a branch
instruction in the old block, but no debug location is set on that branch
instruction. If that is later used as the insertion point by restoreIP,
it has the potential to set the current debug location to null, and
subsequent instructions will come out without a debug location. This
caused the verification check to fail as shown in the bug report.
This PR changes the splitBB and spliceBB functions to also take a
debugLoc parameter which can be used to set the debug location of the
branch instruction.
Create `VectorToLLVMDialectInterface`, which allows automatic conversion
discovery by the generic `--convert-to-llvm` pass. This only covers the
final dialect conversion step and not any previous preparation steps.
Also, there is currently no way to pass any additional parameters through
this conversion interface, but most users use the default parameters anyway.
For extremely large models, it may be inefficient to load the model into
memory in Python prior to passing it to the MLIR C APIs for
deserialization. This change adds an API to parse a ModuleOp directly
from a file path.
If the large element limit is specified, large elements are hidden from
the asm but large resources are not. This change extends the large
elements limit to apply to printed resources as well.
Another follow up fix to
https://github.com/llvm/llvm-project/pull/123910 to fix a build failure
that sometimes happens in shared library builds:
https://lab.llvm.org/buildbot/#/builders/50/builds/9724
In file included from
/home/tcwg-buildbot/worker/flang-aarch64-dylib/llvm-project/mlir/test/lib/Transforms/TestInlining.cpp:16:
/home/tcwg-buildbot/worker/flang-aarch64-dylib/llvm-project/mlir/test/lib/Transforms/../Dialect/Test/TestOps.h:148:10:
fatal error: 'TestOps.h.inc' file not found
148 | #include "TestOps.h.inc"
| ^~~~~~~~~~~~~~~
1 error generated.
This is same as PR #125106 which somehow is stuck in a "Processing
Update" loop for many hours now. I am going to close that one and push
this one instead.
While working on https://github.com/llvm/llvm-project/issues/125088, I
noticed a problem with the TargetBodyGenCallbackTy and
TargetGenArgAccessorsCallbackTy. The OMPIRBuilder and the MLIR side both
maintain their own IRBuilder, and when control goes from one to the
other, we have to take care not to use a stale debug location. The code
currently relies on restoreIP to set the insertion point and the debug
location. But if the passed InsertPointTy has an empty block, then the
debug location will not be updated (see SetInsertPoint). This can cause
an invalid debug location to be attached to an instruction, and the
verifier will complain.
Similarly, when we exit the callback, the debug location of the Builder
is not restored to what it was before the callback. This again can cause
verification failures.
This PR resets the debug location at the start and also uses an
InsertPointGuard to restore the debug location at exit.
Both of these problems would have been caught by the unit tests but they
were not setting the debug location of the builder before calling the
createTarget so the problem was hidden. I have updated the tests
accordingly.
This modifies the conversion patterns so that, in the case where the
index is known statically to be poison, the insertion/extraction is
replaced by an arbitrary junk constant value, and in the dynamic case,
the index is sanitized at runtime. This avoids triggering UB in both
cases. The dynamic case is definitely a pessimisation of the generated
code, but the use of dynamic indexes is expected to be very rare and
already slow on real-world GPU compilers ingesting SPIR-V, so the impact
should be negligible.
Resolves #124162.
When building mlir with `-DMLIR_NVVM_EMBED_LIBDEVICE=ON`, there will be
a warning
```
build/tools/mlir/lib/Target/LLVM/libdevice_embedded.c:1: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘143’ to ‘-113’ [-Woverflow]
```
which is followed by a large number of characters printed to stdout.
Fix this to avoid printing a huge number of characters (~3e5) to stdout.
Removed the TOSA quantization attribute used in various MLIR TOSA
dialect operations in favour of using builtin attributes.
Updated lit tests, conversions and transformations accordingly.
Signed-off-by: Tai Ly <tai.ly@arm.com>
Co-authored-by: Tai Ly <tai.ly@arm.com>