This PR adds default option below. The new options will come as default
to true and not change the original lowering behavior of pack and unpack
op.
- lowerPadLikeWithInsertSlice to packOp (with default = true)
- lowerUnpadLikeWithExtractSlice to unPackOp (with default = true)
The motivation of the PR is finer granular control of the lowering of
pack and unpack Ops. This is useful in particular when we want to
guarantee that there's no additional insertslice and extractslice that
interfere with tiling. With the original lowering pipeline, packOp and
unPackOp may be lowered to insertslice and extractslice when the high
dimensions are unit dimensions and no transpose is invovled. Under such
circumstances, such insert and extract slice ops will block
producer/consumer fusion tile + fuse transforms. With this PR, we will
be able to disable such lowering path and allow consumer fusion to go
through as expected.
Adds a new mlir-opt test-only pass, -test-vulkan-runner-pipeline, which
runs a set of passes needed for mlir-vulkan-runner, and removes them
from the runner. The tests are changed to invoke mlir-opt with this flag
before invoking the runner. The passes moved are ones concerned with
lowering of the device code prior to serialization to SPIR-V. This is an
incremental step towards moving the entire pipeline to mlir-opt, to
align with other runners (see #73457).
### Background
In `lib/Target/LLVM/NVVM/Target.cpp`, `NVPTXSerializer` compile PTX to
binary with two different flows controlled by
`MLIR_ENABLE_NVPTXCOMPILER`.
If building mlir with `-DMLIR_ENABLE_NVPTXCOMPILER=ON`, the flow does
not check if the target is `gpu::CompilationTarget::Fatbin`, and compile
PTX to cubin directly, which is not consistent with another flow.
### Implement
Use static [nvfatbin](https://docs.nvidia.com/cuda/nvfatbin/index.html)
library.
I have tested it locally, the two flows can return the same Fatbin
result after inputing the same `GpuModule`.
The acc data clause operations hold an operand named `varPtr`. This was
intended to hold a pointer to a variable - where the element type of
that pointer specifies the type of the variable. However, for both
memref and llvm dialects, this assumption is not true. This is because
memref element type for cases like memref<10xf32> is simply f32 and for
LLVM, after opaque pointers, the variable type is no longer recoverable.
Thus, introduce varType to ensure that appropriate semantics are kept.
Both the parser and printer for this new type attribute allow it to not
be specified in cases where a dialect's getElementType() applied to
`varPtr`'s type has a recoverable type. And more specifically, for FIR,
no changes are needed in the MLIR unit tests.
Lowering `math.powf` to `llvm.intr.powf` will result in `pow(x, 0) =
1`, even for `x=0`. When using the Math dialect expansion patterns,
`pow(0, 0)` will result in `-nan`, however, This change adds two
additional instructions to the lowering to ensure the `pow(x, 0)` case
lowers to to `1` regardless of the value of `x`.
Resolves https://github.com/llvm/llvm-project/issues/118945.
This reapplies PR #118463 after introducing a fix for a bug uncovere by
the test suite. The problem is that when the alloca block is terminated
with a conditional branch, this violates a pre-condition of
`allocatePrivateVars` (which assumes the alloca block has a single
successor). This new PR includes a test that reproduces the issue.
Extend MLIR to LLVM lowering by adding support for `omp.wsloop` for
delayed privatization. This also refactors a few bit of code to isolate
the logic needed for `firstprivate` initialization in a shared util that
can be used across constructs that need it. The same is done for
`dealloc` regions.
This PR adds translation support for task detach. Essentially, if the
`detach` clause is present on a task, emit a
`__kmpc_task_allow_completion_event` on it, and store its return (of
type `kmp_event_t*`) into the `event_handle`.
Fix typos in the tensor dialect documentation:
1. Typos/Copy-paste errors referencing invalid `memref` type for
`tensor.dim` op.
2. Miscellaneous typos across other tensor dialect ops.
* Adds extra comments to group Ops
* Unifies the test function naming, i.e.
* `@vector_{op_name}_{variant}` -> `@{op_name}_{variant}`
* Unifies input variable names (`%input` -> `%arg0`)
* Capitalises LIT variable names (e.g. `%[[insert]]` --> `%[[INSERT]]`)
* Moves `@step_scalable()` _below_ its "fixed-width" counterpart
(to follow the existing consistency within this file).
There's still some inconsistencies within this file - I'm happy to send
more updates if folks find it useful. But I'd definitely recommend
splitting across multiple PRs (otherwise it's hard to review).
This commit adds additional tests and documentation for
`DecomposeOuterUnitDimsUnPackOpPattern` to ensure symmetry with its
counterpart for `tensor.pack`, `DecomposeOuterUnitDimsPackOpPattern`.
The new tests aim to improve implementation, documentation, and test
coverage for tensor.unpack. They cover the following scenarios:
* Static tile sizes: A simple `tensor.unpack` case
(`@simple_unpack_static_tiles`).
* Dynamic tile size: `tensor.unpack` with a single dynamic tile size
(`@simple_unpack_dynamic_tile`).
* Transpose: `tensor.unpack` with dynamic tile size and transpose
(`@simple_unpack_dynamic_tile_transpose`), currently commented out due
to some missing logic (see below)
* Scalable tile size: `tensor.unpack` with a scalable inner tile size
(@simple_unpack_scalable_tile).
Notes:
The test `@simple_unpack_dynamic_tile_transpose` is commented out
because the logic for capturing dynamic sizes for `tensor::EmptyOp` when
some tile sizes are dynamic is incomplete. This missing functionality
will be addressed in a follow-up patch.
The mlir-tblgen tool prevents the parameter of the build() constructor
for the first default-valued attribute of an operation from having a
default value to avoid ambiguity with the corresponding build()
constructor taking unwrapped value. However it does so even when earlier
wrapped unwrappable attribute would lift the ambiguity. This commit
relax the logic accordingly, which allows to remove a manual constructor
in Arith dialect.
Adds a ValueBoundsOpInterface implementation for scf.forall ops. The
implementation supports bounding for both induction variables, results,
and block args of the forall op. Induction variables are given upper and
lower bounds based on the lower and upper loop bounds, and dimensions of
the results and init block arguments are constrained to be equal to the
matching dims of the shared_outs operand.
Signed-off-by: Max Dawkins <maxdawkins19@gmail.com>
Co-authored-by: Max Dawkins <maxdawkins19@gmail.com>
`arith.fptoui %arg0 : f32 to i16` was lowered to
```
%0 = emitc.cast %arg0 : f32 to ui32
emitc.cast %0 : ui32 to i16
```
and is now lowered to
```
%0 = emitc.cast %arg0 : f32 to ui16
emitc.cast %0 : ui16 to i16
```
Fixing RemoveDeadValues to properly remove arguments from
BranchOpInterface operations.
This is a follow-up for:
https://github.com/llvm/llvm-project/pull/117405
cc: @joker-eph @codemzs
---------
Co-authored-by: Renat Idrisov <parsifal-47@users.noreply.github.com>
Extend MLIR to LLVM lowering by adding support for `omp.wsloop` for
delayed privatization. This also refactors a few bit of code to isolate
the logic needed for `firstprivate` initialization in a shared util that
can be used across constructs that need it. The same is done for
`dealloc`
regions.
Parent PR: https://github.com/llvm/llvm-project/pull/118447. Only latest
commit is relevant for this PR.
This refactors the logic needed to emit init logic for reductions by
moving some duplicated code into a shared util. The logic for doing is
quite involved and is needed for any construct that has reductions.
Moreover, when a construct has both private and reduction clauses, both
sets of clauses need to cooperate with each other when emitting the
logic needed for allocation and initialization. Therefore, this PR
clearly sets the boundaries for the logic needed to initialize
reductions.
As a follow-on for #87986, moves the VectorType convenience wrappers
(`FixedVectorType` and `ScalableVectorType`) to BuiltinTypes.h. This
allows us to use the new wrappers in "CommonTypeConstraints.td".
The current implementation of lowering to llvm for vector.extract
incorrectly assumes that if the number of indices is zero, the operation
can be folded away. This PR removes this condition and relies on the
folder to do it instead.
This PR also unifies the logic for scalar extracts and slice extracts,
which as a side effect also enables vector.extract lowering for n-d
vector.extract with dynamic inner most dimension. (This was only
prevented by a conservative check in the old implementation)
Unsigned integer types are uncommon enough in MLIR that there is no
operation to cast a scalar from signless to unsigned and vice versa.
Currently tosa.rescale uses builtin.unrealized_conversion_cast which
does not lower. Instead, this commit introduces optional attributes to
indicate unsigned input or output, named similarly to those in the TOSA
specification. This is more in line with the rest of MLIR where specific
operations rather than values are signed/unsigned.
If using a variadic operand, the error message given if the number of
types and operands do not match would be along the lines of:
```
3 operands present, but expected 2
```
This error message is confusing for multiple reasons, particular for
beginners:
* If the intention is to have 3 operands, it does not point out why it
expects 2. The user may actually just want to add a type to the type
list
* It reads as if a verifier error rather than a parser error, giving the
impression the Op only supports 2 operands.
This PR attempts to improve the error message by first noting the issue
("number of operands and types mismatch") and mentioning how many
operands and types it received.
This PR fixes a bug in `RemoveDeadValues` where the
`FunctionOpInterface` does not have the `isDeclaration` method. As a
result, we should use the `isExternal` method instead. Fixes#116347.
The extract <-> shape cast folder was conservatively asserting and
failing on 0-d vectors. This pr fixes this.
This pr also adds more tests for 0d cases and updates related tests to
better reflect what they test.
This PR allows out-of-tree dialects to write Python dialect modules
using nanobind instead of pybind11.
It may make sense to migrate in-tree dialects and some of the ODS Python
infrastructure to nanobind, but that is a topic for a future change.
This PR makes the following changes:
* adds nanobind to the CMake and Bazel build systems. We also add
robin_map to the Bazel build, which is a dependency of nanobind.
* adds a PYTHON_BINDING_LIBRARY option to various CMake functions, such
as declare_mlir_python_extension, allowing users to select a Python
binding library.
* creates a fork of mlir/include/mlir/Bindings/Python/PybindAdaptors.h
named NanobindAdaptors.h. This plays the same role, using nanobind
instead of pybind11.
* splits CollectDiagnosticsToStringScope out of PybindAdaptors.h and
into a new header mlir/include/mlir/Bindings/Python/Diagnostics.h, since
it is code that is no way related to pybind11 or for that matter,
Python.
* changed the standalone Python extension example to have both pybind11
and nanobind variants.
* changed mlir/python/mlir/dialects/python_test.py to have both pybind11
and nanobind variants.
Notes:
* A slightly unfortunate thing that I needed to do in the CMake
integration was to use FindPython in addition to FindPython3, since
nanobind's CMake integration expects the Python_ names for variables.
Perhaps there's a better way to do this.
This change doesn't introduce any functional differences but aligns the
implementation more closely with LLVM's representation. Previously, the
code generated a lookup table to map MLIR enums to LLVM enums due to the
lack of one-to-one correspondence. With this refactoring, the generated
code now casts directly from one enum to another.
This PR adds two small convenience Vector types:
* `ScalableVectorType` and `FixedVectorType`.
The goal of these new types is two-fold:
* Enable idiomatic checks like `isa<ScalableVectorType>(...)`.
* Make the split into "Scalable" and "Fixed-wdith" vectors a bit more
explicit and more visible in the code-base.
The new types are added in mlir/include/mlir/IR (instead of e.g.
mlir/include/mlir/Dialect/Vector) so that the new types can be used
without requiring any new dependency (e.g. on the Vector dialect).
`DecomposeCallGraphTypes.cpp` was a workaround around missing 1:N
support in the dialect conversion. Now that 1:N support was added, the
workaround can be deleted. The test remains in place, as an example for
how to write such a transformation with the dialect conversion
framework.
Note for LLVM integration: If you are using
`DecomposeCallGraphTypes.cpp`, switch to the patterns that are used in
`TestDecomposeCallGraphTypes.cpp`.
FuncOps can have `arg_attrs`, an array of dictionary attributes
associated with their arguments.
E.g.,
```mlir
func.func @main(%arg0: tensor<8xf32> {test.attr_name = "value"}, %arg1: tensor<8x16xf32>)
```
These are exposed via the MLIR Python bindings with
`my_funcop.arg_attrs`.
In this case, it would return `[{test.attr_name = "value"}, {}]`, i.e.,
`%arg1` has an empty `DictAttr`.
However, if I try and access this property from a FuncOp with an empty
`arg_attrs`, e.g.,
```mlir
func.func @main(%arg0: tensor<8xf32>, %arg1: tensor<8x16xf32>)
```
This raises the error:
```python
return ArrayAttr(self.attributes[ARGUMENT_ATTRIBUTE_NAME])
~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^
KeyError: 'attempt to access a non-existent attribute'
```
This PR fixes this by returning the expected `[{}, {}]`.
If expand(collapse) has a dimension that gets collapsed and then
expanded to the same shape, the pattern would fail to canonicalize this
to a single collapse shape. Line 341 was changed because the
expand(collapse) could be a reinterpret-cast like sequence where the
shapes differ but the rank is the same. This cannot be represented by a
single `collapse_shape` op.
Signed-off-by: Ian Wood <ianwood2024@u.northwestern.edu>
Adds a new test for invalid `tensor.unpack` operations where the output
rank does not match the expected rank (input rank + num inner tile
sizes). For example:
```mlir
tensor.unpack %output
inner_dims_pos = [0, 1]
inner_tiles = [32, 16]
into %input : tensor<64x32x16xf32> -> tensor<256x128xf32>
```
In addition, updates the corresponding error message to make it more
informative:
BEFORE:
```mlir
error: packed rank must equal unpacked rank + tiling factors}
```
AFTER:
```mlir
error: packed rank != (unpacked rank + num tiling factors), got 3 != 4
```