This commit updates the LLVM dialect CallOp and InvokeOp to always print
the variadic callee type (previously callee type) if present. An
additional verifier checks that only variadic calls have a non-null
variadic callee type, and the builders are adapted accordingly to set
the variadic callee type for variadic calls only. Finally, the CallOp
and InvokeOp verifiers are strengthened to check that the variadic
callee type matches the call argument and result types.
The motivation of this change is to ensure that CallOp and InvokeOp have
no hidden state that is not pretty printed but is used during the export
to LLVM IR. Previously, it could happen that a call looked correct in
MLIR, but its return type changed after exporting to LLVM IR (since it
was taken from the hidden callee type attribute). After landing this
change, this is not possible anymore since the variadic callee type is
always printed if present.
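For illustration, here is a sketch of what a variadic call looks like
with this change (assuming a printf-like variadic declaration); the
variadic callee type is printed via the `vararg(...)` clause:
```mlir
llvm.func @printf(!llvm.ptr, ...) -> i32

llvm.func @caller(%fmt: !llvm.ptr, %val: i32) -> i32 {
  // The variadic callee type is always printed for variadic calls.
  %res = llvm.call @printf(%fmt, %val) vararg(!llvm.func<i32 (ptr, ...)>) : (!llvm.ptr, i32) -> i32
  llvm.return %res : i32
}
```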
This PR adds the `f8E4M3` type to MLIR.
The `f8E4M3` type follows the IEEE 754 convention:
```c
f8E4M3 (IEEE 754)
- Exponent bias: 7
- Maximum stored exponent value: 14 (binary 1110)
- Maximum unbiased exponent value: 14 - 7 = 7
- Minimum stored exponent value: 1 (binary 0001)
- Minimum unbiased exponent value: 1 − 7 = −6
- Precision specifies the total number of bits used for the significand (mantissa),
including the implicit leading integer bit = 3 + 1 = 4
- Follows IEEE 754 conventions for representation of special values
- Has Positive and Negative zero
- Has Positive and Negative infinity
- Has NaNs
Additional details:
- Max exp (unbiased): 7
- Min exp (unbiased): -6
- Infinities (+/-): S.1111.000
- Zeros (+/-): S.0000.000
- NaNs: S.1111.{001, 010, 011, 100, 101, 110, 111}
- Max normal number: S.1110.111 = +/-2^(7) x (1 + 0.875) = +/-240
- Min normal number: S.0001.000 = +/-2^(-6)
- Max subnormal number: S.0000.111 = +/-2^(-6) x 0.875 = +/-2^(-9) x 7
- Min subnormal number: S.0000.001 = +/-2^(-6) x 0.125 = +/-2^(-9)
```
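As a sketch (assuming the type lands as a regular builtin float type),
`f8E4M3` can then appear directly in IR like any other float type:
```mlir
func.func @f8_roundtrip(%arg0: f8E4M3) -> f8E4M3 {
  return %arg0 : f8E4M3
}
```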
Related PRs:
- [PR-97179](https://github.com/llvm/llvm-project/pull/97179) [APFloat]
Add support for f8E4M3 IEEE 754 type
This PR addresses
[iree-org/iree#17976](https://github.com/iree-org/iree/issues/17976) by
using the converted `resultType` instead of the original result type
obtained from `castOp.getResultVectorType`. A new LIT test is also
included.
Add an example, as a unit test, of creating a wrapper type/view for
RankedTensorType with an encoding. This view provides a more restricted
and typed API while allowing one to avoid repeated casting queries and
direct access to the encoding.
For users with more advanced encodings, the expectation would be a
separate attribute type, but here just StringAttr is used.
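For reference, a sketch of the kind of type such a view wraps: a ranked
tensor whose encoding is a plain `StringAttr` (the encoding string here
is illustrative):
```mlir
%t = tensor.empty() : tensor<8x16xf32, "test_encoding">
```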
This patch handles dependencies specified by the `depend` clause on an
OpenMP target construct. It does this much the same way clang does it by
materializing an OpenMP `task` that is tagged with the dependencies.
The following functions are relevant to this patch:
1) `createTarget` - This function itself is largely unchanged except
that it now accepts a vector of `DependData` objects that it simply
forwards to `emitTargetCall`.
2) `emitTargetCall` - This function has changed to check whether an
outer target-task needs to be materialized (i.e., if the `target`
construct has a `nowait` or `depend` clause). If yes, it calls
`emitTargetTask` to do all the heavy lifting of creating and
dispatching the task.
3) `emitTargetTask` - The bulk of the change is here. See the large
comment explaining what it does at the beginning of this function.
The main goal of this and subsequent PRs is to unify and categorize
tests in:
* vector-transfer-flatten.mlir
This should make it easier to identify the edge cases being tested (and
how they differ), to remove duplicates, and to add tests for scalable
vectors.
The main contributions of this PR:
1. For consistency with other tests,
   `@transfer_read_flattenable_with_dynamic_dims_and_indices` is renamed
   as `@transfer_read_leading_dynamic_dims`. It is also moved near other
   tests for `xfer_read`, and its variable names are updated to match
   other `xfer_read` tests.
2. `@transfer_write_dims_mismatch_non_zero_indices_trailing_dynamic_dim`
   is renamed as `@negative_transfer_read_dynamic_dim_to_flatten` to
   better highlight that it's a negative test and to contrast it with
   `@transfer_read_leading_dynamic_dims`.
3. Similar changes for tests for `xfer_write`.
4. Make sure that we consistently use `%idx_N` (as opposed to `%idxN`).
Follow-up for #95743 and #95744
We need to distinguish ShapedTypes with and without value semantics.
This is needed for downstream users to define their custom vector and
tensor types that can work with the arith/math dialects.
RFC https://discourse.llvm.org/t/rfc-mlir-types-with-encoding/80189
Recently there was a change to materializing unrealized conversion
casts (https://github.com/llvm/llvm-project/pull/97903), which inserted
casts that previously did not exist during legalization. After these
casts are inserted and then washed away once the transformation
completes, the use-list ordering of an op changed in some cases:
`my.add %arg0(use1), %arg0(use2) --> my.add %arg0(use2), %arg0(use1)`,
which subtly changes the emitted bytecode since this is considered a
custom use-list order.
When investigating why the bytecode had changed, I added the following
logging, which helped track down the difference; in my case it showed
extra bytes in the "use-list section". With
`-debug-only=mlir-bytecode-writer`, the writer emits logs like the
following, detailing the source of written bytes:
```
emitBytes(4b) bytecode header
emitVarInt(6) bytecode version
emitByte(13) bytecode version
emitBytes(17b) bytecode producer
emitByte(0) null terminator
emitVarInt(2) dialects count
...
emitByte(5) dialect version
emitVarInt(4) op names count
emitByte(9) op names count
emitVarInt(0) dialect number
...
emitVarInt(2) dialect writer
emitByte(5) dialect writer
emitVarInt(9259963783827161088) dialect APInt
...
emitVarInt(3) attr/type offset
emitByte(7) attr/type offset
emitByte(3) section code
emitVarInt(18) section size
...
```
Note: this uses string constants and `StringLiteral`. I'm not sure if
these are washed away during compilation / OK to have around for
debugging, or if there's a better way to do this? The alternative was
adding many braces and `LLVM_DEBUG` calls at each callsite, but this
felt more error prone / likely to miss some callsites.
This commit simplifies the handling of dropped arguments and updates
some dialect conversion documentation that is outdated.
When converting a block signature, a `BlockTypeConversionRewrite` object
and potentially multiple `ReplaceBlockArgRewrite` objects are created.
During
the "commit" phase, uses of the old block arguments are replaced with
the new block arguments, but the old implementation was written in an
inconsistent way: some block arguments were replaced in
`BlockTypeConversionRewrite::commit` and some were replaced in
`ReplaceBlockArgRewrite::commit`. The new
`BlockTypeConversionRewrite::commit` implementation is much simpler and
no longer modifies any IR; that is done only in `ReplaceBlockArgRewrite`
now. The `ConvertedArgInfo` data structure is no longer needed.
To that end, materializations of dropped arguments are now built in
`applySignatureConversion` instead of `materializeLiveConversions`; the
latter function no longer has to deal with dropped arguments.
Other minor improvements:
- Add more comments to `applySignatureConversion`.
Note: Error messages around failed materializations for dropped basic
block arguments changed slightly. That is because those materializations
are now built in `legalizeUnresolvedMaterialization` instead of
`legalizeConvertedArgumentTypes`.
This commit is in preparation of decoupling argument/source/target
materializations from the dialect conversion.
This is a re-upload of #96207.
This PR addresses a [comment] made by @ftynse about the syntax for
`ForeachOp`. The syntax was modified by @muneebkhan85 in #82792, where
the attribute dictionary was moved to the middle.
This patch moves it back to its original place at the end and
introduces an optional keyword for `zip_shortest`.
[comment]:
https://github.com/llvm/llvm-project/pull/82792#pullrequestreview-2132814144
The behavior of the TOSA Cast operation when converting floating-point values to integers is to round to the nearest even. This commit aligns the behavior of folding a TOSA Cast of a float splat to int so that it also uses roundeven.
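For example (a sketch; the exact assembly may differ), folding a cast
of a float splat of 2.5 to an integer tensor now produces a splat of 2,
consistent with round-to-nearest-even:
```mlir
// With %splat_of_2_5 a constant float splat of 2.5:
%out = tosa.cast %splat_of_2_5 : (tensor<4xf32>) -> tensor<4xi32>
// The fold now produces a constant splat of 2 (round to nearest even).
```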
Updates `vectorizeScalableVectorPrecondition` so that scalable
vectorisation is only applied in well understood and tested scenarios.
It's unlikely that we would ever want an arbitrary dimension to be
scalable. While the Linalg vectoriser should be flexible enough to
handle all possibilities:
* in more "exotic" cases, we are likely to struggle with lowerings
further down the compilation stack,
* it would be impractical given the limitations of LLVM (which usually
reflect the limitations of actual hardware) - e.g. no support for
"scalable" arrays of scalable or fixed width vectors (*).
Ultimately, the goal of this patch is to better document what's
currently supported. While this PR adds some new restrictions, no
existing tests are affected.
(*) At MLIR vector level that would correspond to e.g.
`vector<[4]x8xf32>`.
Holes in a live range are points where the corresponding value does not
need to be in a tile/register. If the tile allocator keeps track of
these holes it can reuse tiles for more values (avoiding spills).
Take this simple example:
```mlir
func.func @example(%cond: i1) {
%tileA = arm_sme.get_tile : vector<[4]x[4]xf32>
cf.cond_br %cond, ^bb2, ^bb1
^bb1:
// If we end up here we never use %tileA again!
%tileB = arm_sme.get_tile : vector<[4]x[4]xf32>
"test.some_use"(%tileB) : (vector<[4]x[4]xf32>) -> ()
cf.br ^bb3
^bb2:
"test.some_use"(%tileA) : (vector<[4]x[4]xf32>) -> ()
cf.br ^bb3
^bb3:
return
}
```
If you were to calculate the liveness of %tileA and %tileB, you'd see
there is a hole in the liveness of %tileA in ^bb1:
```
        %tileA  %tileB
^bb0:   Live
^bb1:           Live
^bb2:   Live
```
The tile allocator can make use of that hole and reuse the tile ID it
assigned to %tileA for %tileB.
Move the utility to check the validity of tiling affine loop nests to
the affine loop utils and expose it to users outside the loop tiling
pass, including downstream users.
`StringSectionReader`, `ResourceSectionReader` and
`PropertiesSectionReader` are immutable after `initialize` so this PR
adds const to their parsing functions and references in `AttrTypeReader`
and `DialectReader`.
Remove anti-patterns given the default null initialization of `Value`.
Drop some extra includes while touching this file. NFC.
Co-authored-by: GitHub runner <github-runner@polymagelabs.com>
Convert the Linalg winograd_filter_transform, winograd_input_transform,
and winograd_output_transform ops into nested loops that perform matrix
multiplications with constant transform matrices.
Support several configurations of Winograd Conv2D, including F(2, 3),
F(4, 3) and F(2, 5). These configurations show that the implementation
can support different kernel sizes (3 and 5) and different output sizes
(2 and 4). Besides symmetric kernel sizes 3x3 and 5x5, this patch also
supports 1x3, 3x1, 1x5, and 5x1 kernels.
The implementation is based on the paper "Fast Algorithms for
Convolutional Neural Networks" (https://arxiv.org/abs/1509.09308).
Reviewers: ftynse, Max191, GeorgeARM, nicolasvasilache, MaheshRavishankar, dcaballe, rengolin
Reviewed By: ftynse, Max191
Pull Request: https://github.com/llvm/llvm-project/pull/96183
### Description
This PR implements a minimal version of function signature conversion
to unroll vectors into 1-D vectors with a size supported by SPIR-V (2,
3, or 4, depending on the original dimension). This PR also includes
new unit tests that only check for function signature conversion.
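A sketch of the conversion under these assumptions (an 8-element vector
that is unrolled into size-4 pieces; the exact target sizes depend on
the original dimension):
```mlir
// Before:
func.func @f(%arg0: vector<8xi32>) -> vector<8xi32> {
  return %arg0 : vector<8xi32>
}

// After signature conversion, the argument and result are unrolled
// into 1-D vectors of a SPIR-V-supported size:
func.func @f(%arg0: vector<4xi32>, %arg1: vector<4xi32>)
    -> (vector<4xi32>, vector<4xi32>) {
  return %arg0, %arg1 : vector<4xi32>, vector<4xi32>
}
```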
### Future Plans
- Check for capabilities that support vectors of size 8 or 16.
- Set up `OneToNTypeConversion` and `DialectConversion` to replace the
current implementation that uses `GreedyPatternRewriteDriver`.
- Introduce other vector unrolling patterns to cancel out the
`vector.insert_strided_slice` and `vector.extract_strided_slice` ops and
fully legalize the vector types in the function body.
- Handle `func::CallOp` and declarations.
- Restructure the code in `SPIRVConversion.cpp`.
- Create test passes for testing sets of patterns in isolation.
- Optimize the way the original shape is split into target shapes, e.g.
`vector<5xi32>` can be split into `vector<4xi32>` and
`vector<1xi32>`.
---------
Co-authored-by: Jakub Kuderski <kubakuderski@gmail.com>
A concatenation of empty tensors can be replaced by a single empty
tensor of the concatenated shape. Add this pattern to
`populateFoldTensorEmptyPatterns`.
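A sketch of the fold:
```mlir
// Before:
%a = tensor.empty() : tensor<2x8xf32>
%b = tensor.empty() : tensor<3x8xf32>
%c = tensor.concat dim(0) %a, %b
    : (tensor<2x8xf32>, tensor<3x8xf32>) -> tensor<5x8xf32>

// After applying the pattern:
%c = tensor.empty() : tensor<5x8xf32>
```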
With this PR I am trying to address:
https://github.com/llvm/llvm-project/issues/63230.
What changed:
- While merging identical blocks, don't add a block argument if it is
"identical" to another block argument, i.e., if the two block arguments
refer to the same `Value`. The operations' operands in the block will
point to the argument we already inserted. This needs to happen for all
the arguments we pass to the different successors of the parent block.
- After merging the blocks, get rid of "unnecessary" arguments, i.e., if
all the predecessors pass the same value, there is no need to pass it as
a block argument (see the sketch after this list).
- This last simplification clashed with
`BufferDeallocationSimplification`. The reason, I think, is that
`BufferDeallocationSimplification` contains an analysis based on the
block structure; if we simplify the block structure (by merging and/or
dropping block arguments), the analysis becomes invalid. The solution I
found is to do a more prudent simplification when running that pass.
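As a sketch of the second simplification (the IR is illustrative; `%v`
is defined in the entry block):
```mlir
// Before: every predecessor passes the same value to ^bb3.
  cf.cond_br %cond, ^bb1, ^bb2
^bb1:
  cf.br ^bb3(%v : i32)
^bb2:
  cf.br ^bb3(%v : i32)
^bb3(%a: i32):
  "test.use"(%a) : (i32) -> ()

// After: the argument is dropped and its uses point to %v directly.
  cf.cond_br %cond, ^bb1, ^bb2
^bb1:
  cf.br ^bb3
^bb2:
  cf.br ^bb3
^bb3:
  "test.use"(%v) : (i32) -> ()
```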
**Note**: this is a rework of #96871. I ran all the integration tests
(`-DMLIR_INCLUDE_INTEGRATION_TESTS=ON`) and they passed.
This patch refactors the conversion of math operations to ROCDL library
calls. This pass will also be used in Flang to lower Fortran
intrinsics/math functions for OpenMP target offloading codegen.
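As a rough sketch (the `__ocml_*` callee name follows the ROCm device
library naming convention and is an assumption here, as is the exact
set of ops covered):
```mlir
// Before:
%0 = math.sin %arg0 : f32

// After conversion, the op becomes a call into the ROCm device library:
%0 = llvm.call @__ocml_sin_f32(%arg0) : (f32) -> f32
```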
Generalizes DropUnitDimFromElementwiseOps to support inner unit
dimensions.
This change stems from improving the lowering of contraction ops for
Arm SME, where we end up with inner unit dimensions on MulOp,
BroadcastOp, and TransposeOp, preventing the generation of outer
products, as discussed
[here](https://discourse.llvm.org/t/on-improving-arm-sme-lowering-resilience-in-mlir/78543/17?u=nujaa).
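A sketch of the generalization (the `vector.shape_cast` ops on the
operands and result are elided for brevity):
```mlir
// An elementwise op whose operands have an inner (trailing) unit dim:
%0 = arith.mulf %a, %b : vector<[4]x1xf32>
// can now be rewritten, after shape-casting the operands, to:
%1 = arith.mulf %a_cast, %b_cast : vector<[4]xf32>
```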
This is also a fix after https://github.com/llvm/llvm-project/pull/97652,
which showed an unhandled edge case when all dimensions are one: the
generated target VectorType would be `vector<f32>`, which is not
supported by `arith.mulf`. In case all dimensions would be dropped, the
target vector type is now `vector<1xf32>`.
---------
Co-authored-by: Benjamin Maxwell <macdue@dueutil.tech>
```
/llvm-project/mlir/include/mlir/CAPI/Rewrite.h:21:63: error: extra ';' outside of a function is incompatible with C++98 [-Werror,-Wc++98-compat-extra-semi]
DEFINE_C_API_PTR_METHODS(MlirRewriterBase, mlir::RewriterBase);
                                                              ^
1 error generated.
```
This exposes most of the `RewriterBase` methods to the C API. This
allows manipulating both the `IRRewriter` and the `PatternRewriter`.
The `IRRewriter` can be created from the C API, while the
`PatternRewriter` cannot.
The missing operations are the ones taking `Block::iterator` and
`Region::iterator` as parameters, as they are not exposed by the C API
yet, AFAIK.
The Python bindings for these methods and classes are not implemented.
At the moment, the in_bounds attribute has two confusing/contradicting
properties:
1. It is both optional _and_ has an effective default value.
2. The default value is "out-of-bounds" for non-broadcast dims, and
"in-bounds" for broadcast dims.
(see the `isDimInBounds` vector interface method for an example of this
"default" behaviour [1]).
This PR aims to clarify the logic surrounding the `in_bounds` attribute
by:
* making the attribute mandatory (i.e. it is always present),
* always setting the default value to "out of bounds" (that's
consistent with the current behaviour for the most common cases).
#### Broadcast dimensions in tests
As per [2], broadcast dimensions require the corresponding
`in_bounds` attribute to be `true`:
```
vector.transfer_read op requires broadcast dimensions to be in-bounds
```
The changes in this PR mean that we can no longer rely on the
default value in cases like the following (dim 0 is a broadcast dim):
```mlir
%read = vector.transfer_read %A[%base1, %base2], %f, %mask
{permutation_map = affine_map<(d0, d1) -> (0, d1)>} :
memref<?x?xf32>, vector<4x9xf32>
```
Instead, the broadcast dimension has to be explicitly marked as
"in bounds":
```mlir
%read = vector.transfer_read %A[%base1, %base2], %f, %mask
{in_bounds = [true, false], permutation_map = affine_map<(d0, d1) -> (0, d1)>} :
memref<?x?xf32>, vector<4x9xf32>
```
All tests with broadcast dims are updated accordingly.
#### Changes in "SuperVectorize.cpp" and "Vectorization.cpp"
The following patterns in "Vectorization.cpp" are updated to explicitly
set the `in_bounds` attribute to `false`:
* `LinalgCopyVTRForwardingPattern` and `LinalgCopyVTWForwardingPattern`
Also, `vectorizeAffineLoad` (from "SuperVectorize.cpp") and
`vectorizeAsLinalgGeneric` (from "Vectorization.cpp") are updated to
make sure that xfer Ops created by these hooks set the dimension
corresponding to broadcast dims as "in bounds". Otherwise, the Op
verifier would complain.
Note that there is no mechanism to verify whether the corresponding
memory accesses are indeed in bounds. Still, this is consistent with
the current behaviour where the broadcast dim would be implicitly
assumed to be "in bounds".
[1]
4145ad2bac/mlir/include/mlir/Interfaces/VectorInterfaces.td (L243-L246)
[2]
https://mlir.llvm.org/docs/Dialects/Vector/#vectortransfer_read-vectortransferreadop
The main goal of this PR (and subsequent PRs), is to add more tests with
scalable vectors to:
* vector-transfer-collapse-inner-most-dims.mlir
There are quite a few cases to consider, hence this is split into
multiple PRs.
In this PR, I am making the following changes:
* All input memrefs for `xfer_read` are renamed as `%src`.
* All input memrefs for `xfer_write` are renamed as `%dest`.
* All variables representing pad values for `xfer_read` are renamed as
`%pad`.
* All vector variables (for `xfer_read` and `xfer_write`) are renamed as
`%v`.
* Add `@contiguous_inner_most_non_zero_idx_in_bounds_scalable` for
`xfer_read` (similar test already exists for `xfer_write`)
* All index variables are renamed as `%i` (1st index) and `%ii` (2nd
index).
The above were marked as TODOs in the test file - these are now
resolved. In addition (to avoid sending another PR):
* `@drop_inner_most_dim` is deleted - it duplicates
  `@contiguous_inner_most` for `xfer_write`.
* For consistency with other negative tests, `@non_unit_strides` is
  renamed as `@negative_non_unit_strides` and a similar test is added
  for `xfer_read`.
This is a follow-up for: #94490, #94604, #94906, #96214, #96227
This patch constructs the AMDGCN ISA control variable explicitly instead
of linking against the library shipped with ROCm. This change prevents
issues arising from the order in which the AMDGCN libraries are linked.
This reverts commit ac4b6b662630cd4d3bf6929f2b39ea203c0054a1.
A test change was missing for
mlir/test/Target/LLVMIR/llvmir-intrinsics.mlir in the initial commit.