Debug value/declare operations imported before landing pad operations at
the start of a basic block break invoke op verification:
```
error: first operation in unwind destination should be a llvm.landingpad operation
```
Fix this issue by making the placement slightly smarter.
Avoid operations that can overflow in the constant folders for
`tosa.reduce_max` and `tosa.reduce_min`.
Includes tests to avoid regressions.
Signed-off-by: Ian Tayler Lessa <ian.taylerlessa@arm.com>
Reverts llvm/llvm-project#131876
GPU integration tests are broken by that PR, e.g.
`mlir/test/Integration/GPU/CUDA/sm90/gemm_f32_f16_f16_128x128x128.mlir`
Propagate `Extract(Elementwise(...))` -> `Elementwise(Extract...)`.
Currently limited to the case where the extract is the single use of the
elementwise op, to avoid introducing additional elementwise ops.
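As a rough before/after sketch (assuming the vector dialect's
`vector.extract` and an `arith.addf` as the elementwise op; names are
illustrative):
```mlir
// Before: the extract is the single use of the elementwise op.
%0 = arith.addf %a, %b : vector<4xf32>
%1 = vector.extract %0[1] : f32 from vector<4xf32>

// After: the extract is propagated to the operands.
%a1 = vector.extract %a[1] : f32 from vector<4xf32>
%b1 = vector.extract %b[1] : f32 from vector<4xf32>
%1  = arith.addf %a1, %b1 : f32
```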
* Improve the verifier of `memref.subview` to detect out-of-bounds
extractions.
* Improve the documentation of `memref.subview` to make clear that
out-of-bounds extractions are not allowed. Rewrite examples to use the
new `strided<>` notation instead of `affine_map` layout maps. Also
remove all unrelated operations (`memref.alloc`) from the examples.
* Fix various test cases where `memref.subview` ops ran out-of-bounds.
* Update canonicalization patterns to ensure that they do not fold IR
into IR that no longer verifies.
Related discussion on Discourse:
https://discourse.llvm.org/t/out-of-bounds-semantics-of-memref-subview/85293
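For illustration, a hypothetical subview (shapes chosen for this sketch)
that the improved verifier now rejects:
```mlir
// Rejected: offset 6 + size 4 runs past the source dimension of size 8.
%0 = memref.subview %src[6] [4] [1]
    : memref<8xf32> to memref<4xf32, strided<[1], offset: 6>>
```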
This commit checks whether the operands/results of an operator can be found
in the profile compliance mapping; if they cannot, the operator is
considered invalid. As a result, operator data type combinations that are
not listed under "Supported Data Types" in the TOSA specification
are disallowed and the validation pass fails.
Signed-off-by: Luke Hutton <luke.hutton@arm.com>
Handle dense resource attributes in the transpose TOSA folder.
Currently their interface does not align with the rest of
`ElementsAttr` when it comes to data access, hence the special
handling.
Signed-off-by: Georgios Pinitas <georgios.pinitas@arm.com>
- Add missing int16 extension for concat operator
- Remove int16 extension for cast operator
- Add pro_int and pro_fp profiles for const_shape operator
Signed-off-by: Jerry Ge <jerry.ge@arm.com>
This gets the consumer fusion method in sync with the corresponding
producer fusion method `tileAndFuseProducerOfSlice`. Not taking the
surrounding loops as input required complicated, and very fragile,
analysis to retrieve them. Just like the producer fusion method, the
loops need to be taken in as an argument, with the loops typically being
created by the tiling methods.
Some utilities are added to check that the loops passed in are perfectly
nested (in the case of an `scf.for` loop nest).
This is change 1 of N to simplify the implementation of tile and fuse
consumers.
---------
Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com>
Create a new operation `DSOLocalEquivalentOp`, following the steps of
other constants.
This is similar in a way to `AddressOfOp` but with specific semantics:
only support functions and function aliases (no globals) and extern_weak
linkage is not allowed.
An alternative approach is to use a new `UnitAttr` in `AddressOfOp` and
check that attribute to enforce specific semantics in the verifiers. The
drawback is going against what other constants do and having to add more
attributes in the future when we introduce `no_cfi`, `blockaddress`,
etc.
While here, improve the error message for other missing constants.
A small clean-up following up on #131795. It seems we had two quite
similar implementations of the same thing: emitting the task dependencies
struct and filling it. This PR unifies the two versions into one. This is
better since we had to fix a bug in one of them in #131795; the fix now
applies to both.
Since #130487, `tensor.extract_slice` and `tensor.insert_slice` ops that
are statically detected to go out of bounds are rejected by the
verifier.
This commit fixes canonicalization patterns that currently fold
dynamically out-of-bounds ops (valid IR) to statically out-of-bounds ops
(invalid IR).
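For example (a hypothetical sketch), folding a constant dynamic offset must
not produce a statically out-of-bounds op:
```mlir
%c10 = arith.constant 10 : index
// Dynamically out of bounds, but valid IR:
%0 = tensor.extract_slice %t[%c10] [4] [1] : tensor<8xf32> to tensor<4xf32>
// Folding %c10 into a static offset would produce
//   tensor.extract_slice %t[10] [4] [1] : tensor<8xf32> to tensor<4xf32>
// which is statically out of bounds and rejected by the verifier, so the
// pattern must not perform this fold.
```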
For vector.extract, the folder always canonicalizes to a vector.extract
operation, while the rewrite pattern canonicalizes to a vector.broadcast
except in the case of 0-rank vectors.
Remove this special casing, and instead handle the 0-rank vector case in
the folder.
A number of places in our codebase special-case the use of
extractelement/insertelement for 0-D vectors, because extract/insert did
not previously support 0-D vectors. Since insert/extract now support 0-D
vectors, use them instead of special casing.
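Roughly (a sketch, assuming the current 0-D syntax):
```mlir
// Previously special-cased:
%e = vector.extractelement %v[] : vector<f32>
// Now expressed directly:
%e2 = vector.extract %v[] : f32 from vector<f32>
```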
This patch makes the definition of vector.scatter match that of
vector.gather, as its counterpart.
All of the changes done in this patch bring vector.scatter in line with
vector.gather's multi-dimensional definition.
Unrolling for vector.scatter will be implemented in subsequent patches.
Discourse Discussion:
https://discourse.llvm.org/t/rfc-improving-gather-codegen-for-vector-dialect/85011/13
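A rough sketch of what this enables (types chosen for illustration):
```mlir
// 1-D scatter (already supported):
vector.scatter %base[%c0][%idx], %mask, %val
    : memref<?xf32>, vector<16xi32>, vector<16xi1>, vector<16xf32>

// Multi-dimensional scatter, mirroring vector.gather's definition:
vector.scatter %base[%c0][%idx2d], %mask2d, %val2d
    : memref<?xf32>, vector<2x16xi32>, vector<2x16xi1>, vector<2x16xf32>
```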
This patch decouples unrolling vector.gather and lowering vector.gather
to llvm.masked.gather.
This is consistent with how vector.load, vector.store,
vector.maskedload, vector.maskedstore lower to LLVM.
Some interesting test changes from this patch:
- Tests for lowering 2D vector.gather to LLVM are deleted. This is
consistent with other memory load/store ops.
- There are still tests for 2D vector.gather, but the constant mask for
these tests is modified. This is because, with the updated lowering, one
of the unrolled vector.gather ops disappears because it is masked off (also
demonstrating why this is a better lowering path).
Overall, this makes vector.gather take the same consistent path for
lowering to LLVM as other load/store ops.
Discourse Discussion:
https://discourse.llvm.org/t/rfc-improving-gather-codegen-for-vector-dialect/85011/13
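Roughly, the lowering path now looks like this (a sketch; types chosen for
illustration):
```mlir
// An n-D gather ...
%g = vector.gather %base[%c0][%idx], %mask, %pass
    : memref<?xf32>, vector<2x8xi32>, vector<2x8xi1>, vector<2x8xf32>
      into vector<2x8xf32>
// ... is first unrolled into rank-1 vector.gather ops (one per outer index),
// and only those 1-D gathers are lowered to llvm.intr.masked.gather.
```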
In the example below it is not clear that `(f32)` relates to `%arg2` and
not to `vector<2xf32>`:
```mlir
%0 = spirv.ImageSampleImplicitLod %arg0, %arg1 ["Lod"](%arg2) :
!spirv.sampled_image<...>, vector<2xf32>(f32) -> vector<4xf32>
```
This change applies a new format to image operations and image operands
that does not use parentheses and is less ambiguous:
```mlir
%0 = spirv.ImageSampleImplicitLod %arg0, %arg1 ["Lod"], %arg2 :
!spirv.sampled_image<...>, vector<2xf32>, f32 -> vector<4xf32>
```
This is an implementation for [RFC: Supporting Sub-Channel Quantization
in
MLIR](https://discourse.llvm.org/t/rfc-supporting-sub-channel-quantization-in-mlir/82694).
In order to make the review process easier, the PR has been divided into
the following commit labels:
1. **Add implementation for sub-channel type:** Includes the class
design for `UniformQuantizedSubChannelType`, printer/parser and bytecode
read/write support. The existing types (per-tensor and per-axis) are
unaltered.
2. **Add lowering for sub-channel type:** Lowering of
`quant.qcast` and `quant.dcast` operations to Linalg operations.
3. **Adding C/Python APIs:** We first define the C APIs and build the
Python APIs on top of those.
4. **Add pass to normalize generic ....:** This pass normalizes
sub-channel quantized types to per-tensor per-axis types, if possible.
A design note:
- **Explicitly storing the `quantized_dimensions`, even when they can be
derived for ranked tensors.**
While it's possible to infer quantized dimensions from the static shape
of the scales (or zero-points) tensor for ranked
data tensors
([ref](https://discourse.llvm.org/t/rfc-supporting-sub-channel-quantization-in-mlir/82694/3)
for background), there are cases where this can lead to ambiguity and
issues with round-tripping.
Consider the example:
```
tensor<2x4x!quant.uniform<i8:f32:{0:2, 0:2}, {{s00:z00, s01:z01}}>>
```
The shape of the scales tensor is [1, 2], which might suggest that only
axis 1 is quantized. While this inference is technically correct, as the
block size for axis 0 is a degenerate case (equal to the dimension
size), it can cause problems with round-tripping. Therefore, even for
ranked tensors, we are explicitly storing the quantized dimensions.
Suggestions welcome!
PS: I understand that the upcoming holidays may impact your schedule, so
please take your time with the review. There's no rush.
Previously, the encodings were unconditionally dropped during shape
inference. The revision adds support for preserving the encodings in
the linalg ops.
---------
Signed-off-by: hanhanW <hanhan0912@gmail.com>
Some of the error_if checks were missed in this PR:
https://github.com/llvm/llvm-project/pull/124956
Add back those tests to check suitable sizes for Resize
Signed-off-by: Luke Hutton <luke.hutton@arm.com>
Co-authored-by: Luke Hutton <luke.hutton@arm.com>
- Add packed conversions fp8/bf8->bf16 for gfx950 and fp8/bf8->fp32 for
gfx942 in ROCDL dialect
- Update amdgpu.ext_packed_fp8 lowering to use ROCDL packed fp8/bf8->f32
conversions for vector target types and ROCDL scalar fp8/bf8->fp32
conversions for scalar target types.
---------
Co-authored-by: Jungwook Park <jungwook.park@amd.com>
When inlining a `callee` with a call site debug location, the inlining
infrastructure was trivially combining the `callee` and the `caller`
locations, forming a "tree" of call stacks. Because of this, the remarks
were printing an incomplete inlining stack.
This commit handles this case and appends the `caller` location at the
end of the `callee`'s stack, extending the chain.
This PR adds the vector transfer_read to load rewrite pattern. The
pattern creates a transfer read op lowering: a vector transfer read op
will be lowered to a combination of `vector.load`, `arith.select` and
`vector.broadcast` if:
- The transfer op is masked.
- The memref is in buffer address space.
- Other conditions introduced from `TransferReadToVectorLoadLowering`
The motivation for this PR is the lack of support for masked loads in
the amdgpu backend. `llvm.intr.masked.load` lowers to a series of
conditional scalar loads (see the `scalarize-masked-mem-intrin` pass).
This PR makes it possible for a masked transfer_read to be lowered
towards a buffer load with bounds check, allowing a more optimized global
load access pattern compared with the existing lowering of
`llvm.intr.masked.load` on vectors.
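A simplified sketch of the rewrite (shown on a plain 1-D memref; the actual
pattern additionally requires the buffer address space):
```mlir
// A masked transfer_read ...
%v = vector.transfer_read %mem[%i], %pad, %mask {in_bounds = [true]}
    : memref<64xf32>, vector<4xf32>

// ... becomes an unconditional load selected against the padding value:
%load = vector.load %mem[%i] : memref<64xf32>, vector<4xf32>
%padv = vector.broadcast %pad : f32 to vector<4xf32>
%v2   = arith.select %mask, %load, %padv : vector<4xi1>, vector<4xf32>
```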
DenseSet, SmallPtrSet, SmallSet, SetVector, and StringSet recently
gained C++23-style insert_range. This patch replaces:
```
Dest.insert(Src.begin(), Src.end());
```
with:
```
Dest.insert_range(Src);
```
This patch does not touch cases that use custom begin/end functions
(like succ_begin) for now.
* `PyRegionList` is now sliceable. The dialect bindings generator seems
to assume it is sliceable already (!), yet accessing e.g. `cases` on
`scf.IndexedSwitchOp` raises a `TypeError` at runtime.
* `PyBlockList` and `PyOperationList` support negative indexing. It is
common for containers to do that in Python, and most containers in the
MLIR Python bindings already allow the index to be negative.
Fixes a bug introduced by
https://github.com/llvm/llvm-project/pull/130078.
For non-BlockArgOpenMPOpInterface ops, we also want to map their entry
block arguments to their operands, if any. For the current support in
the OpenMP dialect, the table below lists all ops that have arguments
(SSA operands and/or attributes) and are not target-related. Of these
ops, we only need to process `omp.atomic.update`, since it is the only op
that has both SSA operands and an attached region. Therefore, the region's
entry block arguments must be mapped to the op's operands in case they
are referenced inside the region (see the sketch after the table).
| op | operands? | region(s)? | parent is func? | processed? |
|--------------|-------------|------------|------------------|-------------|
| atomic.read | yes | no | yes | no |
| atomic.write | yes | no | yes | no |
| atomic.update | yes | yes | yes | yes |
| critical | no | no | yes | no |
| declare_mapper | no | yes | no | no |
| declare_reduction | no | yes | no | no |
| flush | yes | no | yes | no |
| private | no | yes | yes | no |
| threadprivate | yes | no | yes | no |
| yield | yes | no | yes | no |
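For reference, a rough sketch of the only processed case, where the region's
entry block argument (`%xval` below, a hypothetical name) corresponds to the
`%x` operand and is referenced inside the region:
```mlir
omp.atomic.update %x : !llvm.ptr {
^bb0(%xval: i32):
  %one = llvm.mlir.constant(1 : i32) : i32
  %newval = llvm.add %xval, %one : i32
  omp.yield(%newval : i32)
}
```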
This patch makes the `map_type` and `map_capture_type` arguments of the
`omp.map.info` operation required, which was already an invariant being
verified by its users via `verifyMapClause()`. This makes it clearer, as
getters no longer return misleading `std::optional` values.
Checks for the `mapper_id` argument are moved to a verifier for the
operation, rather than being checked by users.
Functionally NFC, but not marked as such due to a reordering of
arguments in the assembly format of `omp.map.info`.