This PR implements `verify-diagnostics=only-expected` which is a "best
effort" verification - i.e., `unexpected`s and `near-misses` will not be
considered failures. The purpose is to enable narrowly scoped checking
of verification remarks (just as we have for lit where only a subset of
lines get `CHECK`ed).
`failableParallelForEach` will non-deterministically early terminate
upon failure, leading to inconsistent and potentially missing
diagnostics.
This PR uses `parallelForEach` to ensure all operations are verified and
all diagnostics are handled, while tracking the failure state
separately.
Other potential fixes include:
- Making `failableParallelForEach` have deterministic early-exit
behavior (or have an option for it)
- I didn't want to change more than what was required (and potentially
incur perf hits for unrelated code), but if this is a better fix I'm
happy to submit a patch.
- I think all diagnostics that can be detected from verification
failures should be reported, so I don't even think this would be correct
behavior anyway
- Adding an option for `failableParallelForEach` to still execute on
every element on the range while still returning `LogicalResult`
Diagnostics at unknown locations can now be verified with
`-verify-diagnostics`.
Example:
```
// expected-error@unknown {{something went wrong}}
```
Also clean up some MemRefToLLVM conversion tests that had to redirect
all errors to stdout in order to FileCheck them. All of those tests can
now be stored in a single `invalid.mlir`. That was not possible before.
I observed that we have the boundary comments in the codebase like:
```
//===----------------------------------------------------------------------===//
// ...
//===----------------------------------------------------------------------===//
```
I also observed that there are incomplete boundary comments. The
revision is generated by a script that completes the boundary comments.
```
//===----------------------------------------------------------------------===//
// ...
...
```
Signed-off-by: hanhanW <hanhan0912@gmail.com>
DenseSet, SmallPtrSet, SmallSet, SetVector, and StringSet recently
gained C++23-style insert_range. This patch replaces:
Dest.insert(Src.begin(), Src.end());
with:
Dest.insert_range(Src);
This patch does not touch custom begin like succ_begin for now.
Currently, `DistinctAttr` uses an allocator wrapped in a
`ThreadLocalCache` to manage attribute storage allocations. This ensures
all allocations are freed when the allocator is destroyed.
However, this setup can cause use-after-free errors when
`mlir::PassManager` runs its passes on a separate thread as a result of
crash reproduction being enabled. Distinct attribute storages are
created in the child thread's local storage and freed once the thread
joins. Attempting to access these attributes after this can result in
segmentation faults, such as during printing or alias analysis.
Example: This invocation of `mlir-opt` demonstrates the segfault issue
due to distinct attributes being created in a child thread and their
storage being freed once the thread joins:
```
mlir-opt --mlir-pass-pipeline-crash-reproducer=. --test-distinct-attrs mlir/test/IR/test-builtin-distinct-attrs.mlir
```
This pull request changes the distinct attribute allocator to use
different allocators depending on whether or not threading is enabled
and whether or not the pass manager is running its passes in a separate
thread. If multithreading is disabled, a non thread-local allocator is
used. If threading remains enabled and the pass manager invokes its pass
pipelines in a child thread, then a non-thread local but synchronised
allocator is used. This ensures that the lifetime of allocated storage
persists beyond the lifetime of the child thread.
I have added two tests for the `-test-distinct-attrs` pass and the
`-enable-debug-info-on-llvm-scope` passes that run them with crash
reproduction enabled.
When generating aliases from `OpAsm{Dialect,Type,Attr}Interface`, the
result would be sanitized and if the alias provided by the interface has
a trailing digit, AsmPrinter would attach an underscore to it to
presumably prevent confliction.
#### Motivation
There are two reasons to motivate the change from the old behavior to
the proposed behavior
1. If the type/attribute can generate unique alias from its content,
then the extra trailing underscore added by AsmPrinter will be strange
```mlir
func.func @add(%ct: !ct_L0_) -> !ct_L0_
%ct_0 = bgv.add %ct, %ct : (!ct_L0_, !ct_L0_) -> !ct_L0_
%ct_1 = bgv.add %ct_0, %ct_0 : (!ct_L0_, !ct_L0_) -> !ct_L0_
%ct_2 = bgv.add %ct_1, %ct_1 : (!ct_L0_, !ct_L0_) -> !ct_L0_
return %ct_2 : !ct_L0_
}
```
Which aesthetically would be better if we have `(!ct_L0, !ct_L0) ->
!ct_L0`
2. The Value name behavior is that, for the first instance, use no
suffix `_N`, which can be similarly applied to alias name. See the IR
above where the first one is called `%ct` and others are called `%ct_N`.
See `uniqueValueName` for detail.
#### Conflict detection
```mlir
!test.type<a = 3> // suggest !name0
!test.type<a = 4> // suggest !name0
!test.another<b = 3> // suggest !name0_
!test.another<b = 4> // suggest !name0_
```
The conflict detection is based on `nameCounts` in `initializeAliases`,
where
In the original way, the first two will get sanitized to `!name0_` and
`initializeAlias` can assign unique id `0, 1, 2, 3` to them.
In the current way, the `initializeAlias` uses `usedAliases` to track
which name has been used, and use such information to generate a suffix
id that will make the printed alias name unique.
The result for the above example is `!name0, !name0_1, !name0_,
!name0_2` now.
The vast majority of rewrite / conversion patterns uses a combined
`matchAndRewrite` instead of separate `match` and `rewrite` functions.
This PR optimizes the code base for the most common case where users
implement a combined `matchAndRewrite`. There are no longer any `match`
and `rewrite` functions in `RewritePattern`, `ConversionPattern` and
their derived classes. Instead, there is a `SplitMatchAndRewriteImpl`
class that implements `matchAndRewrite` in terms of `match` and
`rewrite`.
Details:
* The `RewritePattern` and `ConversionPattern` classes are simpler
(fewer functions). Especially the `ConversionPattern` class, which now
has 5 fewer functions. (There were various `rewrite` overloads to
account for 1:1 / 1:N patterns.)
* There is a new class `SplitMatchAndRewriteImpl` that derives from
`RewritePattern` / `OpRewritePatern` / ..., along with a type alias
`RewritePattern::SplitMatchAndRewrite` for convenience.
* Fewer `llvm_unreachable` are needed throughout the code base. Instead,
we can use pure virtual functions. (In cases where users previously had
to implement `rewrite` or `matchAndRewrite`, etc.)
* This PR may also improve the number of [`-Woverload-virtual`
warnings](https://discourse.llvm.org/t/matchandrewrite-hiding-virtual-functions/84933)
that are produced by GCC. (To be confirmed...)
Note for LLVM integration: Patterns with separate `match` / `rewrite`
implementations, must derive from `X::SplitMatchAndRewrite` instead of
`X`.
---------
Co-authored-by: River Riddle <riddleriver@gmail.com>
After the introduction of `OpAsmAttrInterface` for alias in #124721, the
natural thought to exercise it would be migrating the MLIR existing
alias generation method, i.e. `OpAsmDialectInterface`, to use the new
interface.
There is a `BuiltinOpAsmDialectInterface` that generates aliases for
`AffineMapAttr` and `IntegerSetAttr`, and these attributes could be
migrated to use `OpAsmAttrInterface`.
However, the tricky part is that `OpAsmAttrInterface` lives in
`OpImplementation.h`. If `BuiltinAttributes.h` includes that, it would
become a cyclic inclusion.
Note that only BuiltinAttribute/Type would face such issue as outside
user can just include `OpImplementation.h` (see downstream example
https://github.com/google/heir/pull/1437)
The dependency is introduced by the fact that `OpAsmAttrInterface` uses
`OpAsmDialectInterface::AliasResult`.
The solution to is: Put the `AliasResult` in `OpAsmSupport.h` that all
interfaces can include that header safely. The API wont break as
`mlir::OpAsmDialectInterface::AliasResult` is a typedef of this class.
(Continuing from #106160)
This PR addresses remaining review comments from the original PR.
Original PR Description
---
Upcoming hardware (gfx12 and some future gfx9) will support the OCP
8-bit float formats for their matrix multiplication intrinsics and
conversion operations, retaining existing opcodes and compiler builtins.
This commit adds support for these types to the MLIR wrappers around
such operations, ensuring that the OCP types aren't used to generate
those builtins on hardware that doesn't expect that format and,
conversely, to ensure that the pre-OCP formats aren't used on new
hardware.
---------
Signed-off-by: Mirza Halilcevic <mirza.halilcevic@amd.com>
Co-authored-by: Paul Fuqua <pf@acm.org>
Co-authored-by: Krzysztof Drewniak <Krzysztof.Drewniak@amd.com>
Whenever `symbolicDivide` returns nullptr when called from inside
`simplifySemiAffine` we substitute the result with the original
expression (`expr`). nullptr simply indicates that the floordiv
expression cannot be simplified further.
Fixes: https://github.com/llvm/llvm-project/issues/122231
See
https://discourse.llvm.org/t/rfc-introduce-opasm-type-attr-interface-for-pretty-print-in-asmprinter/83792
for detailed introduction.
This is a follow up PR of #121187, by integrating OpAsmTypeInterface
with AsmPrinter. There are a few conditions when OpAsmTypeInterface
comes into play
* There is no OpAsmOpInterface
* Or OpAsmOpInterface::getAsmResultName/getBlockArgumentName does not
invoke `setName` (i.e. the default impl)
* All results have OpAsmTypeInterface (otherwise we can not handle
result grouping behavior)
Cc @River707 @jpienaar @ftynse for review.
resource keys have the problem that you can’t parse them from mlir
assembly if they have special or non-printable characters, but nothing
prevents you from specifying such a key when you create e.g. a
DenseResourceElementsAttr, and it works fine in other ways, including
bytecode emission and parsing
this PR solves the parsing by quoting and escaping keys with special or
non-printable characters in mlir assembly, in the same way as symbols,
e.g.:
```
module attributes {
fst = dense_resource<resource_fst> : tensor<2xf16>,
snd = dense_resource<"resource\09snd"> : tensor<2xf16>
} {}
{-#
dialect_resources: {
builtin: {
resource_fst: "0x0200000001000200",
"resource\09snd": "0x0200000008000900"
}
}
#-}
```
by not quoting keys without special or non-printable characters, the
change is effectively backwards compatible
the change is tested by:
1. adding a test with a dense resource handle key with special
characters to `dense-resource-elements-attr.mlir`
2. adding special and unprintable characters to some resource keys in
the existing lit tests `pretty-resources-print.mlir` and
`mlir/test/Bytecode/resources.mlir`
See
https://discourse.llvm.org/t/rfc-introduce-opasm-type-attr-interface-for-pretty-print-in-asmprinter/83792
for detailed introduction.
This PR acts as the first part of it
* Add `OpAsmTypeInterface` and `getAsmName` API for deducing ASM name
from type
* Add default impl in `OpAsmOpInterface` to respect this API when
available.
The `OpAsmAttrInterface` / hooking into Alias system part should be
another PR, using a `getAlias` API.
### Discussion
* Instead of using `StringRef getAsmName()` as the API, I use `void
getAsmName(OpAsmSetNameFn)`, as returning StringRef might be unsafe
(std::string constructed inside then returned a _ref_; and this aligns
with the design of `getAsmResultNames`.
* On the result packing of an op, the current approach is that when not
all of the result types are `OpAsmTypeInterface`, then do nothing (old
default impl)
### Review
Cc @j2kun and @Alexanderviand-intel for downstream; Cc @River707 and
@joker-eph for relevent commit history; Cc @ftynse for discourse.
This was discussed during the original review but I made it stricter
than discussed. Making it a pure view but adding a helper for bytecode
serialization (I could avoid the helper, but it ends up with more logic
and stronger coupling).
Remove builder API (e.g., `b.getFloat4E2M1FNType()`) and caching in
`MLIRContext` for low-precision FP types. Types are still cached in the
type uniquer.
For details, see:
https://discourse.llvm.org/t/rethink-on-approach-to-low-precision-fp-types/82361/28
Note for LLVM integration: Use `b.getType<Float4E2M1FNType>()` or
`Float4E2M1FNType::get(b.getContext())` instead of
`b.getFloat4E2M1FNType()`.
A very common mistake users (and yours truly) make when using
`ValueRange`s is assigning a temporary `Value` to it. Example:
```cpp
ValueRange values = op.getOperand();
apiThatUsesValueRange(values);
```
The issue is caused by the implicit `const Value&` constructor: As per
C++ rules a const reference can be constructed from a temporary and the
address of it taken. After the statement, the temporary goes out of
scope and `stack-use-after-free` error occurs.
This PR fixes that issue by making `ValueRange` capable of owning a
single `Value` instance for that case specifically. While technically a
departure from the other owner types that are non-owning, I'd argue that
this behavior is more intuitive for the majority of users that usually
don't need to care about the lifetime of `Value` instances.
`TypeRange` has similarly been adopted to accept a single `Type`
instance to implement `getTypes`.
This makes it possible to add new MLIR floating point types in
downstream projects. (Adding new APFloat semantics in downstream
projects is not possible yet, so parsing/printing/converting float
literals of newly added types is not supported.)
Also removes two functions where we had to hard-code all existing
floating point types (`FloatType::classof`). See discussion here:
https://discourse.llvm.org/t/rethink-on-approach-to-low-precision-fp-types/82361
No measurable compilation time changes for these lit tests:
```
Benchmark 1: mlir-opt ./mlir/test/Conversion/VectorToLLVM/vector-to-llvm.mlir -split-input-file -convert-vector-to-llvm -o /dev/null
BEFORE
Time (mean ± σ): 248.4 ms ± 3.2 ms [User: 237.0 ms, System: 20.1 ms]
Range (min … max): 243.3 ms … 255.9 ms 30 runs
AFTER
Time (mean ± σ): 246.8 ms ± 3.2 ms [User: 233.2 ms, System: 21.8 ms]
Range (min … max): 240.2 ms … 252.1 ms 30 runs
Benchmark 2: mlir-opt- ./mlir/test/Dialect/Arith/canonicalize.mlir -split-input-file -canonicalize -o /dev/null
BEFORE
Time (mean ± σ): 37.3 ms ± 1.8 ms [User: 31.6 ms, System: 30.4 ms]
Range (min … max): 34.6 ms … 42.0 ms 200 runs
AFTER
Time (mean ± σ): 37.5 ms ± 2.0 ms [User: 31.5 ms, System: 29.2 ms]
Range (min … max): 34.5 ms … 43.0 ms 200 runs
Benchmark 3: mlir-opt ./mlir/test/Dialect/Tensor/canonicalize.mlir -split-input-file -canonicalize -allow-unregistered-dialect -o /dev/null
BEFORE
Time (mean ± σ): 152.2 ms ± 2.5 ms [User: 140.1 ms, System: 12.2 ms]
Range (min … max): 147.6 ms … 161.8 ms 200 runs
AFTER
Time (mean ± σ): 151.9 ms ± 2.7 ms [User: 140.5 ms, System: 11.5 ms]
Range (min … max): 147.2 ms … 159.1 ms 200 runs
```
A micro benchmark that parses + prints 32768 floats with random
floating-point type shows a slowdown from 55.1 ms -> 48.3 ms.
The current implementation of LocationSnapshotPass takes an
OpPrintingFlags argument and stores it as member, but does not use it
for printing.
Properly implement the printing flags, also supporting command line args.
---------
Co-authored-by: Mehdi Amini <joker.eph@gmail.com>
The `properlyDominates` implementations for blocks and ops are very
similar. This commit replaces them with a single implementation that
operates on block iterators. That implementation can be used to
implement both `properlyDominates` variants.
Before:
```c++
template <bool IsPostDom>
bool DominanceInfoBase<IsPostDom>::properlyDominatesImpl(Block *a,
Block *b) const;
template <bool IsPostDom>
bool DominanceInfoBase<IsPostDom>::properlyDominatesImpl(
Operation *a, Operation *b, bool enclosingOpOk) const;
```
After:
```c++
template <bool IsPostDom>
bool DominanceInfoBase<IsPostDom>::properlyDominatesImpl(
Block *aBlock, Block::iterator aIt, Block *bBlock, Block::iterator bIt,
bool enclosingOk) const;
```
Note: A subsequent commit will add a new public `properlyDominates`
overload that accepts block iterators. That functionality can then be
used to find a valid insertion point at which a range of values is
defined (by utilizing post dominance).
Fixes an issue where the `SimpleAffineExprFlattener` would simplify
`lhs % rhs` to just `-(lhs floordiv rhs)` instead of
`lhs - (lhs floordiv rhs)`
if `lhs` happened to be equal to `lhs floordiv rhs`.
The reported failure case was
`(d0, d1) -> (((d1 - (d1 + 2)) floordiv 8) % 8)`
from https://github.com/llvm/llvm-project/issues/114654.
Note that many paths that simplify AffineMaps (e.g. the AffineApplyOp
folder and canonicalization) would not observe this bug because of
of slightly different paths taken by the code. Slightly different
grouping of the terms could also result in avoiding the bug.
Resolves https://github.com/llvm/llvm-project/issues/114654.
This PR adds an `AsmPrinter` option `-mlir-use-nameloc-as-prefix` which
uses trailing `NameLoc`s, if the source IR provides them, as prefixes
when printing SSA IDs.
This change allows to expose through an interface attributes wrapping
content as external resources, and the usage inside the ModuleToObject
show how we will be able to provide runtime libraries without relying on
the filesystem.
Note that PointerUnion::{is,get} have been soft deprecated in
PointerUnion.h:
// FIXME: Replace the uses of is(), get() and dyn_cast() with
// isa<T>, cast<T> and the llvm::dyn_cast<T>
I'm not touching PointerUnion::dyn_cast for now because it's a bit
complicated; we could blindly migrate it to dyn_cast_if_present, but
we should probably use dyn_cast when the operand is known to be
non-null.
`isOpOperandCanBeDroppedAfterFusedLinalgs` crashes when `indexingMaps`
is empty. This can occur when `producer` only has DPS init operands and
`consumer ` only has a single DPS input operand (all operands are
ignored and nothing gets added to `indexingMaps`). This is because
`concatAffineMaps` wasn't handling the maps being empty properly.
Similar to `canOpOperandsBeDroppedImpl`, I added an early return when
the maps are of size zero. Additionally, `concatAffineMaps`'s
declaration comment says it returns an empty map when `maps` is empty
but it has no way to get the `MLIRContext` needed to construct the empty
affine map when the array is empty. So, I changed this to take the
context.
__NOTE: concatAffineMaps now takes an MLIRContext to be able to
construct an empty map in the case where `maps` is empty.__
---------
Signed-off-by: Ian Wood <ianwood2024@u.northwestern.edu>
Co-authored-by: Quinn Dawkins <quinn.dawkins@gmail.com>
This location type represents a contiguous range inside a file. It is
effectively a pair of FileLineCols. Add new type and make FileLineCol a
view for case where it matches existing previous one.
The location includes filename and optional start line & col, and end
line & col. Considered common cases are file:line, file:line:col,
file:line:start_col to file:line:end_col and general range within same
file. In memory its encoded as trailing objects. This keeps the memory
requirement the same as FileLineColLoc today (makes the rather common
File:Line cheaper) at the expense of extra work at decoding time. Kept the unsigned
type.
There was the option to always have file range be castable to
FileLineColLoc. This cast would just drop other fields. That may result
in some simpler staging. TBD.
This is a rather minimal change, it does not yet add bindings (C or
Python), lowering to LLVM debug locations etc. that supports end line:cols.
---------
Co-authored-by: River Riddle <riddleriver@gmail.com>
Disabling memrefs with a stride of 0 was intended to prevent internal
aliasing, but this does not address all cases : internal aliasing can
still occur when the stride is less than the shape.
On the other hand, a stride of 0 can be very useful in certain
scenarios. For example, in architectures that support multi-dimensional
DMA, we can use memref::copy with a stride of 0 to achieve a broadcast
effect.
This commit removes the restriction that strides in memrefs cannot be 0.
TF32 is a variant of F32 that is truncated to 19 bits. There used to be
special handling in `FloatType::getWidth()` so that TF32 was treated as
a 32-bit float in some places. (Some places use `FloatType::getWidth`,
others directly query the `APFloat` semantics.) This caused problems
because `FloatType::getWidth` did not agree with the underlying
`APFloat` semantics.
In particular, creating an elements attr / array attr with `tf32`
element type crashed. E.g.:
```
"foo"() {attr = dense<4.0> : tensor<tf32>} : () -> ()
mlir-opt: llvm-project/llvm/lib/Support/APFloat.cpp:4108: void llvm::detail::IEEEFloat::initFromAPInt(const fltSemantics *, const APInt &): Assertion `api.getBitWidth() == Sem->sizeInBits' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
```
```
"foo"() {f32attr = array<tf32: 1024.>} : () -> ()
mlir-opt: llvm-project/mlir/lib/AsmParser/AttributeParser.cpp:847: void (anonymous namespace)::DenseArrayElementParser::append(const APInt &): Assertion `data.getBitWidth() % 8 == 0' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
```
It is unclear why the special handling for TF32 is needed. For
reference: #107372
Add a new helper function `isReachable` to `Block`. This function
traverses all successors of a block to determine if another block is
reachable from the current block.
This functionality has been reimplemented in multiple places in MLIR.
Possibly additional copies in downstream projects. Therefore, moving it
to a common place.