1975 Commits

Author SHA1 Message Date
Sergey Kozub
3f9cabae00
[MLIR] Add f8E8M0FNU type (#111028)
This PR adds `f8E8M0FNU` type to MLIR.

`f8E8M0FNU` type is proposed in [OpenCompute MX
Specification](https://www.opencompute.org/documents/ocp-microscaling-formats-mx-v1-0-spec-final-pdf).
It defines a 8-bit floating point number with bit layout S0E8M0. Unlike
IEEE-754 types, there are no infinity, denormals, zeros or negative
values.

```c
f8E8M0FNU
- Exponent bias: 127
- Maximum stored exponent value: 254 (binary 1111'1110)
- Maximum unbiased exponent value: 254 - 127 = 127
- Minimum stored exponent value: 0 (binary 0000'0000)
- Minimum unbiased exponent value: 0 − 127 = -127
- Doesn't have zero
- Doesn't have infinity
- NaN is encoded as binary 1111'1111

Additional details:
- Zeros cannot be represented
- Negative values cannot be represented
- Mantissa is always 1
```

Related PRs:
- [PR-107127](https://github.com/llvm/llvm-project/pull/107127)
[APFloat] Add APFloat support for E8M0 type
- [PR-105573](https://github.com/llvm/llvm-project/pull/105573) [MLIR]
Add f6E3M2FN type - was used as a template for this PR
- [PR-107999](https://github.com/llvm/llvm-project/pull/107999) [MLIR]
Add f6E2M3FN type
- [PR-108877](https://github.com/llvm/llvm-project/pull/108877) [MLIR]
Add f4E2M1FN type
2024-10-04 09:23:12 +02:00
Aman LaChapelle
759a7b5933
[mlir] Add the ability to define dialect-specific location attrs. (#105584)
This patch adds the capability to define dialect-specific location
attrs. This is useful in particular for defining location structure that
doesn't necessarily fit within the core MLIR location hierarchy, but
doesn't make sense to push upstream (i.e. a custom use case).

This patch adds an AttributeTrait, `IsLocation`, which is tagged onto
all the builtin location attrs, as well as the test location attribute.
This is necessary because previously LocationAttr::classof only returned
true if the attribute was one of the builtin location attributes, and
well, the point of this patch is to allow dialects to define their own
location attributes.

There was an alternate implementation I considered wherein LocationAttr
becomes an AttrInterface, but that was discarded because there are
likely to be *many* locations in a single program, and I was concerned
that forcing every MLIR user to pay the cost of the additional
lookup/dispatch was unacceptable. It also would have been a *much* more
invasive change. It would have allowed for more flexibility in terms of
pretty printing, but it's unclear how useful/necessary that flexibility
would be given how much customizability there already is for attribute
definitions.
2024-10-03 10:25:44 -07:00
Shoaib Meenai
8773bd0e6e
[mlir] Print aliases for recursive types (#110346)
We're already keeping track of the alias depth to ensure that aliases
are printed before they're referenced. For recursive types, we can
additionally track whether an alias has been printed and only reference
it if so, to lift the restrictions on not printing aliases inside
mutable types.
2024-09-28 18:15:14 -07:00
Andrzej Warzyński
6d11494414
[mlir][Linalg] Refine how broadcast dims are treated (#99015)
This PR fixes how broadcast dims (identified as "zero" results in
permutation maps) corresponding to a reduction iterator are vectorised
in the case of generic Ops. Here's an example:

```mlir
  #map = affine_map<(d0, d1, d2, d3) -> (d0, d1, d2, d3)>
  #map1 = affine_map<(d0, d1, d2, d3) -> (d0, d1, d2, 0)>

  func.func @generic_with_reduction_and_broadcast(%arg0: tensor<1x12x197x197xf32>) -> (tensor<1x12x197x1xf32>) {
    %0 = tensor.empty() : tensor<1x12x197x1xf32>

    %1 = linalg.generic {indexing_maps = [#map, #map1],
                        iterator_types = ["parallel", "parallel", "parallel", "reduction"]}
      ins(%arg0 : tensor<1x12x197x197xf32>)
      outs(%0 : tensor<1x12x197x1xf32>) {

    ^bb0(%in: f32, %out: f32):
      %818 = arith.addf %in, %out : f32
      linalg.yield %818 : f32
    } -> tensor<1x12x197x1xf32>
    return %1 : tensor<1x12x197x1xf32>
  }
```

This is a perfectly valid Generic Op, but currently triggers two issues
in the vectoriser. The root cause is this map:

```mlir
  #map1 = affine_map<(d0, d1, d2, d3) -> (d0, d1, d2, 0)>
```

This map triggers an assert in `reindexIndexingMap` -  this hook
incorrectly assumes that every result in the input map is a `dim`
expression and that there are no constants. That's not the case in this
example. `reindexIndexingMap` is extended to allow maps like the one
above. For now, only constant "zero" results are allowed. This can be
extended in the future once a good motivating example is available.

Separately, the permutation map highlighted above "breaks" mask
calculation (ATM masks are always computed, even in the presence of
static shapes). When applying the following permutation:
```mlir
  (d0, d1, d2, d3) -> (d0, d1, d2, 0)
```

to these canonical shapes (corresponding to the example above):
```
  (1, 12, 197, 197)
```
we end up with the following error:
```bash
error: vector types must have positive constant sizes but got 1, 12, 197, 0
```

The error makes sense and indicates that we should update the
permutation map above to:
```
  (d0, d1, d2, d3) -> (d0, d1, d2)
```

This would correctly give the following vector type:
```
  vector<1x12x197xi1>
```

Fixes #97247
2024-09-26 16:17:15 +01:00
Sergey Kozub
2c58063435
[MLIR] Add f4E2M1FN type (#108877)
This PR adds `f4E2M1FN` type to mlir.

`f4E2M1FN` type is proposed in [OpenCompute MX
Specification](https://www.opencompute.org/documents/ocp-microscaling-formats-mx-v1-0-spec-final-pdf).
It defines a 4-bit floating point number with bit layout S1E2M1. Unlike
IEEE-754 types, there are no infinity or NaN values.

```c
f4E2M1FN
- Exponent bias: 1
- Maximum stored exponent value: 3 (binary 11)
- Maximum unbiased exponent value: 3 - 1 = 2
- Minimum stored exponent value: 1 (binary 01)
- Minimum unbiased exponent value: 1 − 1 = 0
- Has Positive and Negative zero
- Doesn't have infinity
- Doesn't have NaNs

Additional details:
- Zeros (+/-): S.00.0
- Max normal number: S.11.1 = ±2^(2) x (1 + 0.5) = ±6.0
- Min normal number: S.01.0 = ±2^(0) = ±1.0
- Min subnormal number: S.00.1 = ±2^(0) x 0.5 = ±0.5
```

Related PRs:
- [PR-95392](https://github.com/llvm/llvm-project/pull/95392) [APFloat]
Add APFloat support for FP4 data type
- [PR-105573](https://github.com/llvm/llvm-project/pull/105573) [MLIR]
Add f6E3M2FN type - was used as a template for this PR
- [PR-107999](https://github.com/llvm/llvm-project/pull/107999) [MLIR]
Add f6E2M3FN type
2024-09-24 08:22:48 +02:00
wang-y-z
041b0a81b0
[MLIR][Operation] Fix isBeforeInBlock crash bug mentioned in https://github.com/llvm/llvm-project/issues/60909 (#101172)
# summary
This MR fix `isBeforeInBlock` crash bug mentioned in
https://github.com/llvm/llvm-project/issues/60909. Fixes #60909.
# Trigger condition
1. A block only have one operation.
2. `block->isOpOrderValid()` is true, but `op->hasValidOrder()` is
false.
3. call: `op->isBeforeInBlock(op)`, compared with op itself.

Will crash on `assert(blockFront != blockBack && "expected more than one
operation");`

# Case study
Simplified repro case in
`mlir/test/Pass/scf2cf-print-liveness-crash.mlir`
When put `-convert-scf-to-cf -test-print-liveness` together in one cmd
line, the first pass will work normally and crash on the second pass.
Details please refer  https://github.com/llvm/llvm-project/issues/60909

# Solutions
option1. in `isBeforeInBlock`, check if block only have one operation
before step into `updateOrderIfNecessary`, if have only one, it must
return false
option2. in `isBeforeInBlock`, check if `this == other`, if true return
false
option3. fix `addNodeToList` logic

I prefer option3: 

When a block contains only one operation and the user calls
op->isBeforeInBlock(op), if block->isOpOrderValid() returns true,
updateOrderIfNecessary is called. If op->hasValidOrder() is false, it
will crash at the assertion assert(blockFront != blockBack && "expected
more than one operation");.

This behavior is abnormal and needs fixing. I discovered that after the
first pass of `-convert-scf-to-cf`, there is a block with only one
operation where the block order is valid but the operation order is
invalid, leading to a crash when `-test-print-liveness` pass runs.

---------

Co-authored-by: isaacw <isaacw@nvidia.com>
2024-09-21 23:40:52 +08:00
Billy Zhu
a1d64626ba
[MLIR][IR] Fix InProgressAliasInfo init for non-alias (#109013)
When visiting an attr/type that is NoAlias, the created
`InProgressAliasInfo` was not getting its `canBeDeferred` and `isType`
fields set. Not setting `canBeDeferred` when it should be true breaks
the assumption that all nested elements are also false. This will cause
problems when at a later point the attr/type needs to be converted by
`markAliasNonDeferrable`, as recursion will stop when a
`canBeDeferred=false` attr/type is reached, leaving its nested elements
not flipped. This causes nested elements to be printed later in the
textual IR and cannot be parsed back in.
2024-09-17 14:27:31 -07:00
JOE1994
884221eddb [mlir] Tidy uses of llvm::raw_stream_ostream (NFC)
As specified in the docs,
1) raw_string_ostream is always unbuffered and
2) the underlying buffer may be used directly

( 65b13610a5226b84889b923bae884ba395ad084d for further reference )

* Don't call raw_string_ostream::flush(), which is essentially a no-op.
* Avoid unneeded calls to raw_string_ostream::str(), to avoid excess indirection.
2024-09-16 23:23:25 -04:00
Sergey Kozub
73d83f20c9
[MLIR] Add f6E2M3FN type (#107999)
This PR adds `f6E2M3FN` type to mlir.

`f6E2M3FN` type is proposed in [OpenCompute MX
Specification](https://www.opencompute.org/documents/ocp-microscaling-formats-mx-v1-0-spec-final-pdf).
It defines a 6-bit floating point number with bit layout S1E2M3. Unlike
IEEE-754 types, there are no infinity or NaN values.

```c
f6E2M3FN
- Exponent bias: 1
- Maximum stored exponent value: 3 (binary 11)
- Maximum unbiased exponent value: 3 - 1 = 2
- Minimum stored exponent value: 1 (binary 01)
- Minimum unbiased exponent value: 1 − 1 = 0
- Has Positive and Negative zero
- Doesn't have infinity
- Doesn't have NaNs

Additional details:
- Zeros (+/-): S.00.000
- Max normal number: S.11.111 = ±2^(2) x (1 + 0.875) = ±7.5
- Min normal number: S.01.000 = ±2^(0) = ±1.0
- Max subnormal number: S.00.111 = ±2^(0) x 0.875 = ±0.875
- Min subnormal number: S.00.001 = ±2^(0) x 0.125 = ±0.125
```

Related PRs:
- [PR-94735](https://github.com/llvm/llvm-project/pull/94735) [APFloat]
Add APFloat support for FP6 data types
- [PR-105573](https://github.com/llvm/llvm-project/pull/105573) [MLIR]
Add f6E3M2FN type - was used as a template for this PR
2024-09-16 21:09:27 +02:00
Sergey Kozub
083e25c1d4
[MLIR] [NFC] Use APFloat semantics to get floating type width (#107372)
As suggested in the comments of
https://github.com/llvm/llvm-project/pull/105573
2024-09-10 10:50:34 +02:00
Sergey Kozub
918222ba43
[MLIR] Add f6E3M2FN type (#105573)
This PR adds `f6E3M2FN` type to mlir.

`f6E3M2FN` type is proposed in [OpenCompute MX
Specification](https://www.opencompute.org/documents/ocp-microscaling-formats-mx-v1-0-spec-final-pdf).
It defines a 6-bit floating point number with bit layout S1E3M2. Unlike
IEEE-754 types, there are no infinity or NaN values.

```c
f6E3M2FN
- Exponent bias: 3
- Maximum stored exponent value: 7 (binary 111)
- Maximum unbiased exponent value: 7 - 3 = 4
- Minimum stored exponent value: 1 (binary 001)
- Minimum unbiased exponent value: 1 − 3 = −2
- Has Positive and Negative zero
- Doesn't have infinity
- Doesn't have NaNs

Additional details:
- Zeros (+/-): S.000.00
- Max normal number: S.111.11 = ±2^(4) x (1 + 0.75) = ±28
- Min normal number: S.001.00 = ±2^(-2) = ±0.25
- Max subnormal number: S.000.11 = ±2^(-2) x 0.75 = ±0.1875
- Min subnormal number: S.000.01 = ±2^(-2) x 0.25 = ±0.0625
```

Related PRs:
- [PR-94735](https://github.com/llvm/llvm-project/pull/94735) [APFloat]
Add APFloat support for FP6 data types
- [PR-97118](https://github.com/llvm/llvm-project/pull/97118) [MLIR] Add
f8E4M3 type - was used as a template for this PR
2024-09-10 10:41:05 +02:00
Kazu Hirata
01eb071de0
[mlir] Avoid repeated hash lookups (NFC) (#107519) 2024-09-06 07:48:39 -07:00
Johannes Reifferscheid
8af0860529
AffineExpr: Fix result of d0 + (d0 // -c) * c. (#107530)
Currently, this is rewritten to d0 mod -c. However, we do not support
modulo with a negative RHS in our lowering passes, so this triggers
undefined behavior.

It would be better to not have these ad hoc simplifications at all, but
I guess that ship has sailed.
2024-09-06 12:53:33 +02:00
Benjamin Maxwell
84aa02d3fa
[memref] Handle edge case in subview of full static size fold (#105635)
It is possible to have a subview with a fully static size and a type
that matches the source type, but a dynamic offset that may be
different. However, currently the memref dialect folds:

```mlir
func.func @subview_of_static_full_size(
  %arg0:  memref<16x4xf32,  strided<[4, 1], offset: ?>>, %idx: index)
  -> memref<16x4xf32,  strided<[4, 1], offset: ?>>
{
  %0 = memref.subview %arg0[%idx, 0][16, 4][1, 1]
   : memref<16x4xf32,  strided<[4, 1], offset: ?>>
     to memref<16x4xf32,  strided<[4, 1], offset: ?>>
  return %0 : memref<16x4xf32,  strided<[4, 1], offset: ?>>
}
```

To:

```mlir
func.func @subview_of_static_full_size(
  %arg0: memref<16x4xf32, strided<[4, 1], offset: ?>>, %arg1: index)
  -> memref<16x4xf32, strided<[4, 1], offset: ?>>
{
  return %arg0 : memref<16x4xf32, strided<[4, 1], offset: ?>>
}
```

Which drops the dynamic offset from the `subview` op.
2024-08-23 06:52:09 +01:00
Matthias Springer
a3d41879ec
[mlir][ODS] Optionally generate public C++ functions for type constraints (#104577)
Add `gen-type-constraint-decls` and `gen-type-constraint-defs`, which
generate public C++ functions for type constraints. The name of the C++
function is specified in the `cppFunctionName` field.

Type constraints are typically used for op/type/attribute verification.
They are also sometimes called from builders and transformations. Until
now, this required duplicating the check in C++.

Note: This commit just adds the option for type constraints, but
attribute constraints could be supported in the same way.

Alternatives considered:
1. The C++ functions could also be generated as part of
`gen-typedef-decls/defs`, but that can be confusing because type
constraints may rely on type definitions from multiple `.td` files.
`#include`s could cause duplicate definitions of the same type
constraint.
2. The C++ functions could also be generated as static member functions
of dialects, but they don't really belong to a dialect. (Because they
may rely on type definitions from multiple dialects.)
2024-08-21 08:44:54 +02:00
Luke Boyer
4c77cc634d
[mlir][IR] Fix checkFoldResult error message (#104559)
checkFoldResult error message has expected and actual backwards.
2024-08-16 09:51:01 +02:00
Will Dietz
7a98071da2
[mlir] Verifier: steal bit to track seen instead of set. (#102626)
Tracking a set containing every block and operation visited can become
very expensive and is unnecessary.

Co-authored-by: Will Dietz <w@wdtz.org>
2024-08-09 10:50:25 -05:00
Nikhil Kalra
84cc1865ef
[mlir] Support DialectRegistry extension comparison (#101119)
`PassManager::run` loads the dependent dialects for each pass into the
current context prior to invoking the individual passes. If the
dependent dialect is already loaded into the context, this should be a
no-op. However, if there are extensions registered in the
`DialectRegistry`, the dependent dialects are unconditionally registered
into the context.

This poses a problem for dynamic pass pipelines, however, because they
will likely be executing while the context is in an immutable state
(because of the parent pass pipeline being run).

To solve this, we'll update the extension registration API on
`DialectRegistry` to require a type ID for each extension that is
registered. Then, instead of unconditionally registered dialects into a
context if extensions are present, we'll check against the extension
type IDs already present in the context's internal `DialectRegistry`.
The context will only be marked as dirty if there are net-new extension
types present in the `DialectRegistry` populated by
`PassManager::getDependentDialects`.

Note: this PR removes the `addExtension` overload that utilizes
`std::function` as the parameter. This is because `std::function` is
copyable and potentially allocates memory for the contained function so
we can't use the function pointer as the unique type ID for the
extension.

Downstream changes required:
- Existing `DialectExtension` subclasses will need a type ID to be
registered for each subclass. More details on how to register a type ID
can be found here:
8b68e06731/mlir/include/mlir/Support/TypeID.h (L30)
- Existing uses of the `std::function` overload of `addExtension` will
need to be refactored into dedicated `DialectExtension` classes with
associated type IDs. The attached `std::function` can either be inlined
into or called directly from `DialectExtension::apply`.

---------

Co-authored-by: Mehdi Amini <joker.eph@gmail.com>
2024-08-06 01:32:36 +02:00
Kazu Hirata
5262865aac
[mlir] Construct SmallVector with ArrayRef (NFC) (#101896) 2024-08-04 11:43:05 -07:00
Alexander Pivovarov
eef1d7e377
[MLIR] Add f8E3M4 IEEE 754 type (#101230)
This PR adds `f8E3M4` type to mlir.

`f8E3M4` type  follows IEEE 754 convention

```c
f8E3M4 (IEEE 754)
- Exponent bias: 3
- Maximum stored exponent value: 6 (binary 110)
- Maximum unbiased exponent value: 6 - 3 = 3
- Minimum stored exponent value: 1 (binary 001)
- Minimum unbiased exponent value: 1 − 3 = −2
- Precision specifies the total number of bits used for the significand (mantissa), 
    including implicit leading integer bit = 4 + 1 = 5
- Follows IEEE 754 conventions for representation of special values
- Has Positive and Negative zero
- Has Positive and Negative infinity
- Has NaNs

Additional details:
- Max exp (unbiased): 3
- Min exp (unbiased): -2
- Infinities (+/-): S.111.0000
- Zeros (+/-): S.000.0000
- NaNs: S.111.{0,1}⁴ except S.111.0000
- Max normal number: S.110.1111 = +/-2^(6-3) x (1 + 15/16) = +/-2^3 x 31 x 2^(-4) = +/-15.5
- Min normal number: S.001.0000 = +/-2^(1-3) x (1 + 0) = +/-2^(-2)
- Max subnormal number: S.000.1111 = +/-2^(-2) x 15/16 = +/-2^(-2) x 15 x 2^(-4) = +/-15 x 2^(-6)
- Min subnormal number: S.000.0001 = +/-2^(-2) x 1/16 =  +/-2^(-2) x 2^(-4) = +/-2^(-6)
```

Related PRs:
- [PR-99698](https://github.com/llvm/llvm-project/pull/99698) [APFloat]
Add support for f8E3M4 IEEE 754 type
- [PR-97118](https://github.com/llvm/llvm-project/pull/97118) [MLIR] Add
f8E4M3 IEEE 754 type
2024-08-02 00:22:11 -07:00
Benjamin Kramer
ae4f2495a4 [IR] Verifier: Use a SmallPtrSet for a small set of pointers. NFC 2024-07-27 13:55:04 +02:00
Krzysztof Drewniak
8955e285e1
[mlir] Add property combinators, initial ODS support (#94732)
While we have had a Properties.td that allowed for defining
non-attribute-backed properties, such properties were not plumbed
through the basic autogeneration facilities available to attributes,
forcing those who want to migrate to the new system to write such code
by hand.

## Potentially breaking changes

- The `setFoo()` methods on `Properties` struct no longer take their
inputs by const reference. Those wishing to pass non-owned values of a
property by reference to constructors and setters should set the
interface type to `const [storageType]&`
- Adapters and operations now define getters and setters for properties
listed in ODS, which may conflict with custom getters.
- Builders now include properties listed in ODS specifications,
potentially conflicting with custom builders with the same type
signature.

## Extensions to the `Property` class

This commit  adds several fields to the `Property` class, including:
- `parser`, `optionalParser`, and `printer` (for parsing/printing
properties of a given type in ODS syntax)
- `storageTypeValueOverride`, an extension of `defaultValue` to allow
the storage and interface type defaults to differ
- `baseProperty` (allowing for classes like `DefaultValuedProperty`)

Existing fields have also had their documentation comments updated.

This commit does not add a `PropertyConstraint` analogous to
`AttrConstraint`, but this is a natural evolution of the work here.

This commit also adds the concrete property kinds `I32Property`,
`I64Property`, `UnitProperty` (and special handling for it like for
UnitAttr), and `BoolProperty`.

## Property combinators

`Properties.td` also now includes several ways to combine properties.

One is `ArrayProperty<Property elem>`, which now stores a
variable-length array of some property as
`SmallVector<elem.storageType>` and uses `ArrayRef<elem.storageType>` as
its interface type. It has `IntArrayProperty` subclasses that change its
conversion to attributes to use `DenseI[N]Attr`s instead of an
`ArrayAttr`.

Similarly, `OptionalProperty<Property p>` wraps a property's storage in
`std::optional<>` and adds a `std::nullopt` default value. In the case
where the underlying property can be parsed optionally but doesn't have
its own default value, `OptionalProperty` can piggyback off the optional
parser to produce a cleaner syntax, as opposed to its general form,
which is either `none` or `some<[value]>`.

(Note that `OptionalProperty` can be nested if desired).

  ## Autogeneration changes

Operations and adaptors now support getters and setters for properties
like those for attributes. Unlike for attributes, there aren't separate
value and attribute forms, since there is no `FooAttr()` available for a
`getFooAttr()` to return.

The largest change is to operation formats. Previously, properties could
only be used in custom directives. Now, they can be used anywhere an
attribute could be used, and have parsers and printers defined in their
tablegen records.

These updates include special `UnitProperty` logic like that used for
`UnitAttr`.

## Misc.

Some attempt has been made to test the new functionality.

This commit takes tentative steps towards updating the documentation to
account for properties. A full update will be in order once any followup
work has been completed and the interfaces have stabilized.

---------

Co-authored-by: Mehdi Amini <joker.eph@gmail.com>
Co-authored-by: Christian Ulmann <christianulmann@gmail.com>
2024-07-26 09:35:06 -05:00
Johannes Reifferscheid
528a662d3a
Fix sign of largest known divisor of div. (#100081)
There's a missing abs, so it returns a negative value if the divisor is
negative. Later this is then cast to uint.
2024-07-23 10:55:32 +02:00
Alexander Pivovarov
019136e30f
[MLIR] Add f8E4M3 IEEE 754 type (#97118)
This PR adds `f8E4M3` type to mlir.

`f8E4M3` type  follows IEEE 754 convention

```c
f8E4M3 (IEEE 754)
- Exponent bias: 7
- Maximum stored exponent value: 14 (binary 1110)
- Maximum unbiased exponent value: 14 - 7 = 7
- Minimum stored exponent value: 1 (binary 0001)
- Minimum unbiased exponent value: 1 − 7 = −6
- Precision specifies the total number of bits used for the significand (mantisa), 
    including implicit leading integer bit = 3 + 1 = 4
- Follows IEEE 754 conventions for representation of special values
- Has Positive and Negative zero
- Has Positive and Negative infinity
- Has NaNs

Additional details:
- Max exp (unbiased): 7
- Min exp (unbiased): -6
- Infinities (+/-): S.1111.000
- Zeros (+/-): S.0000.000
- NaNs: S.1111.{001, 010, 011, 100, 101, 110, 111}
- Max normal number: S.1110.111 = +/-2^(7) x (1 + 0.875) = +/-240
- Min normal number: S.0001.000 = +/-2^(-6)
- Max subnormal number: S.0000.111 = +/-2^(-6) x 0.875 = +/-2^(-9) x 7
- Min subnormal number: S.0000.001 = +/-2^(-6) x 0.125 = +/-2^(-9)
```

Related PRs:
- [PR-97179](https://github.com/llvm/llvm-project/pull/97179) [APFloat]
Add support for f8E4M3 IEEE 754 type
2024-07-22 23:20:28 -07:00
Andrzej Warzyński
2ee5586ac7
[mlir][vector] Make the in_bounds attribute mandatory (#97049)
At the moment, the in_bounds attribute has two confusing/contradicting
properties:
  1. It is both optional _and_ has an effective default-value.
  2. The default value is "out-of-bounds" for non-broadcast dims, and
     "in-bounds" for broadcast dims.

(see the `isDimInBounds` vector interface method for an example of this
"default" behaviour [1]).

This PR aims to clarify the logic surrounding the `in_bounds` attribute
by:
  * making the attribute mandatory (i.e. it is always present),
  * always setting the default value to "out of bounds" (that's
    consistent with the current behaviour for the most common cases).

#### Broadcast dimensions in tests

As per [2], the broadcast dimensions requires the corresponding
`in_bounds` attribute to be `true`:
```
  vector.transfer_read op requires broadcast dimensions to be in-bounds
```

The changes in this PR mean that we can no longer rely on the
default value in cases like the following (dim 0 is a broadcast dim):
```mlir
  %read = vector.transfer_read %A[%base1, %base2], %f, %mask
      {permutation_map = affine_map<(d0, d1) -> (0, d1)>} :
    memref<?x?xf32>, vector<4x9xf32>
```

Instead, the broadcast dimension has to explicitly be marked as "in
bounds:

```mlir
  %read = vector.transfer_read %A[%base1, %base2], %f, %mask
      {in_bounds = [true, false], permutation_map = affine_map<(d0, d1) -> (0, d1)>} :
    memref<?x?xf32>, vector<4x9xf32>
```

All tests with broadcast dims are updated accordingly.

#### Changes in "SuperVectorize.cpp" and "Vectorization.cpp"

The following patterns in "Vectorization.cpp" are updated to explicitly
set the `in_bounds` attribute to `false`:
* `LinalgCopyVTRForwardingPattern` and `LinalgCopyVTWForwardingPattern`

Also, `vectorizeAffineLoad` (from "SuperVectorize.cpp") and
`vectorizeAsLinalgGeneric` (from "Vectorization.cpp") are updated to
make sure that xfer Ops created by these hooks set the dimension
corresponding to broadcast dims as "in bounds". Otherwise, the Op
verifier would complain

Note that there is no mechanism to verify whether the corresponding
memory access are indeed in bounds. Still, this is consistent with the
current behaviour where the broadcast dim would be implicitly assumed
to be "in bounds".

[1]
4145ad2bac/mlir/include/mlir/Interfaces/VectorInterfaces.td (L243-L246)
[2]
https://mlir.llvm.org/docs/Dialects/Vector/#vectortransfer_read-vectortransferreadop
2024-07-16 16:49:52 +01:00
Johannes Reifferscheid
dd7d81ea49
Fix simplification of x + x//c*-c to x mod c. (#98909)
There was no check that rhs is actually a multiplication.
2024-07-15 16:59:47 +02:00
Billy Zhu
06bbbf1ee0
[MLIR] Cyclic AttrType Replacer (#98206)
The current `AttrTypeReplacer` does not allow for custom handling of
replacer functions that may cause self-recursion. For example, the
replacement of one attr/type may depend on the replacement of another
attr/type (by calling into the replacer manually again), which in turn
may depend on the replacement of the original attr/type.

To enable this functionality, this PR broke out the original
AttrTypeReplacer into two parts:
- An uncached base version (`detail::AttrTypeReplacerBase`) that allows
registering replacer functions and has logic for invoking it on
attr/types & their sub-elements
- A cached version (`AttrTypeReplacer`) that provides the same caching
as the original one. This is still the one used everywhere and behavior
is unchanged.

On top of the uncached base version, a `CyclicAttrTypeReplacer` is
introduced that provides caching & cycle-handling for replacer logic
that is cyclic. Cycle-breaking & caching is provided by the
`CyclicReplacerCache` from
https://github.com/llvm/llvm-project/pull/98202.

Both concrete implementations of the uncached base version use CRTP to
avoid dynamic dispatch. The base class merely provides replacer
registration & invocation, and is not meant to be used, or otherwise
extended elsewhere.
2024-07-12 09:24:01 -07:00
Ramkumar Ramachandra
f1eed011b4
MathExtras: add overflow query for signed-div (#97901)
5221634 (Do not trigger UB during AffineExpr parsing) noticed that
divideCeilSigned and divideFloorSigned would overflow when Numerator =
INT_MIN, and Denominator = -1. This observation has already been made by
DynamicAPInt, and it has code to check this. To avoid checks in multiple
callers, centralize this query in MathExtras, and change
divideCeilSigned/divideFloorSigned to assert on overflow.
2024-07-09 09:33:46 +01:00
Ramkumar Ramachandra
db791b278a
mlir/LogicalResult: move into llvm (#97309)
This patch is part of a project to move the Presburger library into
LLVM.
2024-07-02 10:42:33 +01:00
Johannes Reifferscheid
22dfa1aa2c
[mlir] Fold ceil/floordiv with negative RHS. (#97031)
Currently, we only fold if the RHS is a positive constant. There doesn't
seem to be a good reason to do that. The comment claims that division by
negative values is undefined, but I suspect that was just copied over
from the `mod` simplifier.
2024-06-30 11:53:04 +02:00
Johannes Reifferscheid
52216349b6
Do not trigger UB during AffineExpr parsing. (#96896)
Currently, parsing expressions that are undefined will trigger UB during
compilation (e.g. `9223372036854775807 * 2`). This change instead
leaves the expressions as they were.

This change is an NFC for compilations that did not previously involve
UB.
2024-06-28 07:31:33 +02:00
Kazu Hirata
b7b337fb91
[mlir] Use llvm::unique (NFC) (#96415) 2024-06-24 11:54:02 -07:00
Nikita Popov
0aea1f2f21 [mlir] Add missing ManagedStatic.h includes (NFC) 2024-06-21 16:13:41 +02:00
Niranjan Hasabnis
abd95342f0
Reimplementing target description concept using DLTI attribute (#92138)
and Interfaces. This is a newer implementation of PR
https://github.com/llvm/llvm-project/pull/85141 and
[RFC](https://discourse.llvm.org/t/rfc-target-description-and-cost-model-in-mlir/76990)
by considering reviews and comments on the original PR.

As an example of attributes supported by this commit:

```
module attributes {
   dlti.target_system_spec =
     #dlti.target_device_spec<
       #dlti.dl_entry<"dlti.device_id", 0: ui32>,
       #dlti.dl_entry<"dlti.device_type", "CPU">,
       #dlti.dl_entry<"dlti.L1_cache_size_in_bytes", 8192 : ui32>>,
     #dlti.target_device_spec <
       #dlti.dl_entry<"dlti.device_id", 1: ui32>,
       #dlti.dl_entry<"dlti.device_type", "GPU">,
       #dlti.dl_entry<"dlti.max_vector_op_width", 64 : ui32>>,
    #dlti.target_device_spec <
       #dlti.dl_entry<"dlti.device_id", 2: ui32>,
       #dlti.dl_entry<"dlti.device_type", "XPU">>>
}
```
2024-06-19 19:40:08 +01:00
Ramkumar Ramachandra
0fb216fb2f
mlir/MathExtras: consolidate with llvm/MathExtras (#95087)
This patch is part of a project to move the Presburger library into
LLVM.
2024-06-11 23:00:02 +01:00
Will Dietz
46e41c8631
[mlir] Sanitize identifiers with leading symbol. (#94795)
Presently, if name starts with a symbol it's converted to hex which may
cause the result to be invalid by starting with a digit.

Address this and add a small test.

Co-authored-by: Will Dietz <w@wdtz.org>
2024-06-10 19:12:34 -05:00
Mehdi Amini
2df68e0503
[MLIR] Fix generic assembly syntax for ArrayAttr containing hex float (#94583)
When a float attribute is printed with Hex, we should not elide the type
because it is parsed back as i64 otherwise.
2024-06-06 07:51:47 -07:00
Benjamin Maxwell
29a925abb6
[mlir][affine][Analysis] Add conservative bounds for semi-affine mods (#93576)
This patch adds support for computing bounds for semi-affine mod
expression to FlatLinearConstraints. This is then enabled within the
ScalableValueBoundsConstraintSet to allow computing the bounds of
scalable remainder loops.

E.g. computing the bound of something like:
```
// `1000 mod s0` is a semi-affine.
#remainder_start_index = affine_map<()[s0] -> (-(1000 mod s0) + 1000)>
#remaining_iterations = affine_map<(d0) -> (-d0 + 1000)>

%0 = affine.apply #remainder_start_index()[%c8_vscale]
scf.for %i = %0 to %c1000 step %c8_vscale {
  %remaining_iterations = affine.apply #remaining_iterations(%i)
  // The upper bound for the remainder loop iterations should be:
  // %c8_vscale - 1  (expressed as an affine map,
  // affine_map<()[s0] -> (s0 * 8 - 1)>, where s0 is vscale)
  %bound = "test.reify_bound"(%remaining_iterations) <{scalable, ...}>
}
```

There are caveats to this implementation. To be able to add a bound for
a `mod` we need to assume the rhs is positive (> 0). This may not be
known when adding the bounds for the `mod` expression. So to handle this
a constraint is added for `rhs > 0`, this may later be found not to hold
(in which case the constraints set becomes empty/invalid).

This is not a problem for computing scalable bounds where it's safe to
assume `s0` is vscale (or some positive multiple of it). But this may
need to be considered when enabling this feature elsewhere (to ensure
correctness).
2024-06-05 11:35:13 +01:00
Beal Wang
4d60be0452
[mlir] Do not print empty property (#93379)
Skip printing property as `<<<NULL ATTRIBUTE>>>` when operation has an
empty property.

Co-authored-by: Biao Wang <biaow@nvidia.com>
2024-05-25 09:24:12 -06:00
Krzysztof Parzyszek
33550b43f4
[mlir] Add operator<< for printing Block (#92550)
Turns out it was already in Analysis/CFGLoopInfo, so just move it
to IR/AsmPrinter.
2024-05-18 08:03:19 -05:00
Kazu Hirata
dec8055a1e
[mlir] Use StringRef::operator== instead of StringRef::equals (NFC) (#91560)
I'm planning to remove StringRef::equals in favor of
StringRef::operator==.

- StringRef::operator==/!= outnumber StringRef::equals by a factor of
  10 under mlir/ in terms of their usage.

- The elimination of StringRef::equals brings StringRef closer to
  std::string_view, which has operator== but not equals.

- S == "foo" is more readable than S.equals("foo"), especially for
  !Long.Expression.equals("str") vs Long.Expression != "str".
2024-05-08 23:52:22 -07:00
Max191
7e35a9a0e7
[mlir] Replace dynamic sizes in insert_slice of tensor.cast canonicalization (#91352)
In some cases this pattern may ignore static information due to dynamic
operands in the insert_slice sizes operands, e.g.:
```
%0 = tensor.cast %arg0 : tensor<1x?xf32> to tensor<?x?xf32>
%1 = tensor.insert_slice %0 into %arg1[...] [%s0, %s1] [...] 
    : tensor<?x?xf32> into tensor<?x?xf32>
```
Can be rewritten into:
```
%1 = tensor.insert_slice %arg0 into %arg1[...] [1, %s1] [...] 
    : tensor<1x?xf32> into tensor<?x?xf32>
```
This PR updates the matching in the pattern to allow rewrites like this.
2024-05-08 15:05:53 -04:00
Scott Manley
57175533da
[MLIR][IR] add -mlir-print-unique-ssa-ids to AsmPrinter (#91241)
Add an option to unique the numbers of values, block arguments and
naming conflicts when requested and/or printing generic op form. This is
helpful when debugging. For example, if you have:

    scf.for
      %0 =
      %1 = opA %0

    scf.for
      %0 =
      %1 = opB %0

And you get a verifier error which says opB's "operand #0 does not
dominate this use", it looks like %0 does dominate the use. This is not
intuitive. If these were numbered uniquely, it would look like:

    scf.for
      %0 =
      %1 = opA %0

    scf.for
      %2 =
      %3 = opB %0

And thus, much clearer as to why you are getting the error since %0 is
out of scope. Since generic op form should aim to give you the most
possible information, it seems like a good idea to use unique numbers in
this situation. Adding an option also gives those an option to use it
outside of generic op form.

Co-authored-by: Scott Manley <scmanley@nvidia.com>
2024-05-07 08:45:28 -07:00
Christian Ulmann
4513050f52
[MLIR] Harmonize the behavior of the folding API functions (#88508)
This commit changes `OpBuilder::tryFold` to behave more similarly to
`Operation::fold`. Concretely, this ensures that even an in-place fold
returns `success`.
This is necessary to fix a bug in the dialect conversion that occurred
when an in-place folding made an operation legal. The dialect conversion
infrastructure did not check if the result of an in-place folding
legalized the operation and just went ahead and tried to apply pattern
anyways.

The added test contains a simplified version of a breakage we observed
downstream.
2024-04-23 08:05:55 +02:00
Christian Sigg
a5757c5b65
Switch member calls to isa/dyn_cast/cast/... to free function calls. (#89356)
This change cleans up call sites. Next step is to mark the member
functions deprecated.

See https://mlir.llvm.org/deprecation and
https://discourse.llvm.org/t/preferred-casting-style-going-forward.
2024-04-19 15:58:27 +02:00
Beal Wang
d488b2225d
[mlir][ods] Do not print default-valued properties when the value is equal to the default (#87970)
This diff causes the `tblgen`-erated printProperties() function to skip
printing a `DefaultValuedAttr` property when the value is equal to the
default.

Co-authored-by: Biao Wang <biaow@nvidia.com>
2024-04-12 10:37:50 +02:00
Andrei Golubev
be006372f3
[mlir][OpPrintingFlags] Allow to disable ElementsAttr hex printing (#85766)
At present, large ElementsAttr is unconditionally printed with a hex
string. This means that in IR large constant values often look like:
dense<"0x000000000004000000080000000004000000080000000..."> :
tensor<10x10xi32>

Hoisting hex printing control to the user level for tooling means that
one can disable the feature and get human-readable values when
necessary:
dense<[16, 32, 48, 500...]> : tensor<10x10xi32>

Note: AsmPrinterOptions::printElementsAttrWithHexIfLarger is not always
possible to be used as it requires that one exposes MLIR's command-line
options in user tooling (including an actual compiler).

Co-authored-by: Harald Rotuna <harald.razvan.rotuna@intel.com>
2024-04-09 02:08:32 +02:00
MaheshRavishankar
5aeb604c7c
[mlir][SCF] Modernize coalesceLoops method to handle scf.for loops with iter_args (#87019)
As part of this extension this change also does some general cleanup

1) Make all the methods take `RewriterBase` as arguments instead of
   creating their own builders that tend to crash when used within
   pattern rewrites
2) Split `coalesePerfectlyNestedLoops` into two separate methods, one
   for `scf.for` and other for `affine.for`. The templatization didnt
   seem to be buying much there.

Also general clean up of tests.
2024-04-04 13:44:24 -07:00
Matthias Springer
38113a0832
[mlir][IR] Trigger notifyOperationReplaced on replaceAllOpUsesWith (#84721)
Before this change: `notifyOperationReplaced` was triggered when calling
`RewriteBase::replaceOp`.
After this change: `notifyOperationReplaced` is triggered when
`RewriterBase::replaceAllOpUsesWith` or `RewriterBase::replaceOp` is
called.

Until now, every `notifyOperationReplaced` was always sent together with
a `notifyOperationErased`, which made that `notifyOperationErased`
callback irrelevant. More importantly, when a user called
`RewriterBase::replaceAllOpUsesWith`+`RewriterBase::eraseOp` instead of
`RewriterBase::replaceOp`, no `notifyOperationReplaced` callback was
sent, even though the two notations are semantically equivalent. As an
example, this can be a problem when applying patterns with the transform
dialect because the `TrackingListener` will only see the
`notifyOperationErased` callback and the payload op is dropped from the
mappings.

Note: It is still possible to write semantically equivalent code that
does not trigger a `notifyOperationReplaced` (e.g., when op results are
replaced one-by-one), but this commit already improves the situation a
lot.
2024-04-02 10:53:57 +09:00
Jakub Kuderski
971b852546
[mlir][NFC] Simplify type checks with isa predicates (#87183)
For more context on isa predicates, see:
https://github.com/llvm/llvm-project/pull/83753.
2024-04-01 11:40:09 -04:00