This reverts commit 32f543760c7f44c4c7d18bc00a3a1d8860c42bff.
Investigations showed that the unit test utilities were calling erase(),
causing a use-after-free. Fixed by rearranging checks in the test.
Fixes: https://github.com/llvm/llvm-project/issues/53034
Properly check the scopes of affine operations across which we are
performing affine analysis. Fixes transforms and analyses in the
presence of non-affine regions.
This PR adds an initial implementation for the map modifiers close,
present and ompx_hold. This primarily required adding the appropriate
map type flags to the map type bits; in the case of ompx_hold, it also
required adding the map type to the OpenMP dialect. Close has a known
problem when utilised with the ALWAYS map type on descriptors, so when
we later apply always to descriptors (to facilitate movement of
descriptor information to the device for consistency) we will likely
have to ensure close and always are not applied to a descriptor
simultaneously, although further investigation may turn up an
alternative. For the moment, it is a TODO/Note to keep track of it.
Removes the condition that checks that the operand is not indexed by
reduction iterators, which allows for more fine-grained control via the
reshape fusion control function. For example, users could allow fusing
reshapes that expand the M/N dims of a matmul but not the K dims (or
preserve the current behavior by not fusing at all).
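For illustration, a hedged sketch of such a control function, assuming
the `ControlFusionFn` hook used by MLIR's elementwise/reshape fusion
patterns (the reduction-dim logic below simply re-creates the old
behavior from user code):

```cpp
#include "mlir/Dialect/Linalg/IR/Linalg.h"
#include "mlir/Dialect/Linalg/Transforms/Transforms.h"
#include "mlir/IR/AffineMap.h"
using namespace mlir;

// Allow fusing a reshape only when the fused operand's indexing map does
// not involve any reduction dimension (e.g. the K dim of a matmul).
linalg::ControlFusionFn control = [](OpOperand *fusedOperand) {
  auto linalgOp = dyn_cast<linalg::LinalgOp>(fusedOperand->getOwner());
  if (!linalgOp)
    return false;
  AffineMap map = linalgOp.getMatchingIndexingMap(fusedOperand);
  SmallVector<utils::IteratorType> iters = linalgOp.getIteratorTypesArray();
  for (unsigned d = 0, e = iters.size(); d != e; ++d)
    if (iters[d] == utils::IteratorType::reduction && map.isFunctionOfDim(d))
      return false;
  return true;
};
```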
---------
Signed-off-by: Ian Wood <ianwood2024@u.northwestern.edu>
Add ROCDL support for the following GFX950 instructions:
CVT_SCALE_PK_FP8_F16
CVT_SCALE_PK_BF8_F16
CVT_SCALE_PK_FP8_BF16
CVT_SCALE_PK_BF8_BF16
CVT_SCALE_SR_FP8_F16
CVT_SCALE_SR_BF8_F16
CVT_SCALE_SR_FP8_BF16
CVT_SCALE_SR_BF8_BF16
CVT_SCALE_PK_F16_FP8
CVT_SCALE_PK_F16_BF8
CVT_SCALE_F16_FP8
CVT_SCALE_F16_BF8
`inst->getFnAttr(Kind)` falls back to checking whether the parent
function has the attribute, which breaks round-tripping the LLVM IR.
This change checks only the call attribute list (no fallback to parent
queries). One could argue that this small optimization is harmless, but
it seems premature given that it breaks round-trip behavior.
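For context, a minimal hedged sketch of the two query styles (method
names from the LLVM C++ API as I understand them):

```cpp
#include "llvm/IR/Attributes.h"
#include "llvm/IR/InstrTypes.h"
using namespace llvm;

// CallBase::getFnAttr(Kind) falls back to the called function's
// attributes; querying the call's own AttributeList does not.
static bool hasCallSiteFnAttr(const CallBase &CB, Attribute::AttrKind Kind) {
  // Would also consult the callee: CB.getFnAttr(Kind).isValid()
  return CB.getAttributes().hasFnAttr(Kind); // call-site list only
}
```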
Add FP8 lit tests for the following operators:
ARGMAX
AVGPOOL
CONV2D
CONV3D
DEPTHWISE_CONV2D
MATMUL
MAX_POOL2D
TRANSPOSE_CONV2D
CONST
CAST
CONCAT
PAD
RESHAPE
REVERSE
SLICE
TILE
TRANSPOSE
GATHER
SCATTER
Signed-off-by: Tai Ly <tai.ly@arm.com>
Signed-off-by: Jerry Ge <jerry.ge@arm.com>
Co-authored-by: Tai Ly <tai.ly@arm.com>
Fixes #130159
The problem is that the alloca created for the private variable uses the
default alloca address space in that module, but the function the
pointer is being passed to expects a different address space, leading to
a type mismatch in the function argument.
I know nothing about how AMDGPU is supposed to work. I based this
solution on code from createDeviceArgumentAccessor(). Could somebody
from AMD please confirm that this solution is appropriate?
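For reference, a hedged sketch of the general shape of the fix, based on
my reading of createDeviceArgumentAccessor() (`VarTy` and `ExpectedAS`
are placeholders, not names from the patch):

```cpp
#include "llvm/IR/IRBuilder.h"
using namespace llvm;

// Allocate in the module's alloca address space, then cast to the
// address space the callee expects so the argument types line up.
static Value *createPrivateAlloca(IRBuilder<> &Builder, Type *VarTy,
                                  unsigned ExpectedAS) {
  AllocaInst *Priv = Builder.CreateAlloca(VarTy, /*ArraySize=*/nullptr, "priv");
  if (Priv->getAddressSpace() == ExpectedAS)
    return Priv;
  return Builder.CreateAddrSpaceCast(
      Priv, PointerType::get(Builder.getContext(), ExpectedAS));
}
```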
The current lowering for tosa.depthwise_conv2d assumes that if both zero
points are zero the operation is floating-point, hardcoding the use of
arith.addf in the lowered code. Fix the code to check the element type
to decide which add operation to use.
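A minimal sketch of the element-type dispatch (the helper shape is
illustrative, not the exact patch):

```cpp
#include "mlir/Dialect/Arith/IR/Arith.h"
#include "mlir/IR/TypeUtilities.h"
using namespace mlir;

// Pick the add op from the element type rather than assuming float
// whenever both zero points are zero.
static Value createAdd(OpBuilder &b, Location loc, Value lhs, Value rhs) {
  Type elemTy = getElementTypeOrSelf(lhs.getType());
  if (isa<FloatType>(elemTy))
    return b.create<arith::AddFOp>(loc, lhs, rhs);
  return b.create<arith::AddIOp>(loc, lhs, rhs);
}
```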
Previously an extra block was created by splitting the previous exit
block. This produced incorrect results when the outlined region
statically never terminated: in that case there was no valid exit block
for the outlined region, so the newly added block had an incoming edge
from outside of the outlining region, which caused outlining to fail.
So far as I can tell, this extra block no longer serves any purpose. The
comment says it is supposed to collate multiple control flow edges into
one place, but the code as it is now does not achieve this. In fact, as
can be seen from the changes to lit tests, this block was not actually
outlined in the end. This is because there are two code extractors: one
in the callback for creating a parallel op, which is used to find the
input/output variables (and does have this block added to it), and
another that actually does the outlining (to which this block was not
added).
Tested with the gfortran and fujitsu test suites.
Fixes #112884
Deprecate the `match` and `rewrite` functions. They mainly exist for
historic reasons. This PR also updates all remaining uses of them in the
MLIR codebase.
This is addressing a
[comment](https://github.com/llvm/llvm-project/pull/129861#pullrequestreview-2662696084)
on an earlier PR.
Note for LLVM integration: `SplitMatchAndRewrite` will be deleted soon,
update your patterns to use `matchAndRewrite` instead of separate
`match` / `rewrite`.
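As a hedged illustration of the migration (the pattern itself is made
up; only the structure matters):

```cpp
#include "mlir/Dialect/Arith/IR/Arith.h"
#include "mlir/IR/Matchers.h"
#include "mlir/IR/PatternMatch.h"
using namespace mlir;

// Combined form: the former `match` body becomes the early-exit checks,
// and the former `rewrite` body runs once they pass.
struct FoldAddZero : OpRewritePattern<arith::AddIOp> {
  using OpRewritePattern::OpRewritePattern;
  LogicalResult matchAndRewrite(arith::AddIOp op,
                                PatternRewriter &rewriter) const override {
    // Former `match`: only apply to `x + 0`.
    if (!matchPattern(op.getRhs(), m_Zero()))
      return failure();
    // Former `rewrite`: replace the op with its lhs.
    rewriter.replaceOp(op, op.getLhs());
    return success();
  }
};
```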
---------
Co-authored-by: Jakub Kuderski <jakub@nod-labs.com>
The ops, types, and attributes of the NVGPU dialect are now defined in
separate files. If downstream projects extend NVGPU and define other
ops, they will use these types and attributes. This PR was raised so
that such projects can avoid including the definitions of the NVGPU ops.
Updated the dialect to match TOSA v1.0 specification for ConstOp and
ConstShapeOp (https://www.mlplatform.org/tosa/tosa_spec.html#_const).
Also updated the lit tests.
---------
Signed-off-by: Jerry Ge <jerry.ge@arm.com>
OpPassManager contains a field of type std::unique_ptr, which is not
guaranteed to be trivially relocatable, so we cannot use
llvm::array_pod_sort.
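A small hedged sketch of the constraint, for context:

```cpp
#include "llvm/ADT/STLExtras.h"
#include <memory>
#include <vector>

// array_pod_sort shuffles raw bytes (qsort-style), so it is only safe
// for trivially relocatable elements; llvm::sort has no such restriction.
static void sortExamples() {
  std::vector<int> ints = {3, 1, 2};
  llvm::array_pod_sort(ints.begin(), ints.end()); // fine: int is trivial

  std::vector<std::unique_ptr<int>> owners;
  // llvm::array_pod_sort(owners.begin(), owners.end()); // not guaranteed safe
  llvm::sort(owners); // move-based sort is fine
}
```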
Reviewers: River707, joker-eph
Reviewed By: joker-eph
Pull Request: https://github.com/llvm/llvm-project/pull/129968
This PR moves the vector.insert canonicalizers for DenseElementsAttr
(splat and non-splat cases) to folders. Folders are local, and it's
always better to implement a folder than a canonicalizer.
This PR is mostly NFC-ish: the functionality remains largely the same,
but it now runs as part of a folder. Some tests change because
GreedyPatternRewriter tries to fold by default.
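A hedged, generic sketch of the folder shape (`MyInsertOp` and its
accessors are hypothetical; see the actual vector.insert folders for the
real logic):

```cpp
// Folders are local: they may only return an existing Value or an
// Attribute (a constant), and must not create new operations.
OpFoldResult MyInsertOp::fold(FoldAdaptor adaptor) {
  // Inserting the splat value into a splat constant is a no-op.
  if (auto dense = dyn_cast_if_present<DenseElementsAttr>(adaptor.getDest()))
    if (dense.isSplat() &&
        adaptor.getValueToStore() == dense.getSplatValue<Attribute>())
      return dense;
  return {};
}
```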
Replaces some instances of plain `return failure();` in VectorToSCF.cpp
with `return notifyMatchFailure();` and a description (usually copied
from the nearby comment).
There are many more "plain" `return failure();` cases left, but ATM I
only have the cycles for the ones updated here.
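For illustration, a hedged example of the reporting style (the specific
pattern below is made up; only the `notifyMatchFailure` usage matters):

```cpp
#include "mlir/Dialect/Vector/IR/VectorOps.h"
#include "mlir/IR/PatternMatch.h"
using namespace mlir;

// Each early exit reports why the pattern did not apply, which makes
// debug output far more useful than a bare `return failure();`.
struct DropIdentityBroadcast : OpRewritePattern<vector::BroadcastOp> {
  using OpRewritePattern::OpRewritePattern;
  LogicalResult matchAndRewrite(vector::BroadcastOp op,
                                PatternRewriter &rewriter) const override {
    if (op.getSourceType() != op.getResultVectorType())
      return rewriter.notifyMatchFailure(op, "not an identity broadcast");
    rewriter.replaceOp(op, op.getSource());
    return success();
  }
};
```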
When generating aliases from `OpAsm{Dialect,Type,Attr}Interface`, the
result would be sanitized, and if the alias provided by the interface
had a trailing digit, AsmPrinter would append an underscore to it,
presumably to prevent conflicts.
#### Motivation
There are two reasons motivating the change from the old behavior to
the proposed behavior:
1. If the type/attribute can generate a unique alias from its content,
then the extra trailing underscore added by AsmPrinter looks strange:
```mlir
func.func @add(%ct: !ct_L0_) -> !ct_L0_ {
%ct_0 = bgv.add %ct, %ct : (!ct_L0_, !ct_L0_) -> !ct_L0_
%ct_1 = bgv.add %ct_0, %ct_0 : (!ct_L0_, !ct_L0_) -> !ct_L0_
%ct_2 = bgv.add %ct_1, %ct_1 : (!ct_L0_, !ct_L0_) -> !ct_L0_
return %ct_2 : !ct_L0_
}
```
Aesthetically, this would be better as `(!ct_L0, !ct_L0) -> !ct_L0`.
2. The value-name behavior is to use no `_N` suffix for the first
instance, which can be similarly applied to alias names. See the IR
above, where the first value is called `%ct` and the others `%ct_N`, and
see `uniqueValueName` for details.
#### Conflict detection
```mlir
!test.type<a = 3> // suggest !name0
!test.type<a = 4> // suggest !name0
!test.another<b = 3> // suggest !name0_
!test.another<b = 4> // suggest !name0_
```
Conflict detection is based on `nameCounts` in `initializeAliases`.
In the original approach, the first two suggestions are sanitized to
`!name0_`, so all four collide and `initializeAlias` assigns the unique
ids `0, 1, 2, 3` to them.
In the current approach, `initializeAlias` uses `usedAliases` to track
which names have been used, and uses that information to generate a
suffix id that makes the printed alias name unique.
The result for the above example is now `!name0, !name0_1, !name0_,
!name0_2`.
/llvm-project/mlir/lib/Conversion/MeshToMPI/MeshToMPI.cpp:70:2:
error: extra ';' outside of a function is incompatible with C++98 [-Werror,-Wc++98-compat-extra-semi]
};
^
1 error generated.
Failing bot:
* https://lab.llvm.org/buildbot/#/builders/17/builds/6269
The offending test was added in:
* https://github.com/llvm/llvm-project/pull/129696
To fix this, use:
* `%mcr_aarch64_cmd` (which is what we use for ArmSVE e2e tests)
instead of:
* `mlir-cpu-runner` (which was renamed to `mlir-runner` in #123776).
Committed without review due to being a trivial test update; if this
needs discussion, please ping me or leave a comment on GitHub.
- do not create MPI operations if no halo exchange is needed
- allow returning sharding information through `!mesh.sharding`
(gets converted into a tuple of tensors)
- lowering `mesh.shard_shape` including fixes to the operation itself
- global symbol `static_mpi_rank` replaced by a DLTI attribute
(now aligned with MPIToLLVM)
- smaller fixes and some minor cleanup
---------
Co-authored-by: Christian Ulmann <christianulmann@gmail.com>
The module currently stores the target triple as a string. This means
that any code that wants to actually use the triple first has to
instantiate a Triple, which is somewhat expensive. The change in #121652
caused a moderate compile-time regression due to this. While it would be
easy enough to work around, I think that architecturally, it makes more
sense to store the parsed Triple in the module, so that it can always be
directly queried.
For this change, I've opted not to add any magic conversions between
std::string and Triple for backwards-compatibility purposes, and instead
write out needed Triple()s or str()s explicitly. This is because I think
a decent number of them should be changed to work on Triple as well, to
avoid unnecessary conversions back and forth.
The only interesting part of this patch is that the default triple is
Triple("") instead of Triple() to preserve existing behavior. The former
defaults to using the ELF object format instead of the unknown object
format. We should fix that as well.
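A hedged sketch of the intended usage after this change (exact
setter/getter shapes as I understand them):

```cpp
#include "llvm/IR/Module.h"
#include "llvm/TargetParser/Triple.h"
using namespace llvm;

// The module now hands back a parsed Triple, so consumers can query it
// directly instead of re-parsing a string on every use.
static bool targetsLinux(Module &M) {
  M.setTargetTriple(Triple("x86_64-unknown-linux-gnu"));
  const Triple &T = M.getTargetTriple(); // already parsed
  return T.isOSLinux();
}
```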
This PR makes sure that we always use `Type::isIntOrFloat` rather than
re-implementing this condition inline. It also removes `isScalarType`,
which effectively re-implemented this method.
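The preferred spelling, as a trivial sketch:

```cpp
#include "mlir/IR/BuiltinTypes.h"
using namespace mlir;

static bool isScalar(Type type) {
  // Preferred: the existing helper...
  return type.isIntOrFloat();
  // ...rather than re-implementing it inline as
  // `isa<IntegerType, FloatType>(type)`.
}
```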
Whether a transformation should be a canonicalization is an orthogonal
question to whether it should be implemented as a `RewritePattern` or a
`fold` method. The latter is an implementation detail.
This patch adds a suggestion to always implement a canonicalization as a
`fold` method if possible, as fold methods are a restricted subset of
`RewritePattern`s.
When to implement a canonicalization as a fold method versus a
RewritePattern has been a common source of confusion.
The vast majority of rewrite / conversion patterns use a combined
`matchAndRewrite` instead of separate `match` and `rewrite` functions.
This PR optimizes the code base for the most common case where users
implement a combined `matchAndRewrite`. There are no longer any `match`
and `rewrite` functions in `RewritePattern`, `ConversionPattern` and
their derived classes. Instead, there is a `SplitMatchAndRewriteImpl`
class that implements `matchAndRewrite` in terms of `match` and
`rewrite`.
Details:
* The `RewritePattern` and `ConversionPattern` classes are simpler
(fewer functions). Especially the `ConversionPattern` class, which now
has 5 fewer functions. (There were various `rewrite` overloads to
account for 1:1 / 1:N patterns.)
* There is a new class `SplitMatchAndRewriteImpl` that derives from
`RewritePattern` / `OpRewritePattern` / ..., along with a type alias
`RewritePattern::SplitMatchAndRewrite` for convenience.
* Fewer `llvm_unreachable`s are needed throughout the code base.
Instead, we can use pure virtual functions. (In cases where users
previously had to implement `rewrite` or `matchAndRewrite`, etc.)
* This PR may also reduce the number of [`-Woverload-virtual`
warnings](https://discourse.llvm.org/t/matchandrewrite-hiding-virtual-functions/84933)
produced by GCC. (To be confirmed...)
Note for LLVM integration: Patterns with separate `match` / `rewrite`
implementations, must derive from `X::SplitMatchAndRewrite` instead of
`X`.
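A hedged sketch of the post-integration shape (`MyOp` is hypothetical):

```cpp
#include "mlir/IR/PatternMatch.h"
using namespace mlir;

// Patterns that keep separate stages now derive from the
// SplitMatchAndRewrite alias instead of the pattern class itself.
struct MyPattern : OpRewritePattern<MyOp>::SplitMatchAndRewrite {
  using SplitMatchAndRewrite::SplitMatchAndRewrite;
  LogicalResult match(MyOp op) const override {
    return success(op->hasOneUse()); // illustrative check only
  }
  void rewrite(MyOp op, PatternRewriter &rewriter) const override {
    rewriter.eraseOp(op); // illustrative edit only
  }
};
```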
---------
Co-authored-by: River Riddle <riddleriver@gmail.com>
This re-applies f905bf3e1ef860c4d6fe67fb64901b6bbe698a91, which was reverted in
c861c1a046eb8c1e546a8767e0010904a3c8c385 due to compiler errors, with a fix for
MLIR.
These are very common when using intrinsics (e.g. ARM NEON).
For more context: ClangIR is currently blocked on emitting such
intrinsics because of this missing capability.
Basically, this catches up with llvm.call and adds support for
translation to and import from LLVM IR.
This PR is split into two commits in case it's easier to review the
refactoring part, which comes first (happy to split the PR if
necessary).
---------
Co-authored-by: Tobias Gysi <tobias.gysi@nextsilicon.com>
This commit adds builders of the form
```
static void build(..., [TypeRange resultTypes],
ValueRange operands, const Properties &properties,
ArrayRef<NamedAttribute> discardableAttributes = {},
[unsigned numRegions]);
```
to go alongside the existing
result/operands/[inherent + discardable attribute list] collective
builders.
This change is intended to support a refactor to the declarative rewrite
engine to make it populate the `Properties` struct instead of creating a
`DictionaryAttr`, thus enabling rewrite rules to handle non-`Attribute`
properties.
More generally, this means that generic code that would previously call
`getAttrs()` to blend together inherent and discardable attributes can
now use `getProperties()` and `getDiscardableAttrs()` separately, thus
removing the need to serialize everything into a temporary
`DictionaryAttr`.
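A hedged usage sketch (the op and its properties fields are
hypothetical):

```cpp
#include "mlir/IR/Builders.h"
using namespace mlir;

// Build with a typed Properties struct instead of serializing inherent
// attributes into a DictionaryAttr first.
static Operation *buildWithProps(OpBuilder &builder, Location loc,
                                 TypeRange resultTypes, ValueRange operands) {
  OperationState state(loc, MyOp::getOperationName());
  MyOp::Properties props;
  props.some_flag = builder.getUnitAttr(); // hypothetical field
  MyOp::build(builder, state, resultTypes, operands, props,
              /*discardableAttributes=*/{});
  return builder.create(state);
}
```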
This patch adds an e2e test for the `linalg.pack` + `linalg.unpack` pair
with a dynamic inner tile size that's tied to SVE's "vscale":
```mlir
%c4 = arith.constant 4 : index
%vs = vector.vscale
%tile_size = arith.muli %c4, %vs : index
```
This means that the actual size of the corresponding inner and outer
tile size will depend on the runtime value of "vscale".
To make the new test deterministic (and to make it easier to
experiment), I have hard-coded the value of "vscale" to 2 (2 x 128 bits
= 256 bits) via:
```mlir
func.call @setArmVLBits(%c256) : (i32) -> ()
```
This can be relaxed at a later time or played with when experimenting
locally with e.g. QEMU.
NOTE: Vectorization has not been enabled yet (scalable vectorization of
`linalg.unpack` is still WIP).
For ConcatOp this commit also enhances the verifier by
checking four additional conditions:
- The input list is not empty
- The axis value is within range of the input shapes
- All inputs have the same rank
- All non-concatenation axis dims have the same value
For MatmulOp:
- Checked the tensor types and element types of inputs a and b
For the following operators, added the verifySameElementTypes check (a
sketch of this kind of check follows the list):
- PadOp
- SliceOp
- TileOp
- ReshapeOp
- TransposeOp
- GatherOp
- ScatterOp
- MaxPool2dOp
- ReverseOp
- SelectOp
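A hedged sketch of the kind of check referenced above (the actual helper
lives in the TOSA dialect; this only mirrors its shape):

```cpp
#include "mlir/IR/Operation.h"
#include "mlir/IR/TypeUtilities.h"
using namespace mlir;

// Verify that two (possibly shaped) types agree on their element type.
static LogicalResult verifySameElementTypes(Operation *op, Type inType,
                                            Type outType) {
  if (getElementTypeOrSelf(inType) != getElementTypeOrSelf(outType))
    return op->emitOpError()
           << "expect input and output to have same element type, got "
           << inType << " and " << outType;
  return success();
}
```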
Signed-off-by: Jerry Ge <jerry.ge@arm.com>
Co-authored-by: Tai Ly <tai.ly@arm.com>
Co-authored-by: Luke Hutton <luke.hutton@arm.com>