Unroll And Jam was supported in affine dialect long time ago using pass.
This commit exposes the pattern using transform and in addition adds
partial support for SCF loops.
1D multi-reduction are lowered to arith which can prevent some
optimisations. I propose `ElementwiseToOuterproduct` matching a series of
ops to generate `vector.outerproduct`.
As part of some `ElementwiseToVectorOpsPatterns`, it could allow to fuse
other elementwiseOps to vector dialect.
Originally discussed
https://discourse.llvm.org/t/on-improving-arm-sme-lowering-resilience-in-mlir/78543/24.
quote @MacDue
```
%lhsBcast = vector.broadcast %lhsCast : vector<[4]xf32> to vector<[4]x[4]xf32>
%lhsT = vector.transpose %lhsBcast, [1, 0] : vector<[4]x[4]xf32> to vector<[4]x[4]xf32>
%rhsBcast = vector.broadcast %rhs : vector<[4]xf32> to vector<[4]x[4]xf32>
%mul = arith.mulf %lhsT, %rhsBcast : vector<[4]x[4]xf32>
```
Can be rewritten as:
```
%mul = vector.outerproduct $lhs, $rhs : vector<[4]xf32>, vector<[4]xf32>
```
---------
Co-authored-by: Han-Chung Wang <hanhan0912@gmail.com>
The main goal of this and subsequent PRs is to unify and categorize
tests in:
* vector-transfer-flatten.mlir
This should make it easier to identify the edge cases being tested (and
how they differ), remove duplicates and to add tests for scalable
vectors.
Below are the main contributions of this PR
1. Two tests duplicated
`@transfer_{read|write}_dims_mismatch_non_contiguous_slice`:
* `@transfer_{read|write}_dims_mismatch_non_contiguous` and
* `@transfer_read_flattenable_negative` duplicated
`@transfer_{read|write}_dims_mismatch_non_contiguous_slice`.
These tests are removed (the original test is preserved).
2. `@transfer_read_flattenable_negative2` is replaced with
two tests with more descriptive names:
* `@transfer_read_non_contiguous_src` (for `xfer_read`) and
* `@transfer_write_non_contiguous_src` (for `xfer_write`)
The mlir-translate tool calls into the parser without loading registered
dependent dialects, and the parser only loads attributes if the
fully-namespaced attribute is present in the textual IR. This causes
parsing to break when an op has an attribute that prints/parses without
the namespaced attribute.
Co-authored-by: Jeremy Kun <jkun@google.com>
Previously, there was at least one virtual function call for every
allocated register. The only users of this feature are AMDGPU and RISC-V
(RVV), other targets don't use this. To easily identify these cases,
change the default functor to nullptr and don't call it for every
allocated register.
In some situations a new `VarTemplateSpecializationDecl` (for the same
template) can be added during import of another one. The "insert
position" that is used to insert the current object into the list of
specializations is stored at start of the import and is used later. If
the list changes before the insertion the position is not valid any
more.
On x86, many instructions have tied operands, so allocateInstruction
uses the more complex assignment strategy, which computes the assignment
order of virtual defs first. This involves iterating over all register
classes (or register aliases for physical defs) to compute the possible
number of defs per register class.
However, this information is only used for sorting virtual defs and
therefore not required when there's only one virtual def -- which is a
very common case. As iterating over all register classes/aliases is not
cheap, do this only when there's more than one virtual def.
It was incorrect to set the insertion point to the init block after
inlining the initialization region because the code generated in the
init block depends upon the value yielded from the init region. When
there were multiple reduction initialization regions each with multiple
blocks, this could lead to the initilization region being inlined after
the init block which depends upon it.
Moving the insertion point to before inlining the initialization block
turned up further issues around the handling of the terminator for the
initialization block, which are also fixed here.
This fixes a bug in #92430 (but the affected code couldn't compile
before #92430 anyway).
The main goal of this and subsequent PRs is to unify and categorize
tests in:
* vector-transfer-flatten.mlir
This should make it easier to identify the edge cases being tested (and
how they differ), remove duplicates and to add tests for scalable
vectors.
The main contributions of this PR:
* split tests that covered `xfer_read` + `xfer_write` into separate
tests (majority of the existing tests check _one_ xfer Op at a time),
* organise tests for `xfer_read` and `xfer_write` into separate
groups (separate with a big bold comment).
Note, all tests (i.e. test cases) are preserved and some new tests are
added. Deletions that you will see in `git diff` correspond to
`xfer_write` and `xfer_read` Ops being extracted to separate functions
(so that there's one xfer Op per function). In particular, the number of
test functions has grown from 26 to 30.
In addition, this PR unifies the tests so that:
* input variable names are consistent (e.g. make sure that the input
memref is always `arg`)
* CHECK lines use similar indentations
* 2 x tabs are always used for function arguments, 1 x tab for
function body
Finally, changes in "VectorTransferOpTransforms.cpp" are merely meant to
unify comments and logic between
* `FlattenContiguousRowMajorTransferWritePattern` and
* `FlattenContiguousRowMajorTransferReadPattern`.
See the added test for the motivation example. In that example, we add a
new function declaration in `a.cppm` and this is not used in the reduced
BMI of `b.cppm`. We expect that the change won't affect the BMI of
`b.cppm`. But it is the not the case.
There are 2 reason for unexpected result:
1. We would register the interesting identifiers in a pretty late phase.
This may cause some some predefined identifier ID change due to we
insert other identifiers during emitting decls and types.
2. In `GenerateNameLookup`, we would generate information for predefined
decls. This may not be intended. Since every predefined decl doesn't
belong to any module.
And this patch solves the first issue by registering the identifiers in
the very early posititon to make sure the ID won't get affected by the
process to emit decls and types. And we solve the second question by
filtering predefined decls simply.
This patch covers
[CWG2811](https://cplusplus.github.io/CWG/issues/2811.html) "Clarify
"use" of main", basically adding a test for `-Wmain`, focusing on usages
of `main` in unevaluated contexts.
To my understanding, the diagnostic message is based on the wording, so
I updated it based on the new wording. I also replaces "ISO C++
requires" with a phrasing that explicitly says "extension".
This PR covers the following Core issues:
[CWG204](https://cplusplus.github.io/CWG/issues/204.html) "Exported
class templates"
[CWG323](https://cplusplus.github.io/CWG/issues/323.html) "Where must
`export` appear?"
[CWG335](https://cplusplus.github.io/CWG/issues/335.html) "Allowing
`export` on template members of nontemplate classes"
[CWG820](https://cplusplus.github.io/CWG/issues/820.html) "Deprecation
of `export`"
I believe the list above is entirety of Core issues that are dedicated
solely to `export template`.
I believe we have two main points of view here, which command what this
PR should do:
1. (easy) Removal of `export template` was done as a defect report in
CWG820, and the rest are effectively superseded by it, because we apply
defect reports retroactively.
2. (harder) Those Core issues are testable individually, so we should
test them for the behavior Core wanted at the time.
This PR implements the first option, making our C++ DR status page
greener.
I think I can be persuaded to go with the second option, if reviewers
have strong preference for it.
Any fp128 need to end up as libcall, as will f32->i128 and f64->i128.
f16 are a bit special as the maximum range of the result fits in a i17,
so can be shrank to an i64. Vector with i128/fp128 types are scalarized.
@kiranchandramohan mentioned
[here](https://github.com/llvm/llvm-project/pull/91582#discussion_r1606046605)
that LLVM IR tests should go in the Integration folder. He also
mentioned
[here](https://github.com/llvm/llvm-project/pull/91582#discussion_r1606684034)
that tests for `add-debug-info` pass should test that pass only. There
were some tests which were added before his comments so I have cleaned
them in this PR. The following changes were made.
1. Move LLVM IR tests to `Integration` folder.
2. Change tests from f90 to fir and only test changes done by
`add-debug-info` pass.
Syntacore SCR3 is a microcontroller-class processor core. Overview:
https://syntacore.com/products/scr3
This PR introduces two CPUs:
* 'syntacore-scr3-rv32' which is rv32imc
* 'syntacore-scr3-rv64' which is rv64imac
---------
Co-authored-by: Dmitrii Petrov <dmitrii.petrov@syntacore.com>
Fixes https://github.com/llvm/llvm-project/issues/96197.
A global alias should always point to a definition. Ifuncs are
definitions, so far so good. However an ifunc may be statically resolved
to a function that is declared but not defined in the translation unit.
With this patch we perform static resolution if:
* the resolvee is defined, else if
* none of the ifunc users is a global alias
Fixes:
llvm/lib/Target/AArch64/AArch64GenRegisterBankInfo.def:185:12: warning: suggest parentheses around ‘&&’ within ‘||’ [-Wparentheses]
assert(!Size.isScalable() || MinSize >= 128
~~~~~~~~~~~~~~
&& "Scalable vector types should have size of at least 128 bits");
In the re-commit, just dropping the propagation of `writeonly` as that
is the only attribute that can play poorly with call slot optimization
(see issue: #95152 for more details).
Closes#95888
This commit removes a `FIXME` in the code base that was in place because
of patterns that used the dialect conversion API incorrectly. Those
patterns have been fixed and the workaround is no longer needed.
The main goal of this PR (and subsequent PRs), is to add more tests with
scalable vectors to:
* vector-transfer-collapse-inner-most-dims.mlir
There's quite a few cases to consider, hence this is split into multiple
PRs. In this PR, `@outer_dyn_drop_inner_most_dim` is replaced with:
* `@contiguous_inner_most_dynamic_outer`
I am also adding a similar test for scalable vectors. In addition,
* `@drop_two_inner_most_dim` and
`@drop_two_inner_most_dim_scalable_inner_dim`,
are renamed as `@contiguous_inner_most` and
`@contiguous_inner_most_scalable_inner_dim`, respectively, to match
their counterpart for `xfer_read`.
NOTE: This PR is limited to tests for `vector.transfer_write`
This is a follow-up for: #94490, #94604, #94906
clangSerialization currently uses hash_combine/hash_value from
Hashing.h, which are not guaranteed to be deterministic.
Replace these uses with xxh3_64bits.
Pull Request: https://github.com/llvm/llvm-project/pull/96136