Because symbols cannot refer to operations outside of their symbol
tables, it was impossible to refer to operations outside of the dialect
currently being defined. This PR modifies the lookup logic to happen
relative to the symbol table containing the dialect-defining operations.
This is a bit of a hack but should unblock the situation here.
To keep the test filenames consistent, this patch:
* removes "test-" from file names (there used to be a mix of
"test-feature-1.mlir" and "feature-2.mlir"),
* replaces "_" with "-" (there used to be a mix of "feature-3.mlir"
and "feature_4.mlir").
Only files under test/Integration/Dialect/Vector/CPU are updated.
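The two renaming rules can be sketched as below. This is only an illustration of the rules, not the script used for this patch; the helper name `normalize_test_name` is hypothetical.

```python
def normalize_test_name(filename: str) -> str:
    """Apply the two renaming rules to a single test file name."""
    # Rule 1: drop a leading "test-" prefix, if present.
    if filename.startswith("test-"):
        filename = filename[len("test-"):]
    # Rule 2: use "-" instead of "_" as the word separator.
    return filename.replace("_", "-")


print(normalize_test_name("test-feature-1.mlir"))
print(normalize_test_name("feature_4.mlir"))
```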
It is a better design to put semantics on the ops, and in this case the ntt/intt
op can lower in multiple ways depending on the polynomial ring modulus
(it can need an nth root of unity for cyclic polymul -> ntt, or a 2nth
root for negacyclic polymul -> ntt).
---------
Co-authored-by: Jeremy Kun <j2kun@users.noreply.github.com>
In -convert-vector-to-arm-sme the permutation_map is explicitly checked
for transpose when converting xfer ops, but for 2-D vector types the
only non-identity permutation map is transpose so this can be
simplified.
This patch implements the lowering of vector.deinterleave
for 1D vectors.
For fixed vector types, the operation is lowered to two
llvm shufflevector operations: one for the even-indexed
elements and the other for the odd-indexed elements. A poison
operation is used to satisfy the operand requirements of
shufflevector.
For scalable vectors, the llvm vector.deinterleave2
intrinsic is used for lowering. As such, the results are
found by extraction from the result struct of the
intrinsic.
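For the fixed-vector case, the two shuffle masks can be modelled as follows. This is a plain-Python sketch of the index arithmetic only, not the lowering code itself; `deinterleave_masks` and `shuffle` are hypothetical helper names, and the poison operand is modelled simply by never reading from it.

```python
def deinterleave_masks(num_elements: int) -> tuple:
    """Return the (even, odd) shuffle masks for a 1-D vector of even length."""
    if num_elements % 2 != 0:
        raise ValueError("expected an even number of elements")
    even_mask = list(range(0, num_elements, 2))
    odd_mask = list(range(1, num_elements, 2))
    return even_mask, odd_mask


def shuffle(source: list, mask: list) -> list:
    """Model a shuffle whose second operand is poison: only `source` is read."""
    return [source[i] for i in mask]


vec = [10, 11, 12, 13, 14, 15, 16, 17]
even_mask, odd_mask = deinterleave_masks(len(vec))
evens = shuffle(vec, even_mask)  # elements at indices 0, 2, 4, 6
odds = shuffle(vec, odd_mask)    # elements at indices 1, 3, 5, 7
print(evens, odds)
```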
When an operation is erased in Python, its children may still be in the
"live" list inside Python bindings. After this, if some of the newly
allocated operations happen to reuse the same pointer address, this will
trigger an assertion in the bindings. This assertion would be incorrect
because the operations aren't actually live. Make sure we remove the
children operations from the "live" list when erasing the parent.
This also concentrates responsibility over the removal from the "live"
list and invalidation in a single place.
Note that this requires the IR to be sufficiently structurally valid so
a walk through it can succeed. If this invariant was broken by, e.g., a C++
pass called from Python, there isn't much we can do.
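The idea can be illustrated with a toy model. Everything here is hypothetical (`Op`, `LiveOps` and their methods are not the bindings' real classes); it only shows why erasing a parent has to walk its children and drop them from the identity-keyed "live" map as well.

```python
class Op:
    """Hypothetical stand-in for an IR operation with nested children."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

    def walk(self):
        """Yield this op and all transitively nested ops."""
        yield self
        for child in self.children:
            yield from child.walk()


class LiveOps:
    """Toy "live" list keyed by object identity, like a pointer-keyed map."""

    def __init__(self):
        self._live = {}

    def track(self, op):
        self._live[id(op)] = op

    def is_live(self, op):
        return id(op) in self._live

    def erase(self, op):
        # Remove the op and every nested op in a single place, so no stale
        # entry survives if an address is reused later.
        for nested in op.walk():
            self._live.pop(id(nested), None)


inner = Op("inner")
outer = Op("outer", [inner])
live = LiveOps()
live.track(outer)
live.track(inner)
live.erase(outer)
print(live.is_live(outer), live.is_live(inner))
```

Note that `erase` relies on `walk` succeeding, which mirrors the structural-validity requirement mentioned above.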
The data-layout independent constant folding currently has some rather
gnarly code for canonicalizing GEP indices to reduce "notional
overindexing", and then infers inbounds based on that canonicalization.
Now that we canonicalize to i8 GEPs, this canonicalization is
essentially useless, as we'll discard it as soon as the GEP hits the
data-layout aware constant folder anyway. As such, I'd like to remove
this code entirely.
This shouldn't have any impact on optimization capabilities.
These act as constants and should be propagated whenever possible. It is
safe to do so for mlir.undef and mlir.poison because they remain "dirty"
throughout their lifetime and can be duplicated, merged, etc. per the
LangRef.
Signed-off-by: Guy David <guy.david@nextsilicon.com>
This commit changes the LLVM dialect's inliner interface to stop
assuming that the inlined function only contained unstructured control
flow. This is not necessarily true, and it led to not properly
propagating the noalias information.
MLIR's LLVMArrayType uses `unsigned` for the number of elements while
LLVM's ArrayType uses `uint64_t`:
4ae896fe97/llvm/include/llvm/IR/DerivedTypes.h (L377)
This leads to silent truncation when we use it for globals in flang.
```
program test
  integer(8), parameter :: large = 2**30
  real, dimension(large) :: bigarray
  common /c/ bigarray
  bigarray(999) = 666
end
```
The above program would result in a segfault since the global would be
of size 0 because of the silent truncation.
```
fir.global common @c_(dense<0> : vector<4294967296xi8>) : !fir.array<4294967296xi8>
```
became
```
llvm.mlir.global common @c_(dense<0> : vector<4294967296xi8>) {addr_space = 0 : i32} : !llvm.array<0 x i8>
```
This patch updates the definition of MLIR ArrayType to take `uint64_t`
for the number of elements, to be compatible with LLVM.
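The arithmetic behind the size-0 global can be checked with a small model. This sketch assumes `unsigned` is 32 bits wide and that the default `real` occupies 4 bytes; `truncate_to_unsigned` is a hypothetical helper that mimics the implicit narrowing conversion.

```python
UNSIGNED_BITS = 32  # assumed width of C++ `unsigned` on common targets


def truncate_to_unsigned(value: int) -> int:
    """Model the implicit uint64_t -> unsigned conversion (keeps the low bits)."""
    return value & ((1 << UNSIGNED_BITS) - 1)


num_reals = 2**30    # `large` in the Fortran program
bytes_per_real = 4   # assumption: default REAL is 4 bytes
num_bytes = num_reals * bytes_per_real

print(num_bytes)                        # 4294967296
print(truncate_to_unsigned(num_bytes))  # 0
```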
- Add integration test for `vector.shuffle` and `vector.interleave`,
mentioned in issue #91978
- Add `VectorToSPIRV` patterns to `GPUToSPIRVPass`
---------
Co-authored-by: Jakub Kuderski <kubakuderski@gmail.com>
Adds a named op: linalg.conv_2d_ngchw_gfchw_q. This op is similar to
linalg.conv_2d_ngchw_gfchw, but additionally incorporates zero point
offset corrections.
Expressions with the same precedence were not parenthesized and
therefore were possibly evaluated in the wrong order depending on the
shape of the expression tree.
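A toy emitter shows the failure mode. This is a sketch under assumptions, not the code touched by this change: the precedence table, `BinOp` and `emit` are all hypothetical, and the rule used here (parenthesize every operand whose precedence is not strictly higher than its parent's) is deliberately conservative.

```python
from dataclasses import dataclass
from typing import Union

# Hypothetical precedence table for a toy expression language.
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}


@dataclass
class BinOp:
    op: str
    lhs: "Expr"
    rhs: "Expr"


Expr = Union[str, BinOp]


def emit(expr: Expr, parent_prec: int = 0) -> str:
    """Emit `expr`, parenthesizing operands that do not bind tighter than the parent."""
    if isinstance(expr, str):
        return expr
    prec = PRECEDENCE[expr.op]
    text = f"{emit(expr.lhs, prec)} {expr.op} {emit(expr.rhs, prec)}"
    # Using <= (rather than <) is what covers the equal-precedence case.
    return f"({text})" if prec <= parent_prec else text


# Right-leaning tree: a - (b - c). Without the parentheses this would
# print as "a - b - c", which evaluates as (a - b) - c.
print(emit(BinOp("-", "a", BinOp("-", "b", "c"))))
```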
---------
Co-authored-by: Matthias Gehre <matthias.gehre@amd.com>
Co-authored-by: Corentin Ferry <corentin.ferry@amd.com>
Using `for_` is very handy with the Python bindings. Currently, it doesn't
support results, so we had to fall back to a two-line scf.for.
This PR yields the results of scf.for in `for_`.
---------
Co-authored-by: Maksim Levental <maksim.levental@gmail.com>
Tighten the verifier for arith cast ops to disallow changing tensor
dimensions, e.g., static to dynamic. After this change:
* `arith.cast_op %x : tensor<4xi32> to tensor<4xf32>` remains valid
* `arith.cast_op %x : tensor<4xi32> to tensor<?xf32>` becomes invalid
* `arith.cast_op %x : tensor<?xi32> to tensor<4xf32>` becomes invalid
This is mostly to simplify the op semantics. See the discussion thread
for more context:
https://discourse.llvm.org/t/rfc-remove-arith-math-ops-on-tensors/74357/63.
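The shape rule amounts to requiring identical dimensions on both sides of the cast. A minimal sketch follows, with shapes modelled as lists and `None` standing for a dynamic dimension; `cast_shapes_compatible` is a hypothetical name, and treating dynamic-to-dynamic as valid is an assumption that follows from "no change" rather than something stated above.

```python
from typing import Optional, Sequence

# A shape is a sequence of extents; None stands for a dynamic dimension ("?").
Shape = Sequence[Optional[int]]


def cast_shapes_compatible(src: Shape, dst: Shape) -> bool:
    """Return True only if the cast leaves every dimension unchanged."""
    return list(src) == list(dst)


print(cast_shapes_compatible([4], [4]))     # tensor<4x...> to tensor<4x...>
print(cast_shapes_compatible([4], [None]))  # static to dynamic
print(cast_shapes_compatible([None], [4]))  # dynamic to static
```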
Integer range analysis will not update the range of an operation when
any of the inferred input lattices are uninitialized. In the current
behavior, all lattice values for non-integer types are uninitialized.
For operations like arith.cmpf
```mlir
%3 = arith.cmpf ugt, %arg0, %arg1 : f32
```
that will result in the range of the output also being uninitialized,
and so on for any consumer of the arith.cmpf result. When control-flow
ops are involved, the lack of propagation results in incorrect ranges,
as the back edges for loop carried values are not properly joined with
the definitions from the body region.
For example, an scf.while loop whose body region produces a value that
is in a dataflow relationship with some floating-point values through an
arith.cmpf operation:
```mlir
func.func @test_bad_range(%arg0: f32, %arg1: f32) -> (index, index) {
  %c4 = arith.constant 4 : index
  %c1 = arith.constant 1 : index
  %c0 = arith.constant 0 : index
  %3 = arith.cmpf ugt, %arg0, %arg1 : f32
  %1:2 = scf.while (%arg2 = %c0, %arg3 = %c0) : (index, index) -> (index, index) {
    %2 = arith.cmpi ult, %arg2, %c4 : index
    scf.condition(%2) %arg2, %arg3 : index, index
  } do {
  ^bb0(%arg2: index, %arg3: index):
    %4 = arith.select %3, %arg3, %arg3 : index
    %5 = arith.addi %arg2, %c1 : index
    scf.yield %5, %4 : index, index
  }
  return %1#0, %1#1 : index, index
}
```
The existing behavior results in the control condition %2 being
optimized to true, turning the while loop into an infinite loop. The
update to %arg2 through the body region is never factored into the range
calculation, as the ranges for the body ops all test as uninitialized.
This change causes all values initialized with setToEntryState to be set
to some initialized range, even if the values are not integers.
---------
Co-authored-by: Spenser Bauman <sabauma@fastmail>
These passes have been deprecated for a long time and replaced by
one-shot bufferization. These passes are also unsafe because they do not
check for read-after-write conflicts.
Relands https://github.com/llvm/llvm-project/pull/93488 which failed on
buildbot. Fixes the failure by updating integration tests to use
one-shot-bufferize instead.
This adds
- `mlir::tosa::populateTosaToLinalgTypeConversion` which converts
tensors of unsigned integers into tensors of signless integers
- modifies the `tosa.reshape` lowering in TosaToTensor to use the type
converter correctly
I chose to implement the type converter in
`mlir/Conversion/TosaToLinalg/TosaToLinalg.h` instead of
`mlir/Conversion/TosaToTensor/TosaToTensor.h` because I need the same
type converter in the TosaToLinalg lowerings (future PR).
Alternatively, I could duplicate the type converter so it exists both in
TosaToLinalg and TosaToTensor. Let me know if you prefer that.
This change updates the dataLayout string to ensure alignment with the
latest LLVM TargetMachine configuration. The aim is to
maintain consistency and prevent potential compilation issues related to
memory address space handling.
There was existing support for constant folding a `linalg.generic` that
was actually a transpose. This commit adds support for the named op,
`linalg.transpose`, as well by making use of the `LinalgOp` interface.
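What folding a transpose of a constant computes can be sketched on a flat, row-major list. This is a reference model only, not the folder's implementation; `fold_transpose` is a hypothetical name, and the permutation convention (result dimension `i` takes the extent of input dimension `permutation[i]`) is an assumption of this sketch.

```python
from itertools import product


def fold_transpose(data: list, shape: list, permutation: list) -> tuple:
    """Transpose a row-major constant and return (new_data, new_shape)."""
    result_shape = [shape[p] for p in permutation]
    # Row-major strides of the input.
    strides = [1] * len(shape)
    for d in range(len(shape) - 2, -1, -1):
        strides[d] = strides[d + 1] * shape[d + 1]
    result = []
    for idx in product(*(range(extent) for extent in result_shape)):
        # Result index idx[i] addresses input dimension permutation[i].
        offset = sum(idx[i] * strides[permutation[i]] for i in range(len(idx)))
        result.append(data[offset])
    return result, result_shape


# [[1, 2, 3], [4, 5, 6]] transposed with permutation [1, 0].
folded, folded_shape = fold_transpose([1, 2, 3, 4, 5, 6], [2, 3], [1, 0])
print(folded, folded_shape)
```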
Building on top of
[#88204](https://github.com/llvm/llvm-project/pull/88204), this PR adds
support for converting `vector.insert` into an equivalent
`vector.shuffle` operation that operates on linearized (1-D) vectors.
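The shuffle mask for such an insert can be sketched as below. This models the index arithmetic only and is not the pattern's actual code; `insert_as_shuffle_mask` and `shuffle` are hypothetical helpers, and the sketch assumes the usual two-operand numbering where indices past the first operand's length select from the second operand.

```python
def insert_as_shuffle_mask(dest_size: int, src_size: int, offset: int) -> list:
    """Mask for shuffle(dest, src) that overwrites dest[offset:offset + src_size]."""
    if offset < 0 or offset + src_size > dest_size:
        raise ValueError("inserted slice does not fit in the destination")
    mask = list(range(dest_size))
    for i in range(src_size):
        # Indices >= dest_size select from the second operand (src).
        mask[offset + i] = dest_size + i
    return mask


def shuffle(lhs: list, rhs: list, mask: list) -> list:
    """Reference semantics of a two-operand shuffle on 1-D vectors."""
    combined = lhs + rhs
    return [combined[i] for i in mask]


# Inserting a 2-element vector at row 2 of a 4x2 vector: after
# linearization the destination has 8 elements and the offset is 2 * 2 = 4.
dest = [0, 1, 2, 3, 4, 5, 6, 7]
src = [100, 101]
mask = insert_as_shuffle_mask(len(dest), len(src), 4)
print(mask)
print(shuffle(dest, src, mask))
```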
Fortran arrays use the 'dataLocation', 'rank', 'allocated' and
'associated' fields of the DICompositeType. These were not available in
'DICompositeTypeAttr'. This PR adds the missing fields.
---------
Co-authored-by: Tobias Gysi <tobias.gysi@nextsilicon.com>
These passes have been deprecated for a long time and replaced by
one-shot bufferization. These passes are also unsafe because they do not
check for read-after-write conflicts.
This patch adds more precise memory effects to linalg ops, including the
following points:
1. Remove the read side effects for operands that are not used.
2. Set the effect for all side effects to "full".
- Add `vector.interleave` to `spirv.VectorShuffle` conversion
- Remove the `vector.interleave` to `vector.shuffle` conversion from
`populateVectorToSPIRVPatterns` and CMake/Bazel dependencies
---------
Co-authored-by: Jakub Kuderski <kubakuderski@gmail.com>
This is to make it more obvious what the result type is, especially
with some less trivial cases like 0-d inputs resulting in 1-d results or
interaction with scalable vector types. Note that `vector.deinterleave`
uses the same format with explicit result type.
Also improve examples and clean up surrounding code.
Update the folder titles for targets in the monorepository that have not
been taken care of for some time. These are the folders in which targets are
organized in Visual Studio and Xcode
(`set_property(TARGET <target> PROPERTY FOLDER "<title>")`)
when using the respective CMake IDE generator.
* Ensure that every target is in a folder
* Use a folder hierarchy with each LLVM subproject as a top-level folder
* Use consistent folder names between subprojects
* When using target-creating functions from AddLLVM.cmake, automatically
deduce the folder. This reduces the number of
`set_property`/`set_target_properties` calls, but they are still necessary when
`add_custom_target`, `add_executable`, `add_library`, etc. are used. A
LLVM_SUBPROJECT_TITLE definition is used for that in each subproject's
root CMakeLists.txt.
This change expands the existing instrumentation that prints the IR
before/after each pass to an output stream (usually stderr). It adds
a new configuration that will print the output of each pass to a
separate file. The files will be organized into a directory tree
rooted at a specified directory. For existing tools, a CL option
`-mlir-print-ir-tree-dir` is added to specify this directory and
activate the new printing config.
The created directory tree mirrors the nesting structure of the IR. For
example, if the IR is congruent to the pass-pipeline
"builtin.module(pass1,pass2,func.func(pass3,pass4),pass5)", and
`-mlir-print-ir-tree-dir=/tmp/pipeline_output`, then the file tree
created will look like:
```
/tmp/pipeline_output
├── builtin_module_the_symbol_name
│   ├── 0_pass1.mlir
│   ├── 1_pass2.mlir
│   ├── 2_pass5.mlir
│   ├── func_func_my_func_name
│   │   ├── 1_0_pass3.mlir
│   │   ├── 1_1_pass4.mlir
│   ├── func_func_my_other_func_name
│   │   ├── 1_0_pass3.mlir
│   │   ├── 1_1_pass4.mlir
```
The subdirectories are named by concatenating the relevant parent
operation names and symbol name (if present). The printer keeps a
counter associated with ops that are targeted by passes and their
isolated-from-above parents. Each filename is given a numeric prefix
using the counter value for the op that the pass is targeting and then
prepending the counter values for each parent. This gives a naming
where it is easy to distinguish which passes may have run concurrently
vs. which have a clear ordering. In the above example, for both
`1_1_pass4.mlir` files, the first `1` refers to the counter for the
parent op, and the second refers to the counter for the respective
function.
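The filename scheme described above can be sketched as a small formatting helper. This only illustrates how the prefix is assembled from the counters; `pass_output_filename` is a hypothetical name and not part of the instrumentation's API.

```python
def pass_output_filename(parent_counters: list, op_counter: int, pass_name: str) -> str:
    """Build a name such as "1_0_pass3.mlir".

    parent_counters: counter values of the enclosing ops, outermost first.
    op_counter: counter value of the op the pass is targeting.
    """
    prefix = "_".join(str(c) for c in [*parent_counters, op_counter])
    return f"{prefix}_{pass_name}.mlir"


# A pass on the top-level module has no parent counters.
print(pass_output_filename([], 0, "pass1"))
# A pass on a nested function is prefixed with the parent's counter.
print(pass_output_filename([1], 1, "pass4"))
```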
LLVM_HAS_NVPTX_TARGET is automatically set depending on whether NVPTX
was enabled when building LLVM. Use this instead of manually defining
MLIR_ENABLE_CUDA_CONVERSIONS (whose name is a bit misleading btw).