This is a test library which is not part of libMLIR, so it should
use normal LINK_LIBS instead of mlir_target_link_libraries.
This fixes an issue introduced in #123910 and follows up on the
fix in #125004, which added the library to DEPENDS, which is not
sufficient.
This cherry picks
[mlir] Fix build race condition in Pass Manager tests (d906da5ead2764579395e5006c517f2ec9afd46f)
to the 20.x release branch.
This addresses issues that started with
https://github.com/llvm/llvm-project/pull/123910, which is already on the 20.x branch.
Linaro noticed this on our flang dylib (shared library) build bot.
In file included from /home/tcwg-buildbot/worker/flang-aarch64-dylib/llvm-project/mlir/test/lib/Pass/TestPassManager.cpp:10:
/home/tcwg-buildbot/worker/flang-aarch64-dylib/llvm-project/mlir/test/lib/Pass/../Dialect/Test/TestOps.h:148:10: fatal error: 'TestOps.h.inc' file not found
148 | #include "TestOps.h.inc"
| ^~~~~~~~~~~~~~~
We have tested these changes on the buildbot for the last 2 days and had no problems.
Whereas before it was failing maybe 1 in 10 builds, enough that multiple people
in the community noticed it.
Reported in https://github.com/llvm/llvm-project/issues/124485.
Another follow up fix to
https://github.com/llvm/llvm-project/pull/123910 to fix a build failure
that sometimes happens in shared library builds:
https://lab.llvm.org/buildbot/#/builders/50/builds/9724
In file included from
/home/tcwg-buildbot/worker/flang-aarch64-dylib/llvm-project/mlir/test/lib/Transforms/TestInlining.cpp:16:
/home/tcwg-buildbot/worker/flang-aarch64-dylib/llvm-project/mlir/test/lib/Transforms/../Dialect/Test/TestOps.h:148:10:
fatal error: 'TestOps.h.inc' file not found
148 | #include "TestOps.h.inc"
| ^~~~~~~~~~~~~~~
1 error generated.
(cherry picked from commit ebd23f25c8936db3dd917567737a067d6878e2f4)
Following up on #122188, this PR adds support for poison indices to
`ExtractOp` and `InsertOp`. It also includes canonicalization patterns
to turn extract/insert ops with poison indices into `ub.poison`.
Model the `IndexType` as `uint64_t` when converting to a python integer.
With the python bindings,
```python
DenseIntElementsAttr(op.attributes["attr"])
```
used to `assert` when `attr` had `index` type like `dense<[1, 2, 3, 4]>
: vector<4xindex>`.
---------
Co-authored-by: Christopher McGirr <christopher.mcgirr@amd.com>
Co-authored-by: Tiago Trevisan Jost <tiago.trevisanjost@amd.com>
The TOSA-v1.0 specification makes the shift attribute of the MUL
(Hammard product) operator an input. Move the `shift` parameter of the
MUL operator in the MILR TOSA dialect from an attribute to an input and
update any lit tests appropriately.
Expand the verifier of the `tosa::MulOp` operation to check the various
constraints defined in the TOSA-v1.0 specification. Specifically, ensure
that all input operands (excluding the optional shift) are of the same
rank. This means that broadcasting tests which previously checked rank-0
tensors would be broadcast are no longer valid and are removed.
Signed-off-by: Jack Frankland <jack.frankland@arm.com>
Co-authored-by: TatWai Chong <tatwai.chong@arm.com>
On lowering from `memref` to LLVM, `malloc` and other intrinsic
functions from `libc` will be declared in the current module. User's
redefinition of these reserved functions will poison the internal
analysis with wrong prototype. This patch adds assertion on the found
function's type and reports if it mismatch with the intended type.
Related to #120950
---------
Co-authored-by: Luohao Wang <Luohaothu@users.noreply.github.com>
Summary:
The previous offloading entry type did not fit the current use-cases
very well. This widens it and adds a version to prevent further
annoyances. It also includes the kind to better sort who's using it.
The first 64-bytes are reserved as zero so the OpenMP runtime can detect
the old format for binary compatibilitry.
Fixes https://github.com/llvm/llvm-project/issues/124578
Handles the `nowait` clause for `omp.target` ops when the actual target
is the host (i.e. there is no target device). Rather than only checking
for the `HasNoWait` boolean, we also check for the presence/absence of a
`DeviceID` value. We only emit the target task if both are present.
See
https://discourse.llvm.org/t/rfc-introduce-opasm-type-attr-interface-for-pretty-print-in-asmprinter/83792
for detailed introduction.
This PR acts as the first part of it
* Add `OpAsmTypeInterface` and `getAsmName` API for deducing ASM name
from type
* Add default impl in `OpAsmOpInterface` to respect this API when
available.
The `OpAsmAttrInterface` / hooking into Alias system part should be
another PR, using a `getAlias` API.
### Discussion
* Instead of using `StringRef getAsmName()` as the API, I use `void
getAsmName(OpAsmSetNameFn)`, as returning StringRef might be unsafe
(std::string constructed inside then returned a _ref_; and this aligns
with the design of `getAsmResultNames`.
* On the result packing of an op, the current approach is that when not
all of the result types are `OpAsmTypeInterface`, then do nothing (old
default impl)
### Review
Cc @j2kun and @Alexanderviand-intel for downstream; Cc @River707 and
@joker-eph for relevent commit history; Cc @ftynse for discourse.
This PR implements a generalization of the existing more efficient
lowering of shape casts from 2-D to 1D and 1-D to 2-D vectors. This
significantly reduces code size and generates more performant code for
n-D shape casts that make their way to LLVM/SPIR-V.
- Bail out of TableGen if any asserts fail before running the backend.
- Add asserts to validate that the `Objects` and `Modes` lists for
various `HwModeSelect` subclasses are of same length.
- Eliminate equivalent check in CodeGenHWModes.cpp
The recipe operations should have AutomaticAllocationScope so recipes can
be converted using operators that require parent ops to have
AutomaticAllocationScope
The op carries the output-shape directly. This can be used directly.
Also adds a method to get the shape as a `SmallVector<OpFoldResult>`.
Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com>
The index computation is meant to be signed. Using unsigned could lead
to subtle errors. Fix places where some index math was using unsigned
operations.
Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com>
# Overview
This PR adds 2:4 structured sparsity (sparse A, dense B) matrix multiply
instructions to ROCDL.
# Testing
I've added tests to Dialect/mlir and Target/mlir
This was discussed during the original review but I made it stricter
than discussed. Making it a pure view but adding a helper for bytecode
serialization (I could avoid the helper, but it ends up with more logic
and stronger coupling).
Previously, an indirect call was incorrectly generated when
`llvm::CallBase::getCalledFunction` returned null due to a type mismatch
between the call and the function. This patch updates the code to use
`llvm::CallBase::getCalledOperand` instead.
`BreakDownVectorBitCast` leverages
* `vector.extract_strided_slices` + `vector.insert_strided_slices`
As these Ops do not support extracting scalable sub-vectors (i.e.
extracting/inserting a fraction of a scalable dim), it's best to bail
out.
This logic is in the critical path for constructing an operation from
Python. It is faster to compute this in C++ than it is in Python, and it
is a minor change to do this.
This change also alters the API contract of
_ods_common.get_op_results_or_values to avoid calling
get_op_result_or_value on each element of a sequence, since the C++ code
will now do this.
Most of the diff here is simply reordering the code in IRCore.cpp.
With the removal of mlir-vulkan-runner (as part of #73457) in
e7e3c45bc70904e24e2b3221ac8521e67eb84668, mlir-cpu-runner is now the
only runner for all CPU and GPU targets, and the "cpu" name has been
misleading for some time already. This commit renames it to mlir-runner.
The convertFloorOp pattern incurs precision loss when floating-point
numbers exceed the representable range of int64. This pattern should be
removed.
Fixes https://github.com/llvm/llvm-project/issues/119836
This PR adds mfma.scale.f32.32x32x64.f8f6f4 and
mfma.scale.f32.16x16x128.f8f6f4 to the ROCDL dialect. They are converted
to the corresponding intrinsics in the mlir-to-llvmir pass.
Intrinsics are available for the 'cpSize'
variants also. So, this patch migrates the Op
to lower to the intrinsics for all cases.
* Update the existing tests to check the lowering to intrinsics.
* Add newer cp_async_zfill tests to verify the lowering for the 'cpSize'
variants.
* Tidy-up CHECK lines in cp_async() function in nvvmir.mlir (NFC)
PTX spec link:
https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-cp-async
Signed-off-by: Durgadoss R <durgadossr@nvidia.com>
The TOSA-V1.0 specification adds "nan propagation" modes as attributes
for several operators. Adjust the ODS definitions of the relevant
operations to include this attribute.
The defined modes are "PROPAGATE" and "IGNORE" and the PROPAGATE mode is
set by default.
MAXIMUM, MINIMUM, REDUCE_MAX, REDUCE_MIN, MAX_POOL, CLAMP, and ARGMAX
support this attribute.
Signed-off-by: Jack Frankland <jack.frankland@arm.com>
Co-authored-by: TatWai Chong <tatwai.chong@arm.com>
This patch changes PadOp's padding input to type !tosa.shape<2 * rank>,
(where rank is the rank of the PadOp's input), instead of a <rank x 2>
tensor.
This patch is also a part of TOSA v1.0 effort:
https://discourse.llvm.org/t/rfc-tosa-dialect-increment-to-v1-0/83708
This patch updates the PadOp to match all against the TOSA v1.0 form.
Original Authors include:
@Tai78641
@wonjeon
Co-authored-by: Tai Ly <tai.ly@arm.com>
Scan directive allows to specify scan reductions within an worksharing
loop, worksharing loop simd or simd directive which should have an
`InScan` modifier associated with it. This change adds the mlir support
for the same.
Related PR: [Parsing and Semantic Support for
scan](https://github.com/llvm/llvm-project/pull/102792)
This is hopefully the first patch in the series of patches adding some
missing SPIR-V ops to MLIR over the next weeks/months, starting with
something simple: `OpEmitVertex` and `OpEndPrimitive`. Since the ops
have no input and outputs, and the only condition is "This instruction
must only be used when only one stream is present.", which I don't think
can be validate at the instruction level in isolation, I set
`hasVerifier` to 0. I hope I didn't miss anything, but I'm more than
happy to address any comments.
…the distributed IR case.
This patch allows `nd_load` and `nd_store` to preserve the tensor
descriptor shape during distribution to SIMT. The validation now expects
the distributed instruction to retain the `sg_map` attribute and uses it
to verify the consistency.
Previously ODS-generated Python operations had code like this:
```
super().__init__(self.build_generic(attributes=attributes, operands=operands, successors=_ods_successors, regions=regions, loc=loc, ip=ip))
```
we change it to:
```
super().__init__(self.OPERATION_NAME, self._ODS_REGIONS, self._ODS_OPERAND_SEGMENTS, self._ODS_RESULT_SEGMENTS, attributes=attributes, operands=operands, successors=_ods_successors, regions=regions, loc=loc, ip=ip)
```
This:
a) avoids an extra call dispatch (to `build_generic`), and
b) passes the class attributes directly to the constructor. Benchmarks
show that it is faster to pass these as arguments rather than having the
C++ code look up attributes on the class.
This PR improves the timing of the following benchmark on my workstation
from 5.3s to 4.5s:
```
def main(_):
with ir.Context(), ir.Location.unknown():
typ = ir.IntegerType.get_signless(32)
m = ir.Module.create()
with ir.InsertionPoint(m.body):
start = time.time()
for i in range(1000000):
arith.ConstantOp(typ, i)
end = time.time()
print(f"time: {end - start}")
```
Since this change adds an additional overload to the constructor and
does not alter any existing behaviors, it should be backwards
compatible.
This fixes the infer output shape of TOSA slice op for start/size values
that are out-of-bound or -1
added tests to check:
- size = -1
- size is out of bound
- start is out of bound
Signed-off-by: Tai Ly <tai.ly@arm.com>
[note: this is blocked by:
https://github.com/tensorflow/tensorflow/pull/73891 otherwise tensorflow
may have lit test failures]
This patch adds SameOperandsAndResultRank trait to TOSA operators with
ResultsBroadcastableShape trait. SameOperandsAndResultRank trait
requiring that all operands and results have matching ranks unless the
operand/result is unranked.
This also renders the TosaMakeBroadcastable pass unnecessary - but this
pass is left in for now just in case it is still used in some flows. The
lit test, broadcast.mlir, is removed.
This also adds verify of the SameOperandsAndResultRank trait in the
TosaInferShapes pass to validate inferred shapes.
Signed-off-by: Tai Ly <tai.ly@arm.com>
0-d vectors are supported now and so these patterns are no longer
required. This covers a part of this issue
https://github.com/llvm/llvm-project/issues/112913 . Additionally this
removes %arg2 in mlir/test/Conversion/GPUCommon/transfer_write.mlir and
renames %arg3 to %arg2 as %arg2 was originally not required.
Use `mlir_target_link_libraries()` to link dependencies of libraries
that are not included in libMLIR, to ensure that they link to the dylib
when they are used in Flang. Otherwise, they implicitly pull in all
their static dependencies, effectively causing Flang binaries to
simultaneously link to the dylib and to static libraries, which is never
a good idea.
I have only covered the libraries that are used by Flang. If you wish, I
can extend this approach to all non-libMLIR libraries in MLIR, making
MLIR itself also link to the dylib consistently.
[v3 with more `-DBUILD_SHARED_LIBS=ON` fixes]
Use `mlir_target_link_libraries()` to link dependencies of libraries
that are not included in libMLIR, to ensure that they link to the dylib
when they are used in Flang. Otherwise, they implicitly pull in all
their static dependencies, effectively causing Flang binaries to
simultaneously link to the dylib and to static libraries, which is never
a good idea.
I have only covered the libraries that are used by Flang. If you wish, I
can extend this approach to all non-libMLIR libraries in MLIR, making
MLIR itself also link to the dylib consistently.
[v2 with fixed `-DBUILD_SHARED_LIBS=ON` build]