canonicalizations of operations. The ultimate user of this is
going to be a funcBuilder->foldOrCreate<YourOp>(...) API, but for now it
is just a more convenient way to write certain classes of canonicalizations
(see the change in StandardOps.cpp).
NFC.
PiperOrigin-RevId: 230770021
The constant folding rules assume that the value attributes of operands have
already been verified to be in good standing.
For each op in the above, the constant folding rules support both integer and
floating point cases. Broadcast behavior is also supported as per the semantics
of TFLite ops.
This CL does not handle overflow/underflow cases yet.
PiperOrigin-RevId: 229441221
The const folding logic is structurally similar, so use a template
to abstract the common part.
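A minimal sketch of that abstraction, using plain int64_t stand-ins for the
attribute values (all names here are illustrative, not the CL's actual ones):
```
#include <cstdint>
#include <functional>

// Both folds share the same structure: extract the two operand values and
// apply a binary function. The template abstracts the varying part (the
// function) from the common part.
template <typename BinaryFn>
int64_t foldBinaryOp(int64_t lhs, int64_t rhs, BinaryFn fn) {
  return fn(lhs, rhs);
}

// Usage:
//   foldBinaryOp(a, b, std::plus<int64_t>());       // addi
//   foldBinaryOp(a, b, std::multiplies<int64_t>()); // muli
```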
Moved mul(x, 0) to a legalization pattern to be consistent with
mul(x, 1).
Also promoted getZeroAttr() to be a method on Builder since it is
expected to be frequently used.
PiperOrigin-RevId: 228891989
Integer comparisons can be constant folded if both of their arguments are known
constants, which we can compare in the compiler. This requires implementing
all comparison predicates, but thanks to consistency between LLVM and MLIR
comparison predicates, we have a one-to-one correspondence between predicates
and llvm::APInt comparison functions. Constant folding of comparisons with
maximum/minimum values of the integer type are left for future work.
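A sketch of that one-to-one mapping (the enum and helper names here are
illustrative, not necessarily the ones used in the CL):
```
#include "llvm/ADT/APInt.h"

enum class CmpIPredicate { EQ, NE, SLT, SLE, SGT, SGE, ULT, ULE, UGT, UGE };

// Each predicate maps directly onto the llvm::APInt comparison function of
// the same name.
bool applyCmpPredicate(CmpIPredicate pred, const llvm::APInt &lhs,
                       const llvm::APInt &rhs) {
  switch (pred) {
  case CmpIPredicate::EQ:  return lhs.eq(rhs);
  case CmpIPredicate::NE:  return lhs.ne(rhs);
  case CmpIPredicate::SLT: return lhs.slt(rhs);
  case CmpIPredicate::SLE: return lhs.sle(rhs);
  case CmpIPredicate::SGT: return lhs.sgt(rhs);
  case CmpIPredicate::SGE: return lhs.sge(rhs);
  case CmpIPredicate::ULT: return lhs.ult(rhs);
  case CmpIPredicate::ULE: return lhs.ule(rhs);
  case CmpIPredicate::UGT: return lhs.ugt(rhs);
  case CmpIPredicate::UGE: return lhs.uge(rhs);
  }
  return false; // unreachable for a valid predicate
}
```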
This will be used to test the lowering of mod/floordiv/ceildiv in affine
expressions at compile time.
PiperOrigin-RevId: 228077580
This adds signed/unsigned integer division and remainder operations to the
StandardOps dialect. Two versions are required because MLIR integers are
signless, but the meaning of the leading bit is important in division and
affects the results. LLVM IR made a similar choice. Define the operations in
the tablegen file and add simple constant folding hooks in the C++
implementation. Handle signed division overflow and division by zero errors in
constant folding. Canonicalization is left for future work.
These operations are necessary to lower affine_apply's down to LLVM IR.
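As a sketch of that error handling for the signed case (the helper name is
illustrative; failure to fold is signaled here by an empty optional):
```
#include "llvm/ADT/APInt.h"
#include <optional>

// Refuses to fold exactly the two error cases named above: division by
// zero, and the lone signed-overflow case INT_MIN / -1.
std::optional<llvm::APInt> foldSignedDiv(const llvm::APInt &lhs,
                                         const llvm::APInt &rhs) {
  if (rhs == 0)
    return std::nullopt; // division by zero
  if (lhs.isMinSignedValue() && rhs.isAllOnes())
    return std::nullopt; // two's complement overflow
  return lhs.sdiv(rhs);
}
```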
PiperOrigin-RevId: 228077549
Use tablegen to generate definitions of the standard binary arithmetic
operations. These operations share a lot of boilerplate that is better off
generated by a tool.
Using tablegen for standard binary arithmetic operations requires the following
modifications.
1. Add a bit field `hasConstantFolder` to the base Op tablegen class; generate
the `constantFold` method signature if the bit is set (see the sketch after
this list). Differentiate between single-result and zero/multi-result ops,
which use different signatures.
The implementation of the method remains in C++, similarly to canonicalization
patterns, since it may be large and non-trivial.
2. Define the `AnyType` record of class `Type`, since the `BinaryOp` class
currently provided in op_base.td is supposed to operate on tensors and other
tablegen users may rely on this behavior.
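For illustration, plausible shapes of the two generated hooks; this is an
interface sketch based on the description above, not the verbatim generator
output (`Attribute`, `ArrayRef`, etc. are the usual MLIR/LLVM types):
```
// Single-result op: returns the folded attribute, or null on failure.
Attribute constantFold(ArrayRef<Attribute> operands, MLIRContext *context);

// Zero/multi-result op: fills `results` with one attribute per result.
bool constantFold(ArrayRef<Attribute> operands,
                  SmallVectorImpl<Attribute> &results, MLIRContext *context);
```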
Note that this drops the inline documentation on the operation classes that was
copy-pasted around anyway. Since we don't generate g3doc from tablegen yet,
keep LangRef.md as it is. Eventually, the user documentation can move to the
tablegen definition file as well.
PiperOrigin-RevId: 227820815
Moving forward, dialect namespaces cannot contain '.' characters.
This CL also standardizes that operation names must begin with the dialect namespace followed by a '.'.
PiperOrigin-RevId: 227532193
StmtResult -> InstResult, StmtOperand -> InstOperand, and remove the old names.
This is step 17/n towards merging instructions and statements, NFC.
PiperOrigin-RevId: 227121537
is the new base of the SSA value hierarchy. This CL also standardizes all the
nomenclature and comments to use 'Value' where appropriate. This also eliminates a large number of cast<MLValue>(x)'s, which is very soothing.
This is step 11/n towards merging instructions and statements, NFC.
PiperOrigin-RevId: 227064624
From the beginning, vector_transfer_read and vector_transfer_write operations
were intended as a mid-level vectorization abstraction. In particular, they
are lowered to the StandardOps dialect before further processing. As such, it
does not make sense to keep them at the same level as StandardOps. Introduce
the new SuperVectorOps dialect and move vector_transfer_* operations there.
This will be used as a testbed for the generic lowering/legalization pass.
PiperOrigin-RevId: 225554492
- loop step wasn't handled and there wasn't a TODO or an assertion; fix this.
- rename 'delay' to 'shift' for consistency/readability.
- other readability changes.
- remove duplicate attribute print for DmaStartOp; fix misplaced attribute
print for DmaWaitOp.
- add build method for AddFOp (unrelated to this CL, but add it anyway).
PiperOrigin-RevId: 224892958
This simplifies call-sites returning true after emitting an error. After the
conversion, dropped braces around single-statement blocks as that seems more
common.
Also, switched to the emitError method instead of emitting the Error kind using
the emitDiagnostic method.
TESTED with existing unit tests
PiperOrigin-RevId: 224527868
This CL adds the following free functions:
```
/// Returns the AffineExpr e o m.
AffineExpr compose(AffineExpr e, AffineMap m);
/// Returns the AffineMap f o g.
AffineMap compose(AffineMap f, AffineMap g);
```
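For illustration, a hypothetical use of the map-map overload (assuming `g`'s
result count matches `f`'s dimension count):
```
// Composes two maps: applying fg is equivalent to applying g, then f.
AffineMap composeExample(AffineMap f, AffineMap g) {
  AffineMap fg = compose(f, g); // fg(x) == f(g(x))
  return fg;
}
```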
This addresses the issue that AffineMap composition is only available at a
distance via AffineValueMap and is thus unusable on Attributes.
This CL implements AffineMap composition in a more modular and composable
way.
It does not claim to be a good replacement for the implementation in
AffineValueMap; in particular, it does not support bounded maps atm.
Standalone tests are added that replicate some of the logic of the AffineMap
composition pass.
Lastly, affine map composition is used properly inside MaterializeVectors and
a standalone test is added that requires permutation_map composition with a
projection map.
PiperOrigin-RevId: 224376870
This CL hooks up and uses permutation_map in vector_transfer ops.
In particular, when going into the nuts and bolts of the implementation, it
became clear that cases arose that required supporting broadcast semantics.
Broadcast semantics are thus added to the general permutation_map.
The verify methods and tests are updated accordingly.
Examples of interest include:
Example 1:
The following MLIR snippet:
```mlir
for %i3 = 0 to %M {
  for %i4 = 0 to %N {
    for %i5 = 0 to %P {
      %a5 = load %A[%i4, %i5, %i3] : memref<?x?x?xf32>
}}}
```
may vectorize with {permutation_map: (d0, d1, d2) -> (d2, d1)} into:
```mlir
for %i3 = 0 to %0 step 32 {
  for %i4 = 0 to %1 {
    for %i5 = 0 to %2 step 256 {
      %4 = vector_transfer_read %arg0, %i4, %i5, %i3
           {permutation_map: (d0, d1, d2) -> (d2, d1)} :
           (memref<?x?x?xf32>, index, index) -> vector<32x256xf32>
}}}
```
Meaning that vector_transfer_read will be responsible for reading the 2-D slice:
`%arg0[%i4, %i5:%i5+256, %i3:%i3+32]` into vector<32x256xf32>. This will
require a transposition when vector_transfer_read is further lowered.
Example 2:
The following MLIR snippet:
```mlir
%cst0 = constant 0 : index
for %i0 = 0 to %M {
  %a0 = load %A[%cst0, %cst0] : memref<?x?xf32>
}
```
may vectorize with {permutation_map: (d0, d1) -> (0)} into:
```mlir
for %i0 = 0 to %0 step 128 {
  %3 = vector_transfer_read %arg0, %c0_0, %c0_0
       {permutation_map: (d0, d1) -> (0)} :
       (memref<?x?xf32>, index, index) -> vector<128xf32>
}
```
Meaning that vector_transfer_read will be responsible for reading the 0-D slice
`%arg0[%c0, %c0]` into vector<128xf32>. This will require a 1-D vector
broadcast when vector_transfer_read is further lowered.
Additionally, some minor cleanups and refactorings are performed.
One notable thing missing here is the composition with a projection map during
materialization. This is because I could not find an AffineMap composition
that operates on AffineMap directly: everything related to composition seems
to require going through SSAValue and only operates on AffineMap at a distance
via AffineValueMap. I have raised this concern a bunch of times already; the
followup CL will actually do something about it.
In the meantime, the projection is hacked at a minimum to pass verification
and the materialization tests are temporarily incorrect.
PiperOrigin-RevId: 224376828
- add optional stride arguments for DmaStartOp
- add DmaStartOp::verify() and missing test cases for DMA ops in
test/IR/memory-ops.mlir.
PiperOrigin-RevId: 224232466
The checks for `isa<IndexType>() || isa<IntegerType>()` and
`isa<IndexType>() || isa<IntegerType>() || isa<FloatType>()`
are frequently used, so it's useful to have some helper
methods for them.
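A minimal sketch of such helpers, written as free functions (the CL may instead
expose them as methods on Type; the names and the exact header are assumptions):
```
#include "mlir/IR/BuiltinTypes.h"
using namespace mlir;

bool isIntOrIndex(Type type) {
  return type.isa<IndexType>() || type.isa<IntegerType>();
}

bool isIntOrIndexOrFloat(Type type) {
  return isIntOrIndex(type) || type.isa<FloatType>();
}
```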
PiperOrigin-RevId: 224133596
This CL implements and uses VectorTransferOps in lieu of the former custom
call op. Tests are updated accordingly.
VectorTransferOps come in 2 flavors: VectorTransferReadOp and
VectorTransferWriteOp.
VectorTransferOps can be thought of as a backend-independent
pseudo op/library call that needs to be legalized to MLIR (whiteboxed) before
it can be lowered to backend-dependent IR.
Note that the current implementation does not yet support a real permutation
map. Proper support will come in a followup CL.
VectorTransferReadOp
====================
VectorTransferReadOp performs a blocking read from a scalar memref
location into a super-vector of the same elemental type. This operation is
called 'read', as opposed to 'load', because the super-vector granularity
is generally not representable with a single hardware register. As a
consequence, memory transfers will generally be required when lowering
VectorTransferReadOp. A VectorTransferReadOp is thus a mid-level abstraction
that supports super-vectorization with non-effecting padding for full-tile
only code.
A vector transfer read has semantics similar to a vector load, with additional
support for:
1. an optional value of the elemental type of the MemRef. This value
supports non-effecting padding and is inserted in places where the
vector read exceeds the MemRef bounds. If the value is not specified,
the access is statically guaranteed to be within bounds;
2. an attribute of type AffineMap to specify a slice of the original
MemRef access and its transposition into the super-vector shape. The
permutation_map is an unbounded AffineMap that must represent a
permutation from the MemRef dim space projected onto the vector dim
space.
Example:
```mlir
%A = alloc(%size1, %size2, %size3, %size4) : memref<?x?x?x?xf32>
...
%val = `ssa-value` : f32
// let %i, %j, %k, %l be ssa-values of type index
%v0 = vector_transfer_read %src, %i, %j, %k, %l
      {permutation_map: (d0, d1, d2, d3) -> (d3, d1, d2)} :
      (memref<?x?x?x?xf32>, index, index, index, index) ->
      vector<16x32x64xf32>
%v1 = vector_transfer_read %src, %i, %j, %k, %l, %val
      {permutation_map: (d0, d1, d2, d3) -> (d3, d1, d2)} :
      (memref<?x?x?x?xf32>, index, index, index, index, f32) ->
      vector<16x32x64xf32>
```
VectorTransferWriteOp
=====================
VectorTransferWriteOp performs a blocking write from a super-vector to
a scalar memref of the same elemental type. This operation is
called 'write', as opposed to 'store', because the super-vector
granularity is generally not representable with a single hardware register. As
a consequence, memory transfers will generally be required when lowering
VectorTransferWriteOp. A VectorTransferWriteOp is thus a mid-level
abstraction that supports super-vectorization with non-effecting padding
for full-tile only code.
A vector transfer write has semantics similar to a vector store, with
additional support for handling out-of-bounds situations.
Example:
```mlir
%A = alloc(%size1, %size2, %size3, %size4) : memref<?x?x?x?xf32>
%val = `ssa-value` : vector<16x32x64xf32>
// let %i, %j, %k, %l be ssa-values of type index
vector_transfer_write %val, %src, %i, %j, %k, %l
    {permutation_map: (d0, d1, d2, d3) -> (d3, d1, d2)} :
    (vector<16x32x64xf32>, memref<?x?x?x?xf32>, index, index, index, index)
```
PiperOrigin-RevId: 223873234
class. This change is NFC, but allows for new kinds of patterns, specifically
LegalizationPatterns which will be allowed to change the types of things they
rewrite.
PiperOrigin-RevId: 223243783
This CL added two new traits, SameOperandsAndResultShape and
ResultsAreBoolLike, and changed CmpIOp to embody these two
traits. As a consequence, CmpIOp's result type is now verified
to be bool-like.
PiperOrigin-RevId: 223208438
The semantics of 'select' is conventional: return the second operand if the
first operand is true (1 : i1) and the third operand otherwise. It is
applicable to vectors and tensors element-wise, similarly to the LLVM instruction.
This operation is necessary to implement min/max, which are needed to lower
'for' loops with complex bounds to CFG functions and to support ternary
operations in ML functions. It is preferred over first-class min/max because
of its simplicity, e.g. it is not concerned with signedness.
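Element-wise, the semantics read like the C++ conditional operator; a sketch
on plain vectors rather than MLIR values (illustrative only):
```
#include <cstddef>
#include <vector>

// Picks the second operand where the condition is true and the third
// otherwise, element by element.
std::vector<float> selectElems(const std::vector<bool> &cond,
                               const std::vector<float> &a,
                               const std::vector<float> &b) {
  std::vector<float> result(cond.size());
  for (std::size_t i = 0; i < cond.size(); ++i)
    result[i] = cond[i] ? a[i] : b[i];
  return result;
}

// min(x, y) can then be expressed as select(x < y, x, y), avoiding the need
// for first-class min/max ops.
```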
PiperOrigin-RevId: 223160860
and getMemRefRegion() to work with specified loop depths; add support for
outgoing DMAs and store ops.
- add support for getMemRefRegion() symbolic in outer loops, and hence support
for DMAs symbolic in outer surrounding loops.
- add DMA generation support for outgoing DMAs (store ops to a lower memory
space); extend getMemoryRegion() to store ops. -memref-bound-check now works
with store ops as well.
- fix dma-generate (references to the old memref in the dma_start op were also
being replaced with the new buffer); replaceAllMemRefUsesWith needs to work
on only a subset of the uses, so update it to take an optional 'operation'
argument that serves as a filter: if provided, only those uses that are
dominated by the filter are replaced.
- add missing attribute prints for the dma_start and dma_wait ops.
- update the FlatAffineConstraints API
PiperOrigin-RevId: 221889223
* Optionally attach the type of integer and floating point attributes to the attributes; this allows restricting an int/float to a specific width.
- Currently this allows suffixing an int/float constant with a type [this might be revised in the future].
- Default to i64 and f32 if not specified.
* For index types the APInt width used is 64.
* Change callers to request a specific attribute type.
* Store iN type with APInt of width N.
* This change does not handle the folding of constants of different types (e.g., doing int type promotions to support constant folding of i3 and i32), and instead restricts constant folding to operate only on the same types.
PiperOrigin-RevId: 221722699
Change the storage type to APInt from int64_t for IntegerAttr (following the change to APFloat storage in FloatAttr). Effectively a direct change from int64_t to 64-bit APInt throughout (the bitwidth hardcoded). This change also adds a getInt convenience method to IntegerAttr and replaces previous getValue calls with getInt calls.
While this change updates the storage type, it does not update all constant folding calls.
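A plausible shape for that convenience accessor, sketched as a free function
over the stored APInt (the real method lives on IntegerAttr):
```
#include "llvm/ADT/APInt.h"
#include <cstdint>

// With storage hardcoded to a 64-bit APInt, the value can be recovered as
// a signed 64-bit integer.
int64_t getInt(const llvm::APInt &storedValue) {
  return storedValue.getSExtValue();
}
```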
PiperOrigin-RevId: 221082788
time. The "Fast and Flexible Instruction Selection With Constraints" paper
from CC2018 makes a credible argument that dynamic costs aren't actually
necessary/important, and we are not using them.
- Check in my "MLIR Generic DAG Rewriter Infrastructure" design doc into the
source tree.
PiperOrigin-RevId: 221017546
- constant bounded memory regions, static shapes, no handling of
overlapping/duplicate regions (through union) for now; also, only load
memory ops are handled.
- add build methods for DmaStartOp, DmaWaitOp.
- move getMemoryRegion() into Analysis/Utils and expose it.
- fix addIndexSet and getMemoryRegion() post-switch to exclusive upper bounds;
update test cases for memref-bound-check and memref-dependence-check for
exclusive bounds (missed in a previous CL)
PiperOrigin-RevId: 220729810
Arithmetic and comparison instructions are necessary to implement, e.g.,
control flow when lowering MLFunctions to CFGFunctions. (While it is possible
to replace some of the arithmetic with affine_apply instructions for loop
bounds, it is still necessary for loop bounds checking, steps, if-conditions,
non-trivial memref subscripts, etc.) Furthermore, working with indirect
accesses in, e.g., lookup tables for large embeddings, may require operating on
tensors of indexes. For example, the equivalents to C code "LUT[Index[i]]" or
"ResultIndex[i] = i + j" where i, j are loop induction variables require the
arithmetics on indices as well as the possibility to operate on tensors
thereof. Allow arithmetic and comparison operations to apply to index types by
declaring them integer-like. Allow tensors whose element type is index for
indirection purposes.
The absence of vectors with "index" element type is explicitly tested, but the
only justification for this restriction in the CL introducing the test is
"because we don't need them". Do NOT enable vectors of index types, although
it makes vector and tensor types inconsistent with respect to allowed element
types.
PiperOrigin-RevId: 220614055
This binary operation is applicable to integers, vectors and tensors thereof
similarly to binary arithmetic operations. The operand types must match
exactly, and the shape of the result type is the same as that of the operands.
The element type of the result is always i1. The kind of the comparison is
defined by the "predicate" integer attribute. This attribute requests one of:
- equal to;
- not equal to;
- signed less than;
- signed less than or equal to;
- signed greater than;
- signed greater than or equal to;
- unsigned less than;
- unsigned less than or equal to;
- unsigned greater than;
- unsigned greater than or equal to.
Since integer values themselves do not have a sign, the comparison operator
specifies whether to use signed or unsigned comparison logic, i.e., whether to
interpret values whose most significant bit is set as negative numbers in two's
complement or as positive values. For non-scalar operands, pairwise
per-element comparison is performed. Comparison operators on scalars are
necessary to implement basic control flow with conditional branches.
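For a concrete illustration of the distinction, a self-contained sketch using
llvm::APInt (which underlies these comparisons in the compiler):
```
#include "llvm/ADT/APInt.h"
#include <cassert>

// The 4-bit pattern 0b1111 reads as -1 under signed interpretation and as
// 15 under unsigned interpretation, so the two predicate families disagree.
void signednessExample() {
  llvm::APInt a(4, 15), b(4, 1);
  assert(a.slt(b));  // signed:   -1 < 1  -> true
  assert(!a.ult(b)); // unsigned: 15 < 1  -> false
}
```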
PiperOrigin-RevId: 220613566
This is done by changing Type to be a POD interface around an underlying pointer storage and adding in-class support for isa/dyn_cast/cast.
PiperOrigin-RevId: 219372163
- We already have Pattern::match(). Using mlir::match() in Pattern::match()
confuses the compiler.
- Created m_Op() to avoid using the detailed matcher implementation in
StandardOps.cpp.
PiperOrigin-RevId: 219328823
- There are several places where we cast the type of the memref obtained
from a load/store op to a memref type, and this will become even more
common (some upcoming CLs this week). Add a getMemRefType() and use it in
the several places where the cast was being used.
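A sketch of the helper's likely shape, in free-function form (the accessor
name and surrounding API are assumptions here):
```
#include "mlir/IR/BuiltinTypes.h"
#include "mlir/IR/Value.h"
using namespace mlir;

// Performs the memref-type cast once, in one place, instead of at every
// call site.
MemRefType getMemRefType(Value memref) {
  return memref.getType().cast<MemRefType>();
}
```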
PiperOrigin-RevId: 219164326
- Added a mechanism for specifying pattern matching more concisely, as in LLVM.
- Added support for canonicalization of addi/muli over vector/tensor splats.
- Added ValueType to the Attribute class hierarchy.
- Allowed creating constant splats.
PiperOrigin-RevId: 219149621