37 Commits

Author SHA1 Message Date
MLIR Team
5a91b9896c Remove "size" property of affine maps.
--

PiperOrigin-RevId: 250572818
2019-06-01 20:09:02 -07:00
Jacques Pienaar
c82e1da268 Remove unused function and avoid unused variable warning. NFC.
--

PiperOrigin-RevId: 247991231
2019-05-20 13:40:23 -07:00
River Riddle
d5b60ee840 Replace Operation::isa with llvm::isa.
--

PiperOrigin-RevId: 247789235
2019-05-20 13:37:52 -07:00
River Riddle
adca3c2edc Replace Operation::cast with llvm::cast.
--

PiperOrigin-RevId: 247785983
2019-05-20 13:37:42 -07:00
River Riddle
c5ecf9910a Add support for using llvm::dyn_cast/cast/isa for operation casts and replace usages of Operation::dyn_cast with llvm::dyn_cast.
--

PiperOrigin-RevId: 247780086
2019-05-20 13:37:31 -07:00
MLIR Team
41d90a85bd Automated rollback of changelist 247778391.
PiperOrigin-RevId: 247778691
2019-05-20 13:37:20 -07:00
River Riddle
02e03b9bf4 Add support for using llvm::dyn_cast/cast/isa for operation casts and replace usages of Operation::dyn_cast with llvm::dyn_cast.
--

PiperOrigin-RevId: 247778391
2019-05-20 13:37:10 -07:00
River Riddle
c4a5386e48 NFC: Replace usages of iterator_range<operand_iterator> with operand_range.
--

PiperOrigin-RevId: 242031201
2019-04-05 07:42:29 -07:00
Nicolas Vasilache
c9d5f3418a Cleanup SuperVectorization dialect printing and parsing.
On the read side,
```
%3 = vector_transfer_read %arg0, %i2, %i1, %i0 {permutation_map: (d0, d1, d2)->(d2, d0)} : (memref<?x?x?xf32>, index, index, index) -> vector<32x256xf32>
```

becomes:

```
%3 = vector_transfer_read %arg0[%i2, %i1, %i0] {permutation_map: (d0, d1, d2)->(d2, d0)} : memref<?x?x?xf32>, vector<32x256xf32>
```

On the write side,

```
vector_transfer_write %0, %arg0, %c3, %c3 {permutation_map: (d0, d1)->(d0)} : vector<128xf32>, memref<?x?xf32>, index, index
```

becomes

```
vector_transfer_write %0, %arg0[%c3, %c3] {permutation_map: (d0, d1)->(d0)} : vector<128xf32>, memref<?x?xf32>
```

Documentation will be cleaned up in a followup commit that also extracts a proper .md from the top of the file comments.

PiperOrigin-RevId: 241021879
2019-03-29 17:56:42 -07:00
Nicolas Vasilache
094ca64ab0 Refactor vectorization patterns
This CL removes the reliance of the vectorize pass on the specification of a `fastestVaryingDim` parameter. This parameter is a restriction meant to more easily target a particular loop/memref combination for vectorization and is mainly used for testing.

This also had the side-effect of restricting vectorization patterns to only the ones in which all memrefs were contiguous along the same loop dimension. This simple restriction prevented matmul to vectorize in 2-D.

this CL removes the restriction and adds the matmul test which vectorizes in 2-D along the parallel loops. Support for reduction loops is left for future work.

PiperOrigin-RevId: 240993827
2019-03-29 17:55:36 -07:00
River Riddle
9c08540690 Replace usages of Instruction with Operation in the /Analysis directory.
PiperOrigin-RevId: 240569775
2019-03-29 17:44:56 -07:00
River Riddle
9ffdc930c0 Rename the Instruction class to Operation. This just renames the class, usages of Instruction will still refer to a typedef in the interim.
This is step 1/N to renaming Instruction to Operation.

PiperOrigin-RevId: 240431520
2019-03-29 17:42:50 -07:00
River Riddle
96ebde9cfd Replace usages of "Op::operator->" with ".".
This is step 2/N of removing the temporary operator-> method as part of the de-const transition.

PiperOrigin-RevId: 240200792
2019-03-29 17:40:09 -07:00
River Riddle
af1abcc80b Replace usages of "operator->" with "." for the AffineOps.
Note: The "operator->" method is a temporary helper for the de-const transition and is gradually being phased out.
PiperOrigin-RevId: 240179439
2019-03-29 17:39:19 -07:00
Chris Lattner
986310a68f Remove const from Value, Instruction, Argument, and the various methods on the
*Op classes.  This is a net reduction by almost 400LOC.

PiperOrigin-RevId: 239972443
2019-03-29 17:34:33 -07:00
River Riddle
f37651c708 NFC. Move all of the remaining operations left in BuiltinOps to StandardOps. The only thing left in BuiltinOps are the core MLIR types. The standard types can't be moved because they are referenced within the IR directory, e.g. in things like Builder.
PiperOrigin-RevId: 236403665
2019-03-29 16:53:35 -07:00
Lei Zhang
85d9b6c8f7 Use consistent names for dialect op source files
This CL changes dialect op source files (.h, .cpp, .td) to follow the following
convention:

  <full-dialect-name>/<dialect-namespace>Ops.{h|cpp|td}

Builtin and standard dialects are specially treated, though. Both of them do
not have dialect namespace; the former is still named as BuiltinOps.* and the
latter is named as Ops.*.

Purely mechanical. NFC.

PiperOrigin-RevId: 236371358
2019-03-29 16:53:19 -07:00
River Riddle
870d778350 Begin the process of fully removing OperationInst. This patch cleans up references to OperationInst in the /include, /AffineOps, and lib/Analysis.
PiperOrigin-RevId: 232199262
2019-03-29 16:09:36 -07:00
River Riddle
de2d0dfbca Fold the functionality of OperationInst into Instruction. OperationInst still exists as a forward declaration and will be removed incrementally in a set of followup cleanup patches.
PiperOrigin-RevId: 232198540
2019-03-29 16:09:19 -07:00
River Riddle
5052bd8582 Define the AffineForOp and replace ForInst with it. This patch is largely mechanical, i.e. changing usages of ForInst to OpPointer<AffineForOp>. An important difference is that upon construction an AffineForOp no longer automatically creates the body and induction variable. To generate the body/iv, 'createBody' can be called on an AffineForOp with no body.
PiperOrigin-RevId: 232060516
2019-03-29 16:06:49 -07:00
River Riddle
36babbd781 Change the ForInst induction variable to be a block argument of the body instead of the ForInst itself. This is a necessary step in converting ForInst into an operation.
PiperOrigin-RevId: 231064139
2019-03-29 15:40:23 -07:00
River Riddle
6859f33292 Migrate VectorOrTensorType/MemRefType shape api to use int64_t instead of int.
PiperOrigin-RevId: 230605756
2019-03-29 15:33:20 -07:00
Nicolas Vasilache
00aac70159 Move makeNormalizedAffineApply
This CL is the 3rd on the path to simplifying AffineMap composition.
This CL just moves `makeNormalizedAffineApply` from VectorAnalysis to
AffineAnalysis where it more naturally belongs.

PiperOrigin-RevId: 228277182
2019-03-29 15:04:38 -07:00
Nicolas Vasilache
c449e46ceb Introduce AffineExpr::compose(AffineMap)
This CL is the 1st on the path to simplifying AffineMap composition.
This CL uses the now accepted AffineExpr.replaceDimsAndSymbols to
implement `AffineExpr::compose(AffineMap)`.

Arguably, `simplifyAffineExpr` should be part of IR and not Analysis but
this CL does not yet pull the trigger on that.

PiperOrigin-RevId: 228265845
2019-03-29 15:03:36 -07:00
Nicolas Vasilache
7c0bbe0939 Iterate on vector rather than DenseMap during AffineMap normalization
This CL removes a flakyness associated to a spurious iteration on DenseMap
iterators when normalizing AffineMap.

PiperOrigin-RevId: 228160074
2019-03-29 14:59:37 -07:00
Nicolas Vasilache
62dabbfd09 Fix opt build failure
PiperOrigin-RevId: 227938032
2019-03-29 14:57:36 -07:00
Nicolas Vasilache
618c6a74c6 [MLIR] Introduce normalized single-result unbounded AffineApplyOp
Supervectorization does not plan on handling multi-result AffineMaps and
non-canonical chains of > 1 AffineApplyOp.
This CL introduces a simpler abstraction and composition of single-result
unbounded AffineApplyOp by using the existing unbound AffineMap composition.

This CL adds a simple API call and relevant tests:

```c++
OpPointer<AffineApplyOp> makeNormalizedAffineApply(
  FuncBuilder *b, Location loc, AffineMap map, ArrayRef<Value*> operands);
```

which creates a single-result unbounded AffineApplyOp.
The operands of AffineApplyOp are not themselves results of AffineApplyOp by
consrtuction.

This represent the simplest possible interface to complement the composition
of (mathematical) AffineMap, for the cases when we are interested in applying
it to Value*.

In this CL the composed AffineMap is not compressed (i.e. there exist operands
that are not part of the result). A followup commit will compress to normal
form.

The single-result unbounded AffineApplyOp abstraction will be used in a
followup CL to support the MaterializeVectors pass.

PiperOrigin-RevId: 227879021
2019-03-29 14:56:37 -07:00
Nicolas Vasilache
5b87a5ef4b [MLIR] Drop strict super-vector requirement in MaterializeVector
The strict requirement (i.e. at least 2 HW vectors in a super-vector) was a
premature optimization to avoid interfering with other vector code potentially
introduced via other means.

This CL avoids this premature optimization and the spurious errors it causes
when super-vector size == HW vector size (which is a possible corner case).

This may be revisited in the future.

PiperOrigin-RevId: 227763966
2019-03-29 14:54:49 -07:00
Chris Lattner
456ad6a8e0 Standardize naming of statements -> instructions, revisting the code base to be
consistent and moving the using declarations over.  Hopefully this is the last
truly massive patch in this refactoring.

This is step 21/n towards merging instructions and statements, NFC.

PiperOrigin-RevId: 227178245
2019-03-29 14:44:30 -07:00
Chris Lattner
5187cfcf03 Merge Operation into OperationInst and standardize nomenclature around
OperationInst.  This is a big mechanical patch.

This is step 16/n towards merging instructions and statements, NFC.

PiperOrigin-RevId: 227093712
2019-03-29 14:42:23 -07:00
Chris Lattner
3f190312f8 Merge SSAValue, CFGValue, and MLValue together into a single Value class, which
is the new base of the SSA value hierarchy.  This CL also standardizes all the
nomenclature and comments to use 'Value' where appropriate.  This also eliminates a large number of cast<MLValue>(x)'s, which is very soothing.

This is step 11/n towards merging instructions and statements, NFC.

PiperOrigin-RevId: 227064624
2019-03-29 14:40:06 -07:00
Alex Zinenko
bc52a639f9 Extract vector_transfer_* Ops into a SuperVectorDialect.
From the beginning, vector_transfer_read and vector_transfer_write opreations
were intended as a mid-level vectorization abstraction.  In particular, they
are lowered to the StandardOps dialect before further processing.  As such, it
does not make sense to keep them at the same level as StandardOps.  Introduce
the new SuperVectorOps dialect and move vector_transfer_* operations there.
This will be used as a testbed for the generic lowering/legalization pass.

PiperOrigin-RevId: 225554492
2019-03-29 14:28:58 -07:00
Nicolas Vasilache
2408f0eba5 [MLIR] Drop assert for NYI in VectorAnalysis
This CLs adds proper error emission, removes NYI assertions and documents
assumptions that are required in the relevant functions.

PiperOrigin-RevId: 224377143
2019-03-29 14:21:22 -07:00
Nicolas Vasilache
df0a25efee [MLIR] Add support for permutation_map
This CL hooks up and uses permutation_map in vector_transfer ops.
In particular, when going into the nuts and bolts of the implementation, it
became clear that cases arose that required supporting broadcast semantics.
Broadcast semantics are thus added to the general permutation_map.
The verify methods and tests are updated accordingly.

Examples of interest include.

Example 1:
The following MLIR snippet:
```mlir
   for %i3 = 0 to %M {
     for %i4 = 0 to %N {
       for %i5 = 0 to %P {
         %a5 = load %A[%i4, %i5, %i3] : memref<?x?x?xf32>
   }}}
```
may vectorize with {permutation_map: (d0, d1, d2) -> (d2, d1)} into:
```mlir
   for %i3 = 0 to %0 step 32 {
     for %i4 = 0 to %1 {
       for %i5 = 0 to %2 step 256 {
         %4 = vector_transfer_read %arg0, %i4, %i5, %i3
              {permutation_map: (d0, d1, d2) -> (d2, d1)} :
              (memref<?x?x?xf32>, index, index) -> vector<32x256xf32>
   }}}
````
Meaning that vector_transfer_read will be responsible for reading the 2-D slice:
`%arg0[%i4, %i5:%15+256, %i3:%i3+32]` into vector<32x256xf32>. This will
require a transposition when vector_transfer_read is further lowered.

Example 2:
The following MLIR snippet:
```mlir
   %cst0 = constant 0 : index
   for %i0 = 0 to %M {
     %a0 = load %A[%cst0, %cst0] : memref<?x?xf32>
   }
```
may vectorize with {permutation_map: (d0) -> (0)} into:
```mlir
   for %i0 = 0 to %0 step 128 {
     %3 = vector_transfer_read %arg0, %c0_0, %c0_0
          {permutation_map: (d0, d1) -> (0)} :
          (memref<?x?xf32>, index, index) -> vector<128xf32>
   }
````
Meaning that vector_transfer_read will be responsible of reading the 0-D slice
`%arg0[%c0, %c0]` into vector<128xf32>. This will require a 1-D vector
broadcast when vector_transfer_read is further lowered.

Additionally, some minor cleanups and refactorings are performed.

One notable thing missing here is the composition with a projection map during
materialization. This is because I could not find an AffineMap composition
that operates on AffineMap directly: everything related to composition seems
to require going through SSAValue and only operates on AffinMap at a distance
via AffineValueMap. I have raised this concern a bunch of times already, the
followup CL will actually do something about it.

In the meantime, the projection is hacked at a minimum to pass verification
and materialiation tests are temporarily incorrect.

PiperOrigin-RevId: 224376828
2019-03-29 14:20:07 -07:00
Nicolas Vasilache
b39d1f0bdb [MLIR] Add VectorTransferOps
This CL implements and uses VectorTransferOps in lieu of the former custom
call op. Tests are updated accordingly.

VectorTransferOps come in 2 flavors: VectorTransferReadOp and
VectorTransferWriteOp.

VectorTransferOps can be thought of as a backend-independent
pseudo op/library call that needs to be legalized to MLIR (whiteboxed) before
it can be lowered to backend-dependent IR.

Note that the current implementation does not yet support a real permutation
map. Proper support will come in a followup CL.

VectorTransferReadOp
====================
VectorTransferReadOp performs a blocking read from a scalar memref
location into a super-vector of the same elemental type. This operation is
called 'read' by opposition to 'load' because the super-vector granularity
is generally not representable with a single hardware register. As a
consequence, memory transfers will generally be required when lowering
VectorTransferReadOp. A VectorTransferReadOp is thus a mid-level abstraction
that supports super-vectorization with non-effecting padding for full-tile
only code.

A vector transfer read has semantics similar to a vector load, with additional
support for:
  1. an optional value of the elemental type of the MemRef. This value
     supports non-effecting padding and is inserted in places where the
     vector read exceeds the MemRef bounds. If the value is not specified,
     the access is statically guaranteed to be within bounds;
  2. an attribute of type AffineMap to specify a slice of the original
     MemRef access and its transposition into the super-vector shape. The
     permutation_map is an unbounded AffineMap that must represent a
     permutation from the MemRef dim space projected onto the vector dim
     space.

Example:
```mlir
  %A = alloc(%size1, %size2, %size3, %size4) : memref<?x?x?x?xf32>
  ...
  %val = `ssa-value` : f32
  // let %i, %j, %k, %l be ssa-values of type index
  %v0 = vector_transfer_read %src, %i, %j, %k, %l
        {permutation_map: (d0, d1, d2, d3) -> (d3, d1, d2)} :
          (memref<?x?x?x?xf32>, index, index, index, index) ->
            vector<16x32x64xf32>
  %v1 = vector_transfer_read %src, %i, %j, %k, %l, %val
        {permutation_map: (d0, d1, d2, d3) -> (d3, d1, d2)} :
          (memref<?x?x?x?xf32>, index, index, index, index, f32) ->
            vector<16x32x64xf32>
```

VectorTransferWriteOp
=====================
VectorTransferWriteOp performs a blocking write from a super-vector to
a scalar memref of the same elemental type. This operation is
called 'write' by opposition to 'store' because the super-vector
granularity is generally not representable with a single hardware register. As
a consequence, memory transfers will generally be required when lowering
VectorTransferWriteOp. A VectorTransferWriteOp is thus a mid-level
abstraction that supports super-vectorization with non-effecting padding
for full-tile only code.
A vector transfer write has semantics similar to a vector store, with
additional support for handling out-of-bounds situations.

Example:
```mlir
  %A = alloc(%size1, %size2, %size3, %size4) : memref<?x?x?x?xf32>.
  %val = `ssa-value` : vector<16x32x64xf32>
  // let %i, %j, %k, %l be ssa-values of type index
  vector_transfer_write %val, %src, %i, %j, %k, %l
    {permutation_map: (d0, d1, d2, d3) -> (d3, d1, d2)} :
  (vector<16x32x64xf32>, memref<?x?x?x?xf32>, index, index, index, index)
```
PiperOrigin-RevId: 223873234
2019-03-29 14:15:25 -07:00
Nicolas Vasilache
5c16564bca [MLIR][Slicing] Add utils for computing slices.
This CL adds tooling for computing slices as an independent CL.
The first consumer of this analysis will be super-vector materialization in a
followup CL.

In particular, this adds:
1. a getForwardStaticSlice function with documentation, example and a
standalone unit test;
2. a getBackwardStaticSlice function with documentation, example and a
standalone unit test;
3. a getStaticSlice function with documentation, example and a standalone unit
test;
4. a topologicalSort function that is exercised through the getStaticSlice
unit test.

The getXXXStaticSlice functions take an additional root (resp. terminators)
parameter which acts as a boundary that the transitive propagation algorithm
is not allowed to cross.

PiperOrigin-RevId: 222446208
2019-03-29 14:08:02 -07:00
Nicolas Vasilache
89d9913a20 [MLIR][VectorAnalysis] Add a VectorAnalysis and standalone tests
This CL adds some vector support in prevision of the upcoming vector
materialization pass. In particular this CL adds 2 functions to:
1. compute the multiplicity of a subvector shape in a supervector shape;
2. help match operations on strict super-vectors. This is defined for a given
subvector shape as an operation that manipulates a vector type that is an
integral multiple of the subtype, with multiplicity at least 2.

This CL also adds a TestUtil pass where we can dump arbitrary testing of
functions and analysis that operate at a much smaller granularity than a pass
(e.g. an analysis for which it is convenient to write a bit of artificial MLIR
and write some custom test). This is in order to keep using Filecheck for
things that essentially look and feel like C++ unit tests.

PiperOrigin-RevId: 222250910
2019-03-29 14:02:17 -07:00