[mlir][NFC] Improve bufferization documentation (#89495)

* Add example for `test-analysis-only` and `print-conflicts`.
* Mention other bufferization-related passes.
* Update outdated documentation.

Parent: 30cfe2b2ac
Commit: 099417d617
@@ -5,35 +5,45 @@

## Overview

Bufferization in MLIR is the process of converting ops with `tensor` semantics
to ops with `memref` semantics. There are multiple MLIR passes that are related
to bufferization. These passes typically run as one of the last steps in a
pass pipeline, right before lowering `memref` ops to LLVM. That is because many
transformations are easier or only supported in tensor land; e.g.,
[tile/fuse/… on tensors first](https://llvm.discourse.group/t/rfc-linalg-on-tensors-update-and-comprehensive-bufferization-rfc/3373),
then bufferize the remaining IR.
![bufferization passes](/includes/img/bufferization_passes.svg)

The most important bufferization pass is *One-Shot Bufferize*: This pass
rewrites `tensor` IR to `memref` IR. There are additional helper passes that
preprocess IR (e.g., so that IR can be bufferized more efficiently), perform
buffer-level optimizations such as allocation hoisting, and
[insert buffer deallocation ops](OwnershipBasedBufferDeallocation.md) so that
the resulting `memref` IR has no memory leaks.
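These passes can be chained in a pipeline. As a sketch (pass names taken from the passes mentioned above; the exact spelling should be checked against `mlir-opt --help`):

```mlir
// RUN: mlir-opt %s -one-shot-bufferize -buffer-deallocation-pipeline
```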
## Deprecated Passes

The old dialect conversion-based bufferization passes have been deprecated and
should not be used anymore. Most of those passes have already been removed from
MLIR. One-Shot Bufferize produces better bufferization results with fewer
memory allocations and buffer copies.

The buffer deallocation pass has been deprecated in favor of the ownership-based
buffer deallocation pipeline. The deprecated pass has some limitations that may
cause memory leaks in the resulting IR.
## What is One-Shot Bufferize?

One-Shot Bufferize is a tensor bufferization pass designed for IR in
[destination-passing style](https://www.microsoft.com/en-us/research/wp-content/uploads/2016/11/dps-fhpc17.pdf),
and with aggressive in-place bufferization.

One-Shot Bufferize is:

* **Monolithic**: A single MLIR pass does the entire work.

* **Extensible** via an op interface: All ops that implement
  `BufferizableOpInterface` can be bufferized.

* A **whole-function at a time analysis**. In-place bufferization decisions
  are made by analyzing SSA use-def chains on tensors. Op interface
  implementations not only provide the rewrite logic from tensor ops to memref
  ops, but also helper methods for One-Shot Bufferize's analysis to query
  information about an op's bufferization/memory semantics.

* **2-Phase**: Bufferization is internally broken down into 2 steps: First,
  analyze the entire IR and make bufferization decisions. Then, bufferize
  (rewrite) the IR. The analysis has access to exact SSA use-def information.
  It incrementally builds alias and equivalence sets and does not rely on a
@@ -60,27 +67,17 @@ One-Shot Bufferize is:

  of `AnalysisState` that implements a small number of virtual functions can
  serve as a custom analysis. It is even possible to run One-Shot Bufferize
  without any analysis (`AlwaysCopyAnalysisState`), in which case One-Shot
  Bufferize copies every buffer before writing to it.

From an architecture perspective, One-Shot Bufferize consists of
[BufferizableOpInterface](https://github.com/llvm/llvm-project/blob/17a68065c378da74805e4e1b9a5b78cc9f83e580/mlir/include/mlir/Dialect/Bufferization/IR/BufferizableOpInterface.td)
(and its implementations) and an
[analysis](https://github.com/llvm/llvm-project/blob/ae2764e835a26bad9774803eca0a6530df2a3e2d/mlir/include/mlir/Dialect/Bufferization/Transforms/OneShotAnalysis.h#L164)
of tensor SSA values that decides if a buffer can be used directly or must be
copied. The `bufferize` method of the op interface inspects analysis results and
rewrites tensor ops into memref ops.

Note that One-Shot Bufferize does not deallocate buffers. That is done by the
[Ownership-based Buffer Deallocation passes](OwnershipBasedBufferDeallocation.md).

## Goals of Bufferization

The high-level goal of every bufferization technique is to:

1. Use as little memory as possible.
2. Copy as little memory as possible.

This implies reusing already allocated buffers when possible, turning
bufferization into an algorithmically complex problem with similarities to
@@ -102,40 +99,46 @@

choosing an already existing buffer, we must be careful not to accidentally
overwrite data that is still needed later in the program.

To simplify this problem, One-Shot Bufferize was designed to take advantage of
*destination-passing style* (DPS). In MLIR, DPS ops should implement the
[`DestinationStyleOpInterface`](https://github.com/llvm/llvm-project/blob/792d437b56adfb3416daf8105942d4899fb82763/mlir/include/mlir/Interfaces/DestinationStyleOpInterface.td).
DPS exists in itself independently of bufferization and is tied to SSA
semantics: many ops are "updating" a part of their input SSA variables. For
example, the LLVM instruction
[`insertelement`](https://llvm.org/docs/LangRef.html#insertelement-instruction)
is inserting an element inside a vector. Since SSA values are immutable, the
operation returns a copy of the input vector with the element inserted.
Another example in MLIR is `linalg.generic` on tensors, which always has an
extra `outs` operand for each result, which provides the initial values to
update (for example when the operation is doing a reduction).

`outs` operands are referred to as "destinations" in the following (quotes are
important as these operands are not modified in place but copied) and come into
play in the context of bufferization as possible "anchors" for the
bufferization algorithm. This allows the user to shape the input in a form that
guarantees a close to optimal bufferization result when carefully choosing the
SSA values used as "destinations".
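As an illustration (this example is not from the original text; the op and map syntax is a best-effort sketch), a sum-reduction written with `linalg.generic` takes its accumulator through the `outs` operand:

```mlir
// %acc : tensor<f32> holds the initial value of the reduction and acts as
// the "destination"; the op returns the updated value as a new SSA result.
%sum = linalg.generic
    {indexing_maps = [affine_map<(d0) -> (d0)>, affine_map<(d0) -> ()>],
     iterator_types = ["reduction"]}
    ins(%t : tensor<8xf32>) outs(%acc : tensor<f32>) {
  ^bb0(%in: f32, %out: f32):
    %0 = arith.addf %in, %out : f32
    linalg.yield %0 : f32
} -> tensor<f32>
```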
For every tensor result, a DPS op has a corresponding tensor operand. If there
aren't any other conflicting uses of this tensor, the bufferization can alias
it with the op result and perform the operation "in-place" by reusing the buffer
allocated for this "destination" input.

As an example, consider the following op: `%r = tensor.insert %f into
%t[%idx] : tensor<5xf32>`

*(figure: `tensor.insert` writing `%f` into destination `%t`)*

`%t` is the "destination" in this example. When choosing a buffer for the result
`%r`, denoted as `buffer(%r)`, One-Shot Bufferize considers only two options:

1. `buffer(%r) = buffer(%t)`: store the result in the existing `buffer(%t)`.
   Note that this is not always possible, e.g., if the old contents of
   `buffer(%t)` are still needed. One-Shot Bufferize's main task is to detect
   such cases and fall back to the second option when necessary.
2. `buffer(%r)` is a newly allocated buffer.

There may be other buffers in the same function that could potentially be used
for `buffer(%r)`, but those are not considered by One-Shot Bufferize to keep the
bufferization simple. One-Shot Bufferize could be extended to consider such
buffers in the future to achieve a better quality of bufferization.
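The two options can be made concrete with a sketch of the resulting `memref` IR (illustrative; the buffer names are hypothetical):

```mlir
// Option 1: in-place. buffer(%r) = buffer(%t); the insert becomes a store.
memref.store %f, %t_buf[%idx] : memref<5xf32>

// Option 2: out-of-place. buffer(%r) is a fresh allocation; the old
// contents of buffer(%t) stay intact for later reads.
%alloc = memref.alloc() : memref<5xf32>
memref.copy %t_buf, %alloc : memref<5xf32> to memref<5xf32>
memref.store %f, %alloc[%idx] : memref<5xf32>
```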
@@ -151,7 +154,7 @@ memory allocation. E.g.:

```mlir
[…]
```

The result of `tensor.generate` does not have a "destination" operand, so
bufferization allocates a new buffer. This could be avoided by instead using an
op such as `linalg.generic`, which can express the same computation with a
"destination" operand, as specified behind outputs (`outs`):
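The original example is elided in this hunk; a sketch of what such a computation can look like (illustrative; `%dest` and `%value` are hypothetical names):

```mlir
// Illustrative: fill-like computation that writes %value into the
// "destination" %dest instead of allocating a new buffer.
%result = linalg.generic
    {indexing_maps = [affine_map<(d0) -> (d0)>],
     iterator_types = ["parallel"]}
    outs(%dest : tensor<5xf32>) {
  ^bb0(%out: f32):
    linalg.yield %value : f32
} -> tensor<5xf32>
```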
@@ -198,12 +201,61 @@ e.g.:

```mlir
%0 = "my_dialect.some_op"(%t) : (tensor<?xf32>) -> (tensor<?xf32>)
%1 = "my_dialect.another_op"(%0) : (tensor<?xf32>) -> (tensor<?xf32>)

// "yet_another_op" likely needs to read the data of %0, so "another_op" cannot
// in-place write to buffer(%0).
%2 = "my_dialect.yet_another_op"(%0) : (tensor<?xf32>) -> (tensor<?xf32>)
```
## Tensor / MemRef Boundary

The bufferization dialect provides a few helper ops to connect tensor IR (that
should be bufferized) with existing buffers (that may be allocated/provided by
a different runtime/library/etc.).

`bufferization.to_memref %t` returns the future buffer of a tensor SSA value.
`bufferization.to_tensor %m` returns a tensor SSA value for a given MemRef
buffer. `bufferization.materialize_in_destination` indicates that a tensor value
should materialize in a certain buffer.

Consider the following example, where a TOSA matmul result should materialize in
an existing buffer `%C`:

```mlir
// Batched TOSA matrix multiplication. %A and %B are the
// inputs, %C is the output.
func.func @test_matmul(%A: memref<1x17x19xf32>,
                       %B: memref<1x19x29xf32>,
                       %C: memref<1x17x29xf32>) {

  %A_tensor = bufferization.to_tensor %A restrict : memref<1x17x19xf32>
  %B_tensor = bufferization.to_tensor %B restrict : memref<1x19x29xf32>

  %0 = tosa.matmul %A_tensor, %B_tensor
      : (tensor<1x17x19xf32>, tensor<1x19x29xf32>) ->
         tensor<1x17x29xf32>

  bufferization.materialize_in_destination
      %0 in restrict writable %C
      : (tensor<1x17x29xf32>, memref<1x17x29xf32>) -> ()

  return
}
```

Note that all bufferization ops in this example have the `restrict` unit
attribute set. This attribute is similar to the C `restrict` keyword and
indicates that there is no other `to_tensor` or `materialize_in_destination` op
with the same or an aliasing MemRef operand. Only such
`to_tensor`/`materialize_in_destination` ops are supported. The `restrict`
attribute gives strong aliasing guarantees to the bufferization analysis and
allows us to look only at the tensor IR in a program. (Ops that do not operate
on tensors are ignored by One-Shot Bufferize.)

Also note that `tosa.matmul` cannot be bufferized as is: there is no
`BufferizableOpInterface` implementation for that op. However, the op can be
lowered to a combination of `tensor.empty` and `linalg.matmul`, which can be
bufferized.
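Such a lowering could look roughly as follows (illustrative sketch; because the shapes above are batched, the batched `linalg` variant is used here):

```mlir
// Hypothetical lowering: %empty is a tensor.empty op that serves as the
// "destination" of the matmul.
%empty = tensor.empty() : tensor<1x17x29xf32>
%mm = linalg.batch_matmul
    ins(%A_tensor, %B_tensor : tensor<1x17x19xf32>, tensor<1x19x29xf32>)
    outs(%empty : tensor<1x17x29xf32>) -> tensor<1x17x29xf32>
```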
## Using One-Shot Bufferize

@@ -221,17 +273,14 @@

By default, One-Shot Bufferize fails when it encounters an op with tensor
semantics (i.e., tensor result or tensor operand) that is not bufferizable
(i.e., does not implement `BufferizableOpInterface`). This can be avoided with
`allow-unknown-ops`. In that case, One-Shot Bufferize inserts
`to_memref`/`to_tensor` ops around the bufferization boundary.

One-Shot Bufferize can be configured to bufferize only ops from a set of
dialects with `dialect-filter`. This can be useful for gradually migrating from
dialect conversion-based bufferization to One-Shot Bufferize. One-Shot Bufferize
must run first in such a case, because dialect conversion-based bufferization
generates `to_tensor` ops without the `restrict` unit attribute, which One-Shot
Bufferize cannot analyze.
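In pass-pipeline syntax, these options can be combined, e.g. (illustrative flag spelling, following the RUN line convention used later in this document):

```mlir
// RUN: mlir-opt %s -one-shot-bufferize="allow-unknown-ops dialect-filter=tensor,linalg"
```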
One-Shot Bufferize can also be called programmatically with
[`bufferization::runOneShotBufferize`](https://github.com/llvm/llvm-project/blob/ae2764e835a26bad9774803eca0a6530df2a3e2d/mlir/include/mlir/Dialect/Bufferization/Transforms/OneShotAnalysis.h#L167).

@@ -240,6 +289,14 @@ Alternatively,

[…] skips the analysis and inserts a copy on every buffer write, just like the
dialect conversion-based bufferization.

By default, function boundaries are not bufferized. This is because there are
currently limitations around function graph bufferization: recursive function
calls are not supported. As long as there are no recursive calls, function
boundary bufferization can be enabled with `bufferize-function-boundaries`. Each
tensor function argument and tensor function result is then turned into a
memref. The layout map of the memref type can be controlled with
`function-boundary-type-conversion`.
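A hedged sketch of the effect (the function and the identity-layout result are illustrative, not from the original text; the exact memref types depend on `function-boundary-type-conversion`):

```mlir
// Before: tensor function boundary.
func.func @foo(%t: tensor<5xf32>) -> tensor<5xf32> {
  return %t : tensor<5xf32>
}

// After -one-shot-bufferize="bufferize-function-boundaries" (sketch):
// tensor arguments and results become memrefs.
func.func @foo(%m: memref<5xf32>) -> memref<5xf32> {
  return %m : memref<5xf32>
}
```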
## Memory Layouts

One-Shot Bufferize bufferizes ops from top to bottom. This works well when all
@@ -319,6 +376,11 @@

To get a better intuition of the interface methods, we invite users to take a
look at existing implementations in MLIR, e.g., the implementation of
`tensor.insert` or `tensor.extract`.

Interface implementations of DPS ops (that implement
`DestinationStyleOpInterface`) can derive from
`DstBufferizableOpInterfaceExternalModel`, which provides all necessary
method implementations except for `bufferize`.

## Debugging Buffer Copies

To get a better understanding of why One-Shot Bufferize introduced a buffer
@@ -338,14 +400,90 @@ There are two reasons why a buffer copy may be inserted.

In the first case, `print-conflicts` illustrates the conflict in the form of a
("read", "conflicting write", "last write") tuple.
## Understanding the SSA Use-Def Chain Analysis

A RaW (read-after-write) conflict consists of three parts, in the following
order according to op dominance:

1. **Definition:** A tensor `%t` is defined.
2. **Conflicting Write:** An operation writes to `buffer(%t)`.
3. **Read:** An operation reads `%t`.

When such a RaW conflict is detected during the analysis phase, One-Shot
Bufferize will insert a buffer copy for the conflicting write.

**Example**

```mlir
// RUN: mlir-opt %s -one-shot-bufferize="bufferize-function-boundaries test-analysis-only print-conflicts"
func.func @test(%arg0: f32, %arg1: f32, %arg2: index, %arg3: index) -> (f32, tensor<3xf32>) {
  // Create a new tensor with [%arg0, %arg0, %arg0].
  %0 = tensor.from_elements %arg0, %arg0, %arg0 : tensor<3xf32>

  // Insert something into the new tensor.
  %1 = tensor.insert %arg1 into %0[%arg2] : tensor<3xf32>

  // Read from the old tensor.
  %r = tensor.extract %0[%arg3] : tensor<3xf32>

  // Return the extracted value and the result of the insertion.
  func.return %r, %1 : f32, tensor<3xf32>
}
```

The output IR is as follows:

```mlir
func.func @test(%arg0: f32, %arg1: f32, %arg2: index, %arg3: index) -> (f32, tensor<3xf32>) {
  %from_elements = tensor.from_elements %arg0, %arg0, %arg0 {"C_0[DEF: result 0]"} : tensor<3xf32>
  %inserted = tensor.insert %arg1 into %from_elements[%arg2] {"C_0[CONFL-WRITE: 1]", __inplace_operands_attr__ = ["none", "false", "none"]} : tensor<3xf32>
  %extracted = tensor.extract %from_elements[%arg3] {"C_0[READ: 0]", __inplace_operands_attr__ = ["true", "none"]} : tensor<3xf32>
  return {__inplace_operands_attr__ = ["none", "true"]} %extracted, %inserted : f32, tensor<3xf32>
}
```

Note that the IR was not bufferized. It was merely annotated with the results
of the bufferization analysis. Every operation with tensor semantics has a
`__inplace_operands_attr__` attribute with one value per operand. If an operand
is not a tensor, the respective value is `none`. Otherwise, if the operand was
decided to be bufferized in-place, the value is `true`. A value of `false`
indicates a buffer copy. In the above example, a buffer copy would be inserted
for `tensor.insert`, so that it does not overwrite `buffer(%from_elements)`,
which is still needed for `tensor.extract`.

For each RaW conflict (there is only one in the example), three `C_i`
attributes were added:

* `C_0[DEF: result 0]`: A tensor is defined: the 0-th result of
  `tensor.from_elements`.
* `C_0[CONFL-WRITE: 1]`: An operation (if bufferized in-place) would write into
  the future buffer of the defined tensor: the 1-st operand of `tensor.insert`.
* `C_0[READ: 0]`: An operation reads the tensor definition: the 0-th operand of
  `tensor.extract`.

The fully bufferized IR (with the inserted buffer copy) is as follows:

```mlir
func.func @test(%arg0: f32, %arg1: f32, %arg2: index, %arg3: index) -> (f32, memref<3xf32>) {
  %c2 = arith.constant 2 : index
  %c1 = arith.constant 1 : index
  %c0 = arith.constant 0 : index
  %alloc = memref.alloc() {alignment = 64 : i64} : memref<3xf32>
  memref.store %arg0, %alloc[%c0] : memref<3xf32>
  memref.store %arg0, %alloc[%c1] : memref<3xf32>
  memref.store %arg0, %alloc[%c2] : memref<3xf32>
  %alloc_0 = memref.alloc() {alignment = 64 : i64} : memref<3xf32>
  memref.copy %alloc, %alloc_0 : memref<3xf32> to memref<3xf32>
  memref.store %arg1, %alloc_0[%arg2] : memref<3xf32>
  %0 = memref.load %alloc[%arg3] : memref<3xf32>
  return %0, %alloc_0 : f32, memref<3xf32>
}
```
To get a better understanding of the SSA Use-Def Chain Analysis and the RaW
conflict detection algorithm, interested users may want to refer to:

* [Original design document](https://discourse.llvm.org/uploads/short-url/5kckJ3DftYwQokG252teFgw3sYa.pdf)
* [ODM talk](https://youtu.be/TXEo59CYS9A)
  ([slides](https://mlir.llvm.org/OpenMeetings/2022-01-13-One-Shot-Bufferization.pdf))
* [LLVM Dev Meeting 2023 tutorial slides](https://m-sp.org/downloads/llvm_dev_2023.pdf)
## Migrating from Dialect Conversion-based Bufferization

@@ -356,20 +494,6 @@ One-Shot Bufferize must run first because it cannot analyze those boundary ops.

To update existing code step-by-step, it may be useful to specify a dialect
filter for One-Shot Bufferize, so that dialects can be switched over one-by-one.

## Dialect Conversion-based Bufferization

Disclaimer: Most dialect conversion-based bufferization has been migrated to
New files (diff suppressed because one or more lines are too long):

* `mlir/docs/includes/img/bufferization_passes.svg` (203 KiB)
* `mlir/docs/includes/img/bufferization_passes.svg` (78 KiB)