# MLIR Python Bindings

Current status: Under development and not enabled by default

## Building

### Pre-requisites

* [`pybind11`](https://github.com/pybind/pybind11) must be installed and able to
  be located by CMake.
* A relatively recent Python3 installation

### CMake variables

* **`MLIR_BINDINGS_PYTHON_ENABLED`**`:BOOL`

  Enables building the Python bindings. Defaults to `OFF`.

* **`MLIR_PYTHON_BINDINGS_VERSION_LOCKED`**`:BOOL`

  Links the native extension against the Python runtime library, which is
  optional on some platforms. While setting this to `OFF` can yield somewhat
  greater deployment flexibility, linking in this way allows the linker to
  report unresolved symbols at build time on all platforms, which makes for a
  smoother development workflow. Defaults to `ON`.

* **`PYTHON_EXECUTABLE`**`:STRING`

  Specifies the `python` executable used for the LLVM build, including for
  determining header/link flags for the Python bindings. On systems with
  multiple Python implementations, setting this explicitly to the preferred
  `python3` executable is strongly recommended.

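As a rough illustration of how these variables combine, the sketch below assembles the corresponding `-D` flags for a CMake invocation. The helper name `mlir_python_cmake_flags` is hypothetical and not part of LLVM; only the three variable names come from the list above.

```python
import sys


def mlir_python_cmake_flags(python_executable=sys.executable, version_locked=True):
    """Return the -D flags for the variables listed above (hypothetical helper)."""

    def on_off(value):
        # CMake BOOL options are conventionally spelled ON/OFF.
        return "ON" if value else "OFF"

    return [
        # The bindings default to OFF, so they must be enabled explicitly.
        "-DMLIR_BINDINGS_PYTHON_ENABLED=ON",
        f"-DMLIR_PYTHON_BINDINGS_VERSION_LOCKED={on_off(version_locked)}",
        # Pin the interpreter so header/link flags come from the intended Python.
        f"-DPYTHON_EXECUTABLE={python_executable}",
    ]
```

Defaulting `python_executable` to `sys.executable` is one way to follow the recommendation of naming the preferred `python3` explicitly: the flags then point at whichever interpreter ran the helper.
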
## Design

### Use cases

There are likely two primary use cases for the MLIR Python bindings:

1. Support users who expect that an installed version of LLVM/MLIR will yield
   the ability to `import mlir` and use the API in a pure way out of the box.

2. Downstream integrations will likely want to include parts of the API in their
   private namespace or specially built libraries, probably mixing them with
   other Python-native bits.

### Composable modules

In order to support use case #2, the Python bindings are organized into
composable modules that downstream integrators can include and re-export into
their own namespace if desired. This forces several design points:

* Separate the construction/populating of a `py::module` from the
  `PYBIND11_MODULE` global constructor.

* Introduce headers for C++-only wrapper classes, as other related C++ modules
  will need to interoperate with them.

* Separate any initialization routines that depend on optional components into
  their own module/dependency (currently, things like `registerAllDialects` fall
  into this category).

Many related issues (shared library linkage, distribution concerns, etc.)
affect these choices. Organizing the code into composable modules (versus a
monolithic `cpp` file) allows the flexibility to address many of these as
needed over time. Also, compilation time for all of the template
meta-programming in pybind scales with the number of things you define in a
translation unit. Breaking into multiple translation units can significantly aid
compile times for APIs with a large surface area.

### Submodules

Generally, the C++ codebase namespaces most things into the `mlir` namespace.
However, in order to modularize and make the Python bindings easier to
understand, sub-packages are defined that map roughly to the directory structure
of functional units in MLIR.

Examples:

* `mlir.ir`
* `mlir.passes` (`pass` is a reserved word :( )
* `mlir.dialect`
* `mlir.execution_engine` (aside from namespacing, it is important that
  "bulky"/optional parts like this are isolated)

In addition, initialization functions that imply optional dependencies should
be in underscored (notionally private) modules such as `_init` and linked
separately. This allows downstream integrators to completely customize what is
included "in the box" and covers things like dialect registration,
pass registration, etc.

### Loader

LLVM/MLIR is a non-trivial Python-native project that is likely to co-exist with
other non-trivial native extensions. As such, the native extension (i.e. the
`.so`/`.pyd`/`.dylib`) is exported as a notionally private top-level symbol
(`_mlir`), while a small set of Python code is provided in `mlir/__init__.py`
and siblings which loads and re-exports it. This split provides a place to stage
code that needs to prepare the environment *before* the shared library is loaded
into the Python runtime, and also provides a place where one-time initialization
code can be invoked apart from module constructors.

To start with, the `mlir/__init__.py` loader shim can be very simple and scale
to future needs:

```python
from _mlir import *
```

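If the shim later needs to stage environment setup before loading, it might grow along the lines of the sketch below. This is only an illustration of the split described above: the helper `load_native_extension` is hypothetical, and `importlib.import_module` stands in for whatever loading mechanism the real package ends up using.

```python
import importlib
import os


def load_native_extension(name="_mlir", env_defaults=None):
    """Prepare the environment, then import the native extension by name."""
    # Stage environment tweaks *before* the shared library is loaded.
    # setdefault() keeps any value the user has already exported.
    for key, value in (env_defaults or {}).items():
        os.environ.setdefault(key, value)
    # Importing the module is what loads the shared library into the runtime.
    native = importlib.import_module(name)
    # One-time initialization could be invoked here, apart from module
    # constructors, before the module is handed back for re-export.
    return native
```

Taking the module name as a parameter keeps the sketch independent of any particular build; a downstream integrator that re-exports the bindings under a private name could pass that name instead.
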
### Limited use of globals

For normal operations, parent-child constructor relationships are realized with
constructor methods on a parent class as opposed to requiring
invocation/creation from a global symbol.

For example, consider two code fragments:

```python
op = build_my_op()
region = mlir.Region(op)
```

vs

```python
op = build_my_op()
region = op.new_region()
```

For tightly coupled data structures like `Operation`, the latter is generally
preferred because:

* It makes it syntactically harder to create something that is going to access
  illegal memory (less error handling in the bindings, less testing, etc.).

* It reduces the global-API surface area for creating related entities. This
  makes it more likely that, when constructing IR based on an `Operation`
  instance of unknown provenance, receiving code can just call methods on it to
  do what it wants, versus needing to reach back into the global namespace and
  find the right `Region` class.

* It leaks fewer things that are in place for C++ convenience (i.e. default
  constructors to invalid instances).

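The preferred pattern can be sketched in plain Python. The `Operation` and `Region` classes below are toy stand-ins written for this illustration, not the actual bindings; they only show how a constructor method on the parent keeps ownership consistent.

```python
class Region:
    """Toy stand-in for a region; only ever created by its parent operation."""

    def __init__(self, parent):
        self.parent = parent


class Operation:
    """Toy stand-in for an operation that owns its regions."""

    def __init__(self, name):
        self.name = name
        self.regions = []

    def new_region(self):
        # The parent creates and records the child, so a region can never
        # exist without a valid owning operation.
        region = Region(self)
        self.regions.append(region)
        return region


op = Operation("my_op")
region = op.new_region()
```

Code that receives `op` from elsewhere needs no import of `Region` to extend it, which is the reduced global surface area the list above refers to.
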
### Use the C-API

The Python APIs should seek to layer on top of the C-API to the degree possible.
Especially for the core, dialect-independent parts, such a binding enables
packaging decisions that would be difficult or impossible if spanning a C++ ABI
boundary. In addition, factoring in this way side-steps some very difficult
issues that arise when combining RTTI-based modules (which pybind-derived things
are) with non-RTTI polymorphic C++ code (the default compilation mode of LLVM).

## Style

In general, for the core parts of MLIR, the Python bindings should be largely
isomorphic with the underlying C++ structures. However, concessions are made
either for practicality or to give the resulting library an appropriately
"Pythonic" flavor.

### Properties vs get*() methods

Generally favor converting trivial methods like `getContext()`, `getName()`,
`isEntryBlock()`, etc. to read-only Python properties (e.g. `context`). It is
primarily a matter of calling `def_property_readonly` vs `def` in binding code,
and makes things feel much nicer on the Python side.

For example, prefer:

```c++
m.def_property_readonly("context", ...)
```

Over:

```c++
m.def("getContext", ...)
```

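From the Python side, the resulting surface behaves like an ordinary read-only `@property`. The sketch below uses toy pure-Python classes (not the real bindings) to show what callers would see under that assumption.

```python
class Context:
    """Toy stand-in for a context object."""


class Operation:
    """Toy stand-in exposing read-only properties instead of get*() methods."""

    def __init__(self, context, name):
        self._context = context
        self._name = name

    @property
    def context(self):
        # No setter is defined, so assignment raises AttributeError; this is
        # the pure-Python analogue of a read-only property in binding code.
        return self._context

    @property
    def name(self):
        return self._name


ctx = Context()
op = Operation(ctx, "module")
print(op.name)  # attribute-style access, no call parentheses
```

Attempting `op.name = "other"` fails with `AttributeError`, so the property stays as read-only as the getter it replaces.
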
### `__repr__` methods

Things that have nice printed representations are really great :) If there is a
reasonable printed form, it can be a significant productivity boost to wire that
to the `__repr__` method (and verify it with a [doctest](#sample-doctest)).

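A minimal pure-Python sketch of the idea follows. The `Operation` class is a toy stand-in, and the printed form is just one plausible choice; the real bindings would define `__repr__` in the binding code.

```python
class Operation:
    """Toy stand-in used to illustrate a doctest-verified __repr__.

    >>> Operation("module")
    <operation 'module'>
    """

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        # !r quotes the name the same way Python would print a string.
        return f"<operation {self.name!r}>"


if __name__ == "__main__":
    import doctest

    doctest.testmod()
```

Because the interactive prompt and `doctest` both display `repr(obj)`, the docstring example doubles as a regression test for the printed form.
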
### CamelCase vs snake_case

Name functions/methods/properties in `snake_case` and classes in `CamelCase`. As
a mechanical concession to Python style, this can go a long way to making the
API feel like it fits in with its peers in the Python landscape.

If in doubt, choose names that will flow properly with other
[PEP 8 style names](https://pep8.org/#descriptive-naming-styles).

### Prefer pseudo-containers

Many core IR constructs provide methods directly on the instance to query count
and begin/end iterators. Prefer hoisting these to dedicated pseudo containers.

For example, a direct mapping of blocks within regions could be done this way:

```python
region = ...

for block in region:

  pass
```

However, this way is preferred:

```python
region = ...

for block in region.blocks:

  pass

print(len(region.blocks))
print(region.blocks[0])
print(region.blocks[-1])
```

Instead of leaking STL-derived identifiers (`front`, `back`, etc.), translate
them to appropriate `__dunder__` methods and iterator wrappers in the bindings.

Note that this can be taken too far, so use good judgment. For example, block
arguments may appear container-like but have defined methods for lookup and
mutation that would be hard to model properly without making semantics
complicated. If running into these, just mirror the C/C++ API.

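The protocol such a pseudo-container needs is small. The sketch below is pure Python with hypothetical `BlockList` and `Region` classes and strings standing in for blocks; in the real bindings the equivalent `__dunder__` methods would be defined in the binding code.

```python
class BlockList:
    """Hypothetical pseudo-container exposing a region's blocks."""

    def __init__(self, blocks):
        self._blocks = blocks

    def __len__(self):
        return len(self._blocks)

    def __getitem__(self, index):
        # Delegating to the list provides negative indices and IndexError,
        # replacing STL-style front()/back() accessors.
        return self._blocks[index]

    def __iter__(self):
        return iter(self._blocks)


class Region:
    """Toy region that hands out a BlockList view instead of being iterable."""

    def __init__(self, blocks):
        self._blocks = list(blocks)

    @property
    def blocks(self):
        return BlockList(self._blocks)


region = Region(["entry", "body", "exit"])
for block in region.blocks:
    pass
```

With these three methods in place, `len(region.blocks)`, `region.blocks[0]`, and `region.blocks[-1]` all work as in the preferred fragment above.
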
### Provide one stop helpers for common things

One stop helpers that aggregate over multiple low level entities can be
incredibly helpful and are encouraged within reason. For example, making
`Context` have a `parse_asm` or equivalent that avoids needing to explicitly
construct a `SourceMgr` can be quite nice. One stop helpers do not have to be
mutually exclusive with a more complete mapping of the backing constructs.

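The layering can be sketched with toy classes. Everything below is a hypothetical stand-in: the `SourceMgr` and `Context` here are plain Python, `add_buffer` and `parse` are invented for the illustration, and "parsing" merely strips whitespace.

```python
class SourceMgr:
    """Toy stand-in for a source manager that owns input buffers."""

    def __init__(self):
        self.buffers = []

    def add_buffer(self, text):
        # Returns the id of the newly added buffer.
        self.buffers.append(text)
        return len(self.buffers) - 1


class Context:
    """Toy context offering both a low-level and a one stop entry point."""

    def parse(self, source_mgr, buffer_id):
        # Low-level path: the caller manages the SourceMgr explicitly.
        # Placeholder "parse": just strip surrounding whitespace.
        return source_mgr.buffers[buffer_id].strip()

    def parse_asm(self, asm):
        # One stop helper: hides SourceMgr construction from the caller
        # while reusing the low-level path underneath.
        source_mgr = SourceMgr()
        return self.parse(source_mgr, source_mgr.add_buffer(asm))


module = Context().parse_asm("  module {}\n")
```

Both entry points remain available, which matches the point that a helper need not exclude a more complete mapping of the backing constructs.
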
## Testing

Tests should be added in the `test/Bindings/Python` directory and should
typically be `.py` files that have a lit run line.

While lit can run any Python module, prefer to lay tests out according to these
rules:

* For tests of the API surface area, prefer
  [`doctest`](https://docs.python.org/3/library/doctest.html).
* For generative tests (those that produce IR), define a Python module that
  constructs/prints the IR and pipe it through `FileCheck`.
* Parsing should be kept self-contained within the module under test by use of
  raw constants and an appropriate `parse_asm` call.
* Any file I/O code should be staged through a tempfile vs relying on file
  artifacts/paths outside of the test module.

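For the last rule, the standard-library `tempfile` module is usually enough. The sketch below is a generic illustration, not code from the MLIR test suite; the helper name, the `input.mlir` filename, and the `TEST_ASM` constant are made up for the example.

```python
import os
import tempfile

# Raw constant kept inside the test module, per the rules above.
TEST_ASM = "module {}\n"


def roundtrip_through_tempfile(text):
    """Write text to a temporary file and read it back."""
    # The directory and everything in it are deleted when the block exits,
    # so the test leaves no artifacts behind and depends on no outside paths.
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "input.mlir")
        with open(path, "w") as f:
            f.write(text)
        with open(path) as f:
            return f.read()
```

A test that exercises file-based APIs would replace the read-back step with the call under test, passing it `path` while the temporary directory is still alive.
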
### Sample Doctest

```python
# RUN: %PYTHON %s

"""
  >>> m = load_test_module()
Test basics:
  >>> m.operation.name
  'module'
  >>> m.operation.is_registered
  True
  >>> ... etc ...

Verify that repr prints:
  >>> m.operation
  <operation 'module'>
"""

import mlir

TEST_MLIR_ASM = r"""
func @test_operation_correct_regions() {
  // ...
}
"""

# TODO: Move to a test utility class once any of this actually exists.
def load_test_module():
  ctx = mlir.ir.Context()
  ctx.allow_unregistered_dialects = True
  module = ctx.parse_asm(TEST_MLIR_ASM)
  return module


if __name__ == "__main__":
  import doctest
  doctest.testmod()
```

### Sample FileCheck test

```python
# RUN: %PYTHON %s | mlir-opt -split-input-file | FileCheck %s

import mlir

# TODO: Move to a test utility class once any of this actually exists.
def print_module(f):
  # Decorator: runs the test function at import time and prints its module,
  # preceded by a split marker and a label that CHECK-LABEL can match.
  m = f()
  print("// -----")
  print("// TEST_FUNCTION:", f.__name__)
  print(m.to_asm())
  return f

# CHECK-LABEL: TEST_FUNCTION: create_my_op
@print_module
def create_my_op():
  m = mlir.ir.Module()
  builder = m.new_op_builder()
  # CHECK: mydialect.my_operation ...
  builder.my_op()
  return m
```