llvm-project/mlir/docs/BytecodeFormat.md

# MLIR Bytecode Format

This document describes the MLIR bytecode format and its encoding.
This format is versioned and stable: we don't plan to ever break
compatibility, that is a dialect should be able to deserialize any
older bytecode. Similarly, we support back-deployment so that an
older version of the format can be targetted.

That said, it is important to realize that the promises of the
bytecode format are made assuming immutable dialects: the format
allows backward and forward compatibility, but only when nothing
in a dialect changes (operations, types, attributes definitions).

A dialect can opt-in to handle its own versioning through the
`BytecodeDialectInterface`. Some hooks are exposed to the dialect
to allow managing a version encoded into the bytecode file. The
version is loaded lazily and allows to retrieve the version
information while decoding the input IR, and gives an opportunity
to each dialect for which a version is present to perform IR
upgrades post-parsing through the `upgradeFromVersion` method.
There is no restriction on what kind of information a dialect
is allowed to encode to model its versioning.

[TOC]

## Magic Number

MLIR uses the following four-byte magic number to
indicate bytecode files:

'\[‘M’<sub>8</sub>, ‘L’<sub>8</sub>, ‘ï’<sub>8</sub>, ‘R’<sub>8</sub>\]'

In hex:

'\[‘4D’<sub>8</sub>, ‘4C’<sub>8</sub>, ‘EF’<sub>8</sub>, ‘52’<sub>8</sub>\]'

## Format Overview

An MLIR Bytecode file is comprised of a byte stream, with a few simple
structural concepts layered on top.

### Primitives

#### Fixed-Width Integers

```
  byte ::= `0x00`...`0xFF`
```

Fixed width integers are unsigned integers of a known byte size. The values are
stored in little-endian byte order.

TODO: Add larger fixed width integers as necessary.

#### Variable-Width Integers

Variable width integers, or `VarInt`s, provide a compact representation for
integers. Each encoded VarInt consists of one to nine bytes, which together
represent a single 64-bit value. The MLIR bytecode utilizes the "PrefixVarInt"
encoding for VarInts. This encoding is a variant of the
[LEB128 ("Little-Endian Base 128")](https://en.wikipedia.org/wiki/LEB128)
encoding, where each byte of the encoding provides up to 7 bits for the value,
with the remaining bit used to store a tag indicating the number of bytes used
for the encoding. This means that small unsigned integers (less than 2^7) may be
stored in one byte, unsigned integers up to 2^14 may be stored in two bytes,
etc.

The first byte of the encoding includes a length prefix in the low bits. This
prefix is a bit sequence of '0's followed by a terminal '1', or the end of the
byte. The number of '0' bits indicate the number of _additional_ bytes, not
including the prefix byte, used to encode the value. All of the remaining bits
in the first byte, along with all of the bits in the additional bytes, provide
the value of the integer. Below are the various possible encodings of the prefix
byte:

```
xxxxxxx1:  7 value bits, the encoding uses 1 byte
xxxxxx10: 14 value bits, the encoding uses 2 bytes
xxxxx100: 21 value bits, the encoding uses 3 bytes
xxxx1000: 28 value bits, the encoding uses 4 bytes
xxx10000: 35 value bits, the encoding uses 5 bytes
xx100000: 42 value bits, the encoding uses 6 bytes
x1000000: 49 value bits, the encoding uses 7 bytes
10000000: 56 value bits, the encoding uses 8 bytes
00000000: 64 value bits, the encoding uses 9 bytes
```

##### Signed Variable-Width Integers

Signed variable width integer values are encoded in a similar fashion to
[varints](#variable-width-integers), but employ
[zigzag encoding](https://en.wikipedia.org/wiki/Variable-length_quantity#Zigzag_encoding).
This encoding uses the low bit of the value to indicate the sign, which allows
for more efficiently encoding negative numbers. If a negative value were encoded
using a normal [varint](#variable-width-integers), it would be treated as an
extremely large unsigned value. Using zigzag encoding allows for a smaller
number of active bits in the value, leading to a smaller encoding. Below is the
basic computation for generating a zigzag encoding:

```
(value << 1) ^ (value >> 63)
```

#### Strings

Strings are blobs of characters with an associated length.

### Sections

```
section {
  idAndIsAligned: byte // id | (hasAlign << 7)
  length: varint,

  alignment: varint?,
  padding: byte[], // Padding bytes are always `0xCB`.

  data: byte[]
}
```

Sections are a mechanism for grouping data within the bytecode. They enable
delayed processing, which is useful for out-of-order processing of data,
lazy-loading, and more. Each section contains a Section ID, whose high bit
indicates if the section has alignment requirements, a length (which allows for
skipping over the section), and an optional alignment. When an alignment is
present, a variable number of padding bytes (0xCB) may appear before the section
data. The alignment of a section must be a power of 2.

## MLIR Encoding

Given the generic structure of MLIR, the bytecode encoding is actually fairly
simplistic. It effectively maps to the core components of MLIR.

### Top Level Structure

The top-level structure of the bytecode contains the 4-byte "magic number", a
version number, a null-terminated producer string, and a list of sections. Each
section is currently only expected to appear once within a bytecode file.

```
bytecode {
  magic: "MLïR",
  version: varint,
  producer: string,
  sections: section[]
}
```

### String Section

```
strings {
  numStrings: varint,
  reverseStringLengths: varint[],
  stringData: byte[]
}
```

The string section contains a table of strings referenced within the bytecode,
more easily enabling string sharing. This section is encoded first with the
total number of strings, followed by the sizes of each of the individual strings
in reverse order. The remaining encoding contains a single blob containing all
of the strings concatenated together.

### Dialect Section

The dialect section of the bytecode contains all of the dialects referenced
within the encoded IR, and some information about the components of those
dialects that were also referenced.

```
dialect_section {
  numDialects: varint,
  dialectNames: dialect_name_group[],
  opNames: dialect_ops_group[]  // ops grouped by dialect
}

dialect_name_group {
  nameAndIsVersioned: varint  // (dialectID << 1) | (hasVersion),
  version: dialect_version_section  // only if versioned
}

dialect_version_section {
  size: varint,
  version: byte[]
}

dialect_ops_group {
  dialect: varint,
  numOpNames: varint,
  opNames: op_name_group[]
}

op_name_group {
  nameAndIsRegistered: varint  // (nameID << 1) | (isRegisteredOp)
}
```

Dialects are encoded as a `varint` containing the index to the name string
within the string section, plus a flag indicating whether the dialect is
versioned. Operation names are encoded in groups by dialect, with each group
containing the dialect, the number of operation names, and the array of indexes
to each name within the string section. The version is encoded as a nested
section for each dialect.

### Attribute/Type Sections

Attributes and types are encoded using two [sections](#sections), one section
(`attr_type_section`) containing the actual encoded representation, and another
section (`attr_type_offset_section`) containing the offsets of each encoded
attribute/type into the previous section. This structure allows for attributes
and types to always be lazily loaded on demand.

```
attr_type_section {
  attrs: attribute[],
  types: type[]
}
attr_type_offset_section {
  numAttrs: varint,
  numTypes: varint,
  offsets: attr_type_offset_group[]
}

attr_type_offset_group {
  dialect: varint,
  numElements: varint,
  offsets: varint[] // (offset << 1) | (hasCustomEncoding)
}

attribute {
  encoding: ...
}
type {
  encoding: ...
}
```

Each `offset` in the `attr_type_offset_section` above is the size of the
encoding for the attribute or type and a flag indicating if the encoding uses
the textual assembly format, or a custom bytecode encoding. We avoid using the
direct offset into the `attr_type_section`, as a smaller relative offsets
provides more effective compression. Attributes and types are grouped by
dialect, with each `attr_type_offset_group` in the offset section containing the
corresponding parent dialect, number of elements, and offsets for each element
within the group.

#### Attribute/Type Encodings

In the abstract, an attribute/type is encoded in one of two possible ways: via
its assembly format, or via a custom dialect defined encoding.

##### Assembly Format Fallback

In the case where a dialect does not define a method for encoding the attribute
or type, the textual assembly format of that attribute or type is used as a
fallback. For example, a type `!bytecode.type<42>` would be encoded as the null
terminated string "!bytecode.type<42>". This ensures that every attribute and
type can be encoded, even if the owning dialect has not yet opted in to a more
efficient serialization.

TODO: We shouldn't redundantly encode the dialect name here, we should use a
reference to the parent dialect instead.

##### Dialect Defined Encoding

As an alternative to the assembly format fallback, dialects may also provide a
custom encoding for their attributes and types. Custom encodings are very
beneficial in that they are significantly smaller and faster to read and write.

Dialects can opt-in to providing custom encodings by implementing the
`BytecodeDialectInterface`. This interface provides hooks, namely
`readAttribute`/`readType` and `writeAttribute`/`writeType`, that will be used
by the bytecode reader and writer. These hooks are provided a reader and writer
implementation that can be used to encode various constructs in the underlying
bytecode format. A unique feature of this interface is that dialects may choose
to only encode a subset of their attributes and types in a custom bytecode
format, which can simplify adding new or experimental components that aren't
fully baked.

When implementing the bytecode interface, dialects are responsible for all
aspects of the encoding. This includes the indicator for which kind of attribute
or type is being encoded; the bytecode reader will only know that it has
encountered an attribute or type of a given dialect, it doesn't encode any
further information. As such, a common encoding idiom is to use a leading
`varint` code to indicate how the attribute or type was encoded.

### Resource Section

Resources are encoded using two [sections](#sections), one section
(`resource_section`) containing the actual encoded representation, and another
section (`resource_offset_section`) containing the offsets of each encoded
resource into the previous section.

```
resource_section {
  resources: resource[]
}
resource {
  value: resource_bool | resource_string | resource_blob
}
resource_bool {
  value: byte
}
resource_string {
  value: varint
}
resource_blob {
  alignment: varint,
  size: varint,
  padding: byte[],
  blob: byte[]
}

resource_offset_section {
  numExternalResourceGroups: varint,
  resourceGroups: resource_group[]
}
resource_group {
  key: varint,
  numResources: varint,
  resources: resource_info[]
}
resource_info {
  key: varint,
  size: varint
  kind: byte,
}
```

Resources are grouped by the provider, either an external entity or a dialect,
with each `resource_group` in the offset section containing the corresponding
provider, number of elements, and info for each element within the group. For
each element, we record the key, the value kind, and the encoded size. We avoid
using the direct offset into the `resource_section`, as a smaller relative
offsets provides more effective compression.

### IR Section

The IR section contains the encoded form of operations within the bytecode.

```
ir_section {
  block: block; // Single block without arguments.
}
```

#### Operation Encoding

```
op {
  name: varint,
  encodingMask: byte,
  location: varint,

  attrDict: varint?,

  numResults: varint?,
  resultTypes: varint[],

  numOperands: varint?,
  operands: varint[],

  numSuccessors: varint?,
  successors: varint[],

  numUseListOrders: varint?,
  useListOrders: uselist[],

  regionEncoding: varint?, // (numRegions << 1) | (isIsolatedFromAbove)

  // regions are stored in a section if isIsolatedFromAbove
  regions: (region | region_section)[]
}

uselist {
  indexInRange: varint?,
  useListEncoding: varint, // (numIndices << 1) | (isIndexPairEncoding)
  indices: varint[]
}
```

The encoding of an operation is important because this is generally the most
commonly appearing structure in the bytecode. A single encoding is used for
every type of operation. Given this prevalence, many of the fields of an
operation are optional. The `encodingMask` field is a bitmask which indicates
which of the components of the operation are present.

##### Location

The location is encoded as the index of the location within the attribute table.

##### Attributes

If the operation has attribues, the index of the operation attribute dictionary
within the attribute table is encoded.

##### Results

If the operation has results, the number of results and the indexes of the
result types within the type table are encoded.

##### Operands

If the operation has operands, the number of operands and the value index of
each operand is encoded. This value index is the relative ordering of the
definition of that value from the start of the first ancestor isolated region.

##### Successors

If the operation has successors, the number of successors and the indexes of the
successor blocks within the parent region are encoded.

##### Use-list orders

The reference use-list order is assumed to be the reverse of the global
enumeration of all the op operands that one would obtain with a pre-order walk
of the IR. This order is naturally obtained by building blocks of operations
op-by-op. However, some transformations may shuffle the use-lists with respect
to this reference ordering. If any of the results of the operation have a
use-list order that is not sorted with respect to the reference use-list order,
an encoding is emitted such that it is possible to reconstruct such order after
parsing the bytecode. The encoding represents an index map from the reference
operand order to the current use-list order. A bit flag is used to detect if
this encoding is of type index-pair or not. When the bit flag is set to zero,
the element at `i` represent the position of the use `i` of the reference list
into the current use-list. When the bit flag is set to `1`, the encoding
represent index pairs `(i, j)`, which indicate that the use at position `i` of
the reference list is mapped to position `j` in the current use-list. When only
less than half of the elements in the current use-list are shuffled with respect
to the reference use-list, the index-pair encoding is used to reduce the
bytecode memory requirements.

##### Regions

If the operation has regions, the number of regions and if the regions are
isolated from above are encoded together in a single varint. Afterwards, each
region is encoded inline.

#### Region Encoding

```
region {
  numBlocks: varint,

  numValues: varint?,
  blocks: block[]
}
```

A region is encoded first with the number of blocks within. If the region is
non-empty, the number of values defined directly within the region are encoded,
followed by the blocks of the region.

#### Block Encoding

```
block {
  encoding: varint, // (numOps << 1) | (hasBlockArgs)
  arguments: block_arguments?, // Optional based on encoding
  ops : op[]
}

block_arguments {
  numArgs: varint?,
  args: block_argument[]
  numUseListOrders: varint?,
  useListOrders: uselist[],
}

block_argument {
  typeAndLocation: varint, // (type << 1) | (hasLocation)
  location: varint? // Optional, else unknown location
}
```

A block is encoded with an array of operations and block arguments. The first
field is an encoding that combines the number of operations in the block, with a
flag indicating if the block has arguments.

Use-list orders are attached to block arguments similarly to how they are
attached to operation results.
-												[mlir] Add initial support for a binary serialization format

This commit adds a new bytecode serialization format for MLIR.
The actual serialization of MLIR to binary is relatively straightforward,
given the very very general structure of MLIR. The underlying basis for
this format is a variable-length encoding for integers, which gets heavily
used for nearly all aspects of the encoding (given that most of the encoding
is just indexing into lists).

The format currently does not provide support for custom attribute/type
serialization, and thus always uses an assembly format fallback. It also
doesn't provide support for resources. These will be added in followups,
the intention for this patch is to provide something that supports the
basic cases, and can be built on top of.

https://discourse.llvm.org/t/rfc-a-binary-serialization-format-for-mlir/63518

Differential Revision: https://reviews.llvm.org/D131747

											
										
										
											2022-08-11 20:06:33 -07:00
+								# MLIR Bytecode Format
-												[mlir][docs] Update Bytecode documentation (#99854)

There were some discrepancies between the dialect section documentation
and the implementation.
											
										
										
											2024-08-19 10:23:56 +02:00
+								This document describes the MLIR bytecode format and its encoding.
-												[MLIR] Document the stability and versioning aspect of the Bytecode (#81969)

See the two RFCs:


https://discourse.llvm.org/t/rfc-mlir-bytecode-a-stable-serialization-format/71062
https://discourse.llvm.org/t/rfc-ir-versioning/5893
											
										
										
											2024-02-16 13:25:03 -08:00
+								This format is versioned and stable: we don't plan to ever break
-												[mlir][docs] Update Bytecode documentation (#99854)

There were some discrepancies between the dialect section documentation
and the implementation.
											
										
										
											2024-08-19 10:23:56 +02:00
+								compatibility, that is a dialect should be able to deserialize any
 								older bytecode. Similarly, we support back-deployment so that an
 								older version of the format can be targetted.
-												[MLIR] Document the stability and versioning aspect of the Bytecode (#81969)

See the two RFCs:


https://discourse.llvm.org/t/rfc-mlir-bytecode-a-stable-serialization-format/71062
https://discourse.llvm.org/t/rfc-ir-versioning/5893
											
										
										
											2024-02-16 13:25:03 -08:00
 								That said, it is important to realize that the promises of the
 								bytecode format are made assuming immutable dialects: the format
 								allows backward and forward compatibility, but only when nothing
 								in a dialect changes (operations, types, attributes definitions).
 								A dialect can opt-in to handle its own versioning through the
 								`BytecodeDialectInterface`. Some hooks are exposed to the dialect
 								to allow managing a version encoded into the bytecode file. The
 								version is loaded lazily and allows to retrieve the version
 								information while decoding the input IR, and gives an opportunity
 								to each dialect for which a version is present to perform IR
 								upgrades post-parsing through the `upgradeFromVersion` method.
 								There is no restriction on what kind of information a dialect
-												[mlir][docs] Update Bytecode documentation (#99854)

There were some discrepancies between the dialect section documentation
and the implementation.
											
										
										
											2024-08-19 10:23:56 +02:00
+								is allowed to encode to model its versioning.
-												[mlir] Add initial support for a binary serialization format

This commit adds a new bytecode serialization format for MLIR.
The actual serialization of MLIR to binary is relatively straightforward,
given the very very general structure of MLIR. The underlying basis for
this format is a variable-length encoding for integers, which gets heavily
used for nearly all aspects of the encoding (given that most of the encoding
is just indexing into lists).

The format currently does not provide support for custom attribute/type
serialization, and thus always uses an assembly format fallback. It also
doesn't provide support for resources. These will be added in followups,
the intention for this patch is to provide something that supports the
basic cases, and can be built on top of.

https://discourse.llvm.org/t/rfc-a-binary-serialization-format-for-mlir/63518

Differential Revision: https://reviews.llvm.org/D131747

											
										
										
											2022-08-11 20:06:33 -07:00
 								[TOC]
 								## Magic Number
-												Implements MLIR Bytecode versioning capability

A dialect can opt-in to handle versioning through the
`BytecodeDialectInterface`. Few hooks are exposed to the dialect to allow
managing a version encoded into the bytecode file. The version is loaded
lazily and allows to retrieve the version information while parsing the input
IR, and gives an opportunity to each dialect for which a version is present
to perform IR upgrades post-parsing through the `upgradeFromVersion` method.
Custom Attribute and Type encodings can also be upgraded according to the
dialect version using readAttribute and readType methods.

There is no restriction on what kind of information a dialect is allowed to
encode to model its versioning. Currently, versioning is supported only for
bytecode formats.

Reviewed By: rriddle, mehdi_amini

Differential Revision: https://reviews.llvm.org/D143647

											
										
										
											2023-02-14 08:45:08 -08:00
+								MLIR uses the following four-byte magic number to
 								indicate bytecode files:
-												[mlir] Add initial support for a binary serialization format

This commit adds a new bytecode serialization format for MLIR.
The actual serialization of MLIR to binary is relatively straightforward,
given the very very general structure of MLIR. The underlying basis for
this format is a variable-length encoding for integers, which gets heavily
used for nearly all aspects of the encoding (given that most of the encoding
is just indexing into lists).

The format currently does not provide support for custom attribute/type
serialization, and thus always uses an assembly format fallback. It also
doesn't provide support for resources. These will be added in followups,
the intention for this patch is to provide something that supports the
basic cases, and can be built on top of.

https://discourse.llvm.org/t/rfc-a-binary-serialization-format-for-mlir/63518

Differential Revision: https://reviews.llvm.org/D131747

											
										
										
											2022-08-11 20:06:33 -07:00
 								'\[‘M’<sub>8</sub>, ‘L’<sub>8</sub>, ‘ï’<sub>8</sub>, ‘R’<sub>8</sub>\]'
 								In hex:
 								'\[‘4D’<sub>8</sub>, ‘4C’<sub>8</sub>, ‘EF’<sub>8</sub>, ‘52’<sub>8</sub>\]'
 								## Format Overview
 								An MLIR Bytecode file is comprised of a byte stream, with a few simple
 								structural concepts layered on top.
 								### Primitives
 								#### Fixed-Width Integers
 								```
-												[mlir][doc] Remove trailing whitespace (NFC)

											
										
										
											2022-10-07 09:05:49 -07:00
+								  byte ::= `0x00`...`0xFF`
-												[mlir] Add initial support for a binary serialization format

This commit adds a new bytecode serialization format for MLIR.
The actual serialization of MLIR to binary is relatively straightforward,
given the very very general structure of MLIR. The underlying basis for
this format is a variable-length encoding for integers, which gets heavily
used for nearly all aspects of the encoding (given that most of the encoding
is just indexing into lists).

The format currently does not provide support for custom attribute/type
serialization, and thus always uses an assembly format fallback. It also
doesn't provide support for resources. These will be added in followups,
the intention for this patch is to provide something that supports the
basic cases, and can be built on top of.

https://discourse.llvm.org/t/rfc-a-binary-serialization-format-for-mlir/63518

Differential Revision: https://reviews.llvm.org/D131747

											
										
										
											2022-08-11 20:06:33 -07:00
+								```
 								Fixed width integers are unsigned integers of a known byte size. The values are
 								stored in little-endian byte order.
 								TODO: Add larger fixed width integers as necessary.
 								#### Variable-Width Integers
 								Variable width integers, or `VarInt`s, provide a compact representation for
 								integers. Each encoded VarInt consists of one to nine bytes, which together
 								represent a single 64-bit value. The MLIR bytecode utilizes the "PrefixVarInt"
 								encoding for VarInts. This encoding is a variant of the
 								[LEB128 ("Little-Endian Base 128")](https://en.wikipedia.org/wiki/LEB128)
 								encoding, where each byte of the encoding provides up to 7 bits for the value,
 								with the remaining bit used to store a tag indicating the number of bytes used
 								for the encoding. This means that small unsigned integers (less than 2^7) may be
 								stored in one byte, unsigned integers up to 2^14 may be stored in two bytes,
 								etc.
 								The first byte of the encoding includes a length prefix in the low bits. This
 								prefix is a bit sequence of '0's followed by a terminal '1', or the end of the
 								byte. The number of '0' bits indicate the number of _additional_ bytes, not
 								including the prefix byte, used to encode the value. All of the remaining bits
 								in the first byte, along with all of the bits in the additional bytes, provide
 								the value of the integer. Below are the various possible encodings of the prefix
 								byte:
 								```
 								xxxxxxx1:  7 value bits, the encoding uses 1 byte
 								xxxxxx10: 14 value bits, the encoding uses 2 bytes
 								xxxxx100: 21 value bits, the encoding uses 3 bytes
 								xxxx1000: 28 value bits, the encoding uses 4 bytes
 								xxx10000: 35 value bits, the encoding uses 5 bytes
 								xx100000: 42 value bits, the encoding uses 6 bytes
 								x1000000: 49 value bits, the encoding uses 7 bytes
 								10000000: 56 value bits, the encoding uses 8 bytes
 								00000000: 64 value bits, the encoding uses 9 bytes
 								```
-												[mlir:Bytecode] Add encoding support for a majority of the builtin attributes

This adds support for the non-location, non-elements, non-affine
builtin attributes.

Differential Revision: https://reviews.llvm.org/D132539

											
										
										
											2022-08-23 23:39:18 -07:00
+								##### Signed Variable-Width Integers
 								Signed variable width integer values are encoded in a similar fashion to
 								[varints](#variable-width-integers), but employ
 								[zigzag encoding](https://en.wikipedia.org/wiki/Variable-length_quantity#Zigzag_encoding).
 								This encoding uses the low bit of the value to indicate the sign, which allows
 								for more efficiently encoding negative numbers. If a negative value were encoded
 								using a normal [varint](#variable-width-integers), it would be treated as an
 								extremely large unsigned value. Using zigzag encoding allows for a smaller
 								number of active bits in the value, leading to a smaller encoding. Below is the
 								basic computation for generating a zigzag encoding:
 								```
 								(value << 1) ^ (value >> 63)
 								```
-												[mlir] Add initial support for a binary serialization format

This commit adds a new bytecode serialization format for MLIR.
The actual serialization of MLIR to binary is relatively straightforward,
given the very very general structure of MLIR. The underlying basis for
this format is a variable-length encoding for integers, which gets heavily
used for nearly all aspects of the encoding (given that most of the encoding
is just indexing into lists).

The format currently does not provide support for custom attribute/type
serialization, and thus always uses an assembly format fallback. It also
doesn't provide support for resources. These will be added in followups,
the intention for this patch is to provide something that supports the
basic cases, and can be built on top of.

https://discourse.llvm.org/t/rfc-a-binary-serialization-format-for-mlir/63518

Differential Revision: https://reviews.llvm.org/D131747

											
										
										
											2022-08-11 20:06:33 -07:00
+								#### Strings
 								Strings are blobs of characters with an associated length.
 								### Sections
 								```
 								section {
-												[mlir:Bytecode] Add support for encoding resources

Resources are encoded in two separate sections similarly to
attributes/types, one for the actual data and one for the data
offsets. Unlike other sections, the resource sections are optional
given that in many cases they won't be present. For testing,
bytecode serialization is added for DenseResourceElementsAttr.

Differential Revision: https://reviews.llvm.org/D132729

											
										
										
											2022-09-06 20:47:57 -07:00
+								  idAndIsAligned: byte // id | (hasAlign << 7)
 								  length: varint,
 								  alignment: varint?,
 								  padding: byte[], // Padding bytes are always `0xCB`.
 								  data: byte[]
-												[mlir] Add initial support for a binary serialization format

This commit adds a new bytecode serialization format for MLIR.
The actual serialization of MLIR to binary is relatively straightforward,
given the very very general structure of MLIR. The underlying basis for
this format is a variable-length encoding for integers, which gets heavily
used for nearly all aspects of the encoding (given that most of the encoding
is just indexing into lists).

The format currently does not provide support for custom attribute/type
serialization, and thus always uses an assembly format fallback. It also
doesn't provide support for resources. These will be added in followups,
the intention for this patch is to provide something that supports the
basic cases, and can be built on top of.

https://discourse.llvm.org/t/rfc-a-binary-serialization-format-for-mlir/63518

Differential Revision: https://reviews.llvm.org/D131747

											
										
										
											2022-08-11 20:06:33 -07:00
+								}
 								```
-												[mlir:Bytecode] Add support for encoding resources

Resources are encoded in two separate sections similarly to
attributes/types, one for the actual data and one for the data
offsets. Unlike other sections, the resource sections are optional
given that in many cases they won't be present. For testing,
bytecode serialization is added for DenseResourceElementsAttr.

Differential Revision: https://reviews.llvm.org/D132729

											
										
										
											2022-09-06 20:47:57 -07:00
+								Sections are a mechanism for grouping data within the bytecode. They enable
-												[mlir] Add initial support for a binary serialization format

This commit adds a new bytecode serialization format for MLIR.
The actual serialization of MLIR to binary is relatively straightforward,
given the very very general structure of MLIR. The underlying basis for
this format is a variable-length encoding for integers, which gets heavily
used for nearly all aspects of the encoding (given that most of the encoding
is just indexing into lists).

The format currently does not provide support for custom attribute/type
serialization, and thus always uses an assembly format fallback. It also
doesn't provide support for resources. These will be added in followups,
the intention for this patch is to provide something that supports the
basic cases, and can be built on top of.

https://discourse.llvm.org/t/rfc-a-binary-serialization-format-for-mlir/63518

Differential Revision: https://reviews.llvm.org/D131747

											
										
										
											2022-08-11 20:06:33 -07:00
+								delayed processing, which is useful for out-of-order processing of data,
-												[mlir:Bytecode] Add support for encoding resources

Resources are encoded in two separate sections similarly to
attributes/types, one for the actual data and one for the data
offsets. Unlike other sections, the resource sections are optional
given that in many cases they won't be present. For testing,
bytecode serialization is added for DenseResourceElementsAttr.

Differential Revision: https://reviews.llvm.org/D132729

											
										
										
											2022-09-06 20:47:57 -07:00
+								lazy-loading, and more. Each section contains a Section ID, whose high bit
 								indicates if the section has alignment requirements, a length (which allows for
 								skipping over the section), and an optional alignment. When an alignment is
 								present, a variable number of padding bytes (0xCB) may appear before the section
 								data. The alignment of a section must be a power of 2.
-												[mlir] Add initial support for a binary serialization format

This commit adds a new bytecode serialization format for MLIR.
The actual serialization of MLIR to binary is relatively straightforward,
given the very very general structure of MLIR. The underlying basis for
this format is a variable-length encoding for integers, which gets heavily
used for nearly all aspects of the encoding (given that most of the encoding
is just indexing into lists).

The format currently does not provide support for custom attribute/type
serialization, and thus always uses an assembly format fallback. It also
doesn't provide support for resources. These will be added in followups,
the intention for this patch is to provide something that supports the
basic cases, and can be built on top of.

https://discourse.llvm.org/t/rfc-a-binary-serialization-format-for-mlir/63518

Differential Revision: https://reviews.llvm.org/D131747

											
										
										
											2022-08-11 20:06:33 -07:00
 								## MLIR Encoding
 								Given the generic structure of MLIR, the bytecode encoding is actually fairly
 								simplistic. It effectively maps to the core components of MLIR.
 								### Top Level Structure
 								The top-level structure of the bytecode contains the 4-byte "magic number", a
 								version number, a null-terminated producer string, and a list of sections. Each
 								section is currently only expected to appear once within a bytecode file.
 								```
 								bytecode {
 								  magic: "MLïR",
 								  version: varint,
 								  producer: string,
 								  sections: section[]
 								}
 								```
 								### String Section
 								```
 								strings {
 								  numStrings: varint,
 								  reverseStringLengths: varint[],
 								  stringData: byte[]
 								}
 								```
 								The string section contains a table of strings referenced within the bytecode,
 								more easily enabling string sharing. This section is encoded first with the
 								total number of strings, followed by the sizes of each of the individual strings
 								in reverse order. The remaining encoding contains a single blob containing all
 								of the strings concatenated together.
 								### Dialect Section
 								The dialect section of the bytecode contains all of the dialects referenced
 								within the encoded IR, and some information about the components of those
 								dialects that were also referenced.
 								```
 								dialect_section {
 								  numDialects: varint,
-												[mlir][docs] Update Bytecode documentation (#99854)

There were some discrepancies between the dialect section documentation
and the implementation.
											
										
										
											2024-08-19 10:23:56 +02:00
+								  dialectNames: dialect_name_group[],
 								  opNames: dialect_ops_group[]  // ops grouped by dialect
-												[mlir] Add initial support for a binary serialization format

This commit adds a new bytecode serialization format for MLIR.
The actual serialization of MLIR to binary is relatively straightforward,
given the very very general structure of MLIR. The underlying basis for
this format is a variable-length encoding for integers, which gets heavily
used for nearly all aspects of the encoding (given that most of the encoding
is just indexing into lists).

The format currently does not provide support for custom attribute/type
serialization, and thus always uses an assembly format fallback. It also
doesn't provide support for resources. These will be added in followups,
the intention for this patch is to provide something that supports the
basic cases, and can be built on top of.

https://discourse.llvm.org/t/rfc-a-binary-serialization-format-for-mlir/63518

Differential Revision: https://reviews.llvm.org/D131747

											
										
										
											2022-08-11 20:06:33 -07:00
+								}
-												[mlir][docs] Update Bytecode documentation (#99854)

There were some discrepancies between the dialect section documentation
and the implementation.
											
										
										
											2024-08-19 10:23:56 +02:00
+								dialect_name_group {
 								  nameAndIsVersioned: varint  // (dialectID << 1) | (hasVersion),
 								  version: dialect_version_section  // only if versioned
-												[mlir] Add initial support for a binary serialization format

This commit adds a new bytecode serialization format for MLIR.
The actual serialization of MLIR to binary is relatively straightforward,
given the very very general structure of MLIR. The underlying basis for
this format is a variable-length encoding for integers, which gets heavily
used for nearly all aspects of the encoding (given that most of the encoding
is just indexing into lists).

The format currently does not provide support for custom attribute/type
serialization, and thus always uses an assembly format fallback. It also
doesn't provide support for resources. These will be added in followups,
the intention for this patch is to provide something that supports the
basic cases, and can be built on top of.

https://discourse.llvm.org/t/rfc-a-binary-serialization-format-for-mlir/63518

Differential Revision: https://reviews.llvm.org/D131747

											
										
										
											2022-08-11 20:06:33 -07:00
+								}
-												Implements MLIR Bytecode versioning capability

A dialect can opt-in to handle versioning through the
`BytecodeDialectInterface`. Few hooks are exposed to the dialect to allow
managing a version encoded into the bytecode file. The version is loaded
lazily and allows to retrieve the version information while parsing the input
IR, and gives an opportunity to each dialect for which a version is present
to perform IR upgrades post-parsing through the `upgradeFromVersion` method.
Custom Attribute and Type encodings can also be upgraded according to the
dialect version using readAttribute and readType methods.

There is no restriction on what kind of information a dialect is allowed to
encode to model its versioning. Currently, versioning is supported only for
bytecode formats.

Reviewed By: rriddle, mehdi_amini

Differential Revision: https://reviews.llvm.org/D143647

											
										
										
											2023-02-14 08:45:08 -08:00
 								dialect_version_section {
 								  size: varint,
 								  version: byte[]
 								}
-												[mlir][docs] Update Bytecode documentation (#99854)

There were some discrepancies between the dialect section documentation
and the implementation.
											
										
										
											2024-08-19 10:23:56 +02:00
+								dialect_ops_group {
 								  dialect: varint,
 								  numOpNames: varint,
 								  opNames: op_name_group[]
 								}
 								op_name_group {
 								  nameAndIsRegistered: varint  // (nameID << 1) | (isRegisteredOp)
 								}
-												[mlir] Add initial support for a binary serialization format

This commit adds a new bytecode serialization format for MLIR.
The actual serialization of MLIR to binary is relatively straightforward,
given the very very general structure of MLIR. The underlying basis for
this format is a variable-length encoding for integers, which gets heavily
used for nearly all aspects of the encoding (given that most of the encoding
is just indexing into lists).

The format currently does not provide support for custom attribute/type
serialization, and thus always uses an assembly format fallback. It also
doesn't provide support for resources. These will be added in followups,
the intention for this patch is to provide something that supports the
basic cases, and can be built on top of.

https://discourse.llvm.org/t/rfc-a-binary-serialization-format-for-mlir/63518

Differential Revision: https://reviews.llvm.org/D131747

											
										
										
											2022-08-11 20:06:33 -07:00
+								```
-												Implements MLIR Bytecode versioning capability

A dialect can opt-in to handle versioning through the
`BytecodeDialectInterface`. Few hooks are exposed to the dialect to allow
managing a version encoded into the bytecode file. The version is loaded
lazily and allows to retrieve the version information while parsing the input
IR, and gives an opportunity to each dialect for which a version is present
to perform IR upgrades post-parsing through the `upgradeFromVersion` method.
Custom Attribute and Type encodings can also be upgraded according to the
dialect version using readAttribute and readType methods.

There is no restriction on what kind of information a dialect is allowed to
encode to model its versioning. Currently, versioning is supported only for
bytecode formats.

Reviewed By: rriddle, mehdi_amini

Differential Revision: https://reviews.llvm.org/D143647

											
										
										
											2023-02-14 08:45:08 -08:00
+								Dialects are encoded as a `varint` containing the index to the name string
 								within the string section, plus a flag indicating whether the dialect is
 								versioned. Operation names are encoded in groups by dialect, with each group
 								containing the dialect, the number of operation names, and the array of indexes
 								to each name within the string section. The version is encoded as a nested
-												[mlir][docs] Update Bytecode documentation (#99854)

There were some discrepancies between the dialect section documentation
and the implementation.
											
										
										
											2024-08-19 10:23:56 +02:00
+								section for each dialect.
-												[mlir] Add initial support for a binary serialization format

This commit adds a new bytecode serialization format for MLIR.
The actual serialization of MLIR to binary is relatively straightforward,
given the very very general structure of MLIR. The underlying basis for
this format is a variable-length encoding for integers, which gets heavily
used for nearly all aspects of the encoding (given that most of the encoding
is just indexing into lists).

The format currently does not provide support for custom attribute/type
serialization, and thus always uses an assembly format fallback. It also
doesn't provide support for resources. These will be added in followups,
the intention for this patch is to provide something that supports the
basic cases, and can be built on top of.

https://discourse.llvm.org/t/rfc-a-binary-serialization-format-for-mlir/63518

Differential Revision: https://reviews.llvm.org/D131747

											
										
										
											2022-08-11 20:06:33 -07:00
 								### Attribute/Type Sections
 								Attributes and types are encoded using two [sections](#sections), one section
 								(`attr_type_section`) containing the actual encoded representation, and another
 								section (`attr_type_offset_section`) containing the offsets of each encoded
 								attribute/type into the previous section. This structure allows for attributes
 								and types to always be lazily loaded on demand.
 								```
 								attr_type_section {
 								  attrs: attribute[],
-												[mlir][doc] Remove trailing whitespace (NFC)

											
										
										
											2022-10-07 09:05:49 -07:00
+								  types: type[]
-												[mlir] Add initial support for a binary serialization format

This commit adds a new bytecode serialization format for MLIR.
The actual serialization of MLIR to binary is relatively straightforward,
given the very very general structure of MLIR. The underlying basis for
this format is a variable-length encoding for integers, which gets heavily
used for nearly all aspects of the encoding (given that most of the encoding
is just indexing into lists).

The format currently does not provide support for custom attribute/type
serialization, and thus always uses an assembly format fallback. It also
doesn't provide support for resources. These will be added in followups,
the intention for this patch is to provide something that supports the
basic cases, and can be built on top of.

https://discourse.llvm.org/t/rfc-a-binary-serialization-format-for-mlir/63518

Differential Revision: https://reviews.llvm.org/D131747

											
										
										
											2022-08-11 20:06:33 -07:00
+								}
 								attr_type_offset_section {
 								  numAttrs: varint,
 								  numTypes: varint,
 								  offsets: attr_type_offset_group[]
 								}
 								attr_type_offset_group {
 								  dialect: varint,
 								  numElements: varint,
-												[mlir][doc] Remove trailing whitespace (NFC)

											
										
										
											2022-10-07 09:05:49 -07:00
+								  offsets: varint[] // (offset << 1) | (hasCustomEncoding)
-												[mlir] Add initial support for a binary serialization format

This commit adds a new bytecode serialization format for MLIR.
The actual serialization of MLIR to binary is relatively straightforward,
given the very very general structure of MLIR. The underlying basis for
this format is a variable-length encoding for integers, which gets heavily
used for nearly all aspects of the encoding (given that most of the encoding
is just indexing into lists).

The format currently does not provide support for custom attribute/type
serialization, and thus always uses an assembly format fallback. It also
doesn't provide support for resources. These will be added in followups,
the intention for this patch is to provide something that supports the
basic cases, and can be built on top of.

https://discourse.llvm.org/t/rfc-a-binary-serialization-format-for-mlir/63518

Differential Revision: https://reviews.llvm.org/D131747

											
										
										
											2022-08-11 20:06:33 -07:00
+								}
 								attribute {
 								  encoding: ...
 								}
 								type {
 								  encoding: ...
 								}
 								```
 								Each `offset` in the `attr_type_offset_section` above is the size of the
 								encoding for the attribute or type and a flag indicating if the encoding uses
 								the textual assembly format, or a custom bytecode encoding. We avoid using the
 								direct offset into the `attr_type_section`, as a smaller relative offsets
 								provides more effective compression. Attributes and types are grouped by
 								dialect, with each `attr_type_offset_group` in the offset section containing the
 								corresponding parent dialect, number of elements, and offsets for each element
 								within the group.
 								#### Attribute/Type Encodings
 								In the abstract, an attribute/type is encoded in one of two possible ways: via
 								its assembly format, or via a custom dialect defined encoding.
 								##### Assembly Format Fallback
 								In the case where a dialect does not define a method for encoding the attribute
 								or type, the textual assembly format of that attribute or type is used as a
-												[mlir][docs] Update Bytecode documentation (#99854)

There were some discrepancies between the dialect section documentation
and the implementation.
											
										
										
											2024-08-19 10:23:56 +02:00
+								fallback. For example, a type `!bytecode.type<42>` would be encoded as the null
 								terminated string "!bytecode.type<42>". This ensures that every attribute and
 								type can be encoded, even if the owning dialect has not yet opted in to a more
-												[mlir] Add initial support for a binary serialization format

This commit adds a new bytecode serialization format for MLIR.
The actual serialization of MLIR to binary is relatively straightforward,
given the very very general structure of MLIR. The underlying basis for
this format is a variable-length encoding for integers, which gets heavily
used for nearly all aspects of the encoding (given that most of the encoding
is just indexing into lists).

The format currently does not provide support for custom attribute/type
serialization, and thus always uses an assembly format fallback. It also
doesn't provide support for resources. These will be added in followups,
the intention for this patch is to provide something that supports the
basic cases, and can be built on top of.

https://discourse.llvm.org/t/rfc-a-binary-serialization-format-for-mlir/63518

Differential Revision: https://reviews.llvm.org/D131747

											
										
										
											2022-08-11 20:06:33 -07:00
+								efficient serialization.
 								TODO: We shouldn't redundantly encode the dialect name here, we should use a
 								reference to the parent dialect instead.
 								##### Dialect Defined Encoding
-												[mlir][docs] Update Bytecode documentation (#99854)

There were some discrepancies between the dialect section documentation
and the implementation.
											
										
										
											2024-08-19 10:23:56 +02:00
+								As an alternative to the assembly format fallback, dialects may also provide a
 								custom encoding for their attributes and types. Custom encodings are very
 								beneficial in that they are significantly smaller and faster to read and write.
-												[mlir:Bytecode] Add initial support for dialect defined attribute/type encodings

Dialects can opt-in to providing custom encodings by implementing the
`BytecodeDialectInterface`. This interface provides hooks, namely
`readAttribute`/`readType` and `writeAttribute`/`writeType`, that will be used
by the bytecode reader and writer. These hooks are provided a reader and writer
implementation that can be used to encode various constructs in the underlying
bytecode format. A unique feature of this interface is that dialects may choose
to only encode a subset of their attributes and types in a custom bytecode
format, which can simplify adding new or experimental components that aren't
fully baked.

Differential Revision: https://reviews.llvm.org/D132498

											
										
										
											2022-08-23 12:56:02 -07:00
 								Dialects can opt-in to providing custom encodings by implementing the
 								`BytecodeDialectInterface`. This interface provides hooks, namely
 								`readAttribute`/`readType` and `writeAttribute`/`writeType`, that will be used
 								by the bytecode reader and writer. These hooks are provided a reader and writer
 								implementation that can be used to encode various constructs in the underlying
 								bytecode format. A unique feature of this interface is that dialects may choose
 								to only encode a subset of their attributes and types in a custom bytecode
 								format, which can simplify adding new or experimental components that aren't
 								fully baked.
 								When implementing the bytecode interface, dialects are responsible for all
 								aspects of the encoding. This includes the indicator for which kind of attribute
 								or type is being encoded; the bytecode reader will only know that it has
 								encountered an attribute or type of a given dialect, it doesn't encode any
 								further information. As such, a common encoding idiom is to use a leading
 								`varint` code to indicate how the attribute or type was encoded.
-												[mlir] Add initial support for a binary serialization format

This commit adds a new bytecode serialization format for MLIR.
The actual serialization of MLIR to binary is relatively straightforward,
given the very very general structure of MLIR. The underlying basis for
this format is a variable-length encoding for integers, which gets heavily
used for nearly all aspects of the encoding (given that most of the encoding
is just indexing into lists).

The format currently does not provide support for custom attribute/type
serialization, and thus always uses an assembly format fallback. It also
doesn't provide support for resources. These will be added in followups,
the intention for this patch is to provide something that supports the
basic cases, and can be built on top of.

https://discourse.llvm.org/t/rfc-a-binary-serialization-format-for-mlir/63518

Differential Revision: https://reviews.llvm.org/D131747

											
										
										
											2022-08-11 20:06:33 -07:00
-												[mlir:Bytecode] Add support for encoding resources

Resources are encoded in two separate sections similarly to
attributes/types, one for the actual data and one for the data
offsets. Unlike other sections, the resource sections are optional
given that in many cases they won't be present. For testing,
bytecode serialization is added for DenseResourceElementsAttr.

Differential Revision: https://reviews.llvm.org/D132729

											
										
										
											2022-09-06 20:47:57 -07:00
+								### Resource Section
 								Resources are encoded using two [sections](#sections), one section
 								(`resource_section`) containing the actual encoded representation, and another
 								section (`resource_offset_section`) containing the offsets of each encoded
 								resource into the previous section.
 								```
 								resource_section {
 								  resources: resource[]
 								}
 								resource {
 								  value: resource_bool | resource_string | resource_blob
 								}
 								resource_bool {
 								  value: byte
 								}
 								resource_string {
 								  value: varint
 								}
 								resource_blob {
 								  alignment: varint,
 								  size: varint,
 								  padding: byte[],
 								  blob: byte[]
 								}
 								resource_offset_section {
 								  numExternalResourceGroups: varint,
 								  resourceGroups: resource_group[]
 								}
 								resource_group {
 								  key: varint,
 								  numResources: varint,
 								  resources: resource_info[]
 								}
 								resource_info {
 								  key: varint,
 								  size: varint
 								  kind: byte,
 								}
 								```
 								Resources are grouped by the provider, either an external entity or a dialect,
 								with each `resource_group` in the offset section containing the corresponding
 								provider, number of elements, and info for each element within the group. For
 								each element, we record the key, the value kind, and the encoded size. We avoid
 								using the direct offset into the `resource_section`, as a smaller relative
 								offsets provides more effective compression.
-												[mlir] Add initial support for a binary serialization format

This commit adds a new bytecode serialization format for MLIR.
The actual serialization of MLIR to binary is relatively straightforward,
given the very very general structure of MLIR. The underlying basis for
this format is a variable-length encoding for integers, which gets heavily
used for nearly all aspects of the encoding (given that most of the encoding
is just indexing into lists).

The format currently does not provide support for custom attribute/type
serialization, and thus always uses an assembly format fallback. It also
doesn't provide support for resources. These will be added in followups,
the intention for this patch is to provide something that supports the
basic cases, and can be built on top of.

https://discourse.llvm.org/t/rfc-a-binary-serialization-format-for-mlir/63518

Differential Revision: https://reviews.llvm.org/D131747

											
										
										
											2022-08-11 20:06:33 -07:00
+								### IR Section
 								The IR section contains the encoded form of operations within the bytecode.
-												Add support for Lazyloading to the MLIR bytecode

IsolatedRegions are emitted in sections in order for the reader to be
able to skip over them. A new class is exposed to manage the state and
allow the readers to load these IsolatedRegions on-demand.

Differential Revision: https://reviews.llvm.org/D149515

											
										
										
											2023-04-29 02:36:45 -07:00
+								```
 								ir_section {
 								  block: block; // Single block without arguments.
 								}
 								```
-												[mlir] Add initial support for a binary serialization format

This commit adds a new bytecode serialization format for MLIR.
The actual serialization of MLIR to binary is relatively straightforward,
given the very very general structure of MLIR. The underlying basis for
this format is a variable-length encoding for integers, which gets heavily
used for nearly all aspects of the encoding (given that most of the encoding
is just indexing into lists).

The format currently does not provide support for custom attribute/type
serialization, and thus always uses an assembly format fallback. It also
doesn't provide support for resources. These will be added in followups,
the intention for this patch is to provide something that supports the
basic cases, and can be built on top of.

https://discourse.llvm.org/t/rfc-a-binary-serialization-format-for-mlir/63518

Differential Revision: https://reviews.llvm.org/D131747

											
										
										
											2022-08-11 20:06:33 -07:00
+								#### Operation Encoding
 								```
 								op {
 								  name: varint,
 								  encodingMask: byte,
 								  location: varint,
 								  attrDict: varint?,
 								  numResults: varint?,
 								  resultTypes: varint[],
 								  numOperands: varint?,
 								  operands: varint[],
 								  numSuccessors: varint?,
 								  successors: varint[],
-												Preserve use-list orders in mlir bytecode

This patch implements a mechanism to read/write use-list orders from/to the mlir bytecode format. When producing bytecode, use-list orders are appended to each value of the IR. When reading bytecode, use-lists orders are loaded in memory and used at the end of parsing to sort the existing use-list chains.

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D149755

											
										
										
											2023-05-21 16:46:59 -07:00
+								  numUseListOrders: varint?,
 								  useListOrders: uselist[],
-												[mlir][doc] Remove trailing whitespace (NFC)

											
										
										
											2022-10-07 09:05:49 -07:00
+								  regionEncoding: varint?, // (numRegions << 1) | (isIsolatedFromAbove)
-												Add support for Lazyloading to the MLIR bytecode

IsolatedRegions are emitted in sections in order for the reader to be
able to skip over them. A new class is exposed to manage the state and
allow the readers to load these IsolatedRegions on-demand.

Differential Revision: https://reviews.llvm.org/D149515

											
										
										
											2023-04-29 02:36:45 -07:00
 								  // regions are stored in a section if isIsolatedFromAbove
 								  regions: (region | region_section)[]
-												[mlir] Add initial support for a binary serialization format

This commit adds a new bytecode serialization format for MLIR.
The actual serialization of MLIR to binary is relatively straightforward,
given the very very general structure of MLIR. The underlying basis for
this format is a variable-length encoding for integers, which gets heavily
used for nearly all aspects of the encoding (given that most of the encoding
is just indexing into lists).

The format currently does not provide support for custom attribute/type
serialization, and thus always uses an assembly format fallback. It also
doesn't provide support for resources. These will be added in followups,
the intention for this patch is to provide something that supports the
basic cases, and can be built on top of.

https://discourse.llvm.org/t/rfc-a-binary-serialization-format-for-mlir/63518

Differential Revision: https://reviews.llvm.org/D131747

											
										
										
											2022-08-11 20:06:33 -07:00
+								}
-												Preserve use-list orders in mlir bytecode

This patch implements a mechanism to read/write use-list orders from/to the mlir bytecode format. When producing bytecode, use-list orders are appended to each value of the IR. When reading bytecode, use-lists orders are loaded in memory and used at the end of parsing to sort the existing use-list chains.

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D149755

											
										
										
											2023-05-21 16:46:59 -07:00
 								uselist {
 								  indexInRange: varint?,
 								  useListEncoding: varint, // (numIndices << 1) | (isIndexPairEncoding)
 								  indices: varint[]
 								}
-												[mlir] Add initial support for a binary serialization format

This commit adds a new bytecode serialization format for MLIR.
The actual serialization of MLIR to binary is relatively straightforward,
given the very very general structure of MLIR. The underlying basis for
this format is a variable-length encoding for integers, which gets heavily
used for nearly all aspects of the encoding (given that most of the encoding
is just indexing into lists).

The format currently does not provide support for custom attribute/type
serialization, and thus always uses an assembly format fallback. It also
doesn't provide support for resources. These will be added in followups,
the intention for this patch is to provide something that supports the
basic cases, and can be built on top of.

https://discourse.llvm.org/t/rfc-a-binary-serialization-format-for-mlir/63518

Differential Revision: https://reviews.llvm.org/D131747

											
										
										
											2022-08-11 20:06:33 -07:00
+								```
 								The encoding of an operation is important because this is generally the most
 								commonly appearing structure in the bytecode. A single encoding is used for
-												[mlir][docs] Update Bytecode documentation (#99854)

There were some discrepancies between the dialect section documentation
and the implementation.
											
										
										
											2024-08-19 10:23:56 +02:00
+								every type of operation. Given this prevalence, many of the fields of an
-												[mlir] Add initial support for a binary serialization format

This commit adds a new bytecode serialization format for MLIR.
The actual serialization of MLIR to binary is relatively straightforward,
given the very very general structure of MLIR. The underlying basis for
this format is a variable-length encoding for integers, which gets heavily
used for nearly all aspects of the encoding (given that most of the encoding
is just indexing into lists).

The format currently does not provide support for custom attribute/type
serialization, and thus always uses an assembly format fallback. It also
doesn't provide support for resources. These will be added in followups,
the intention for this patch is to provide something that supports the
basic cases, and can be built on top of.

https://discourse.llvm.org/t/rfc-a-binary-serialization-format-for-mlir/63518

Differential Revision: https://reviews.llvm.org/D131747

											
										
										
											2022-08-11 20:06:33 -07:00
+								operation are optional. The `encodingMask` field is a bitmask which indicates
 								which of the components of the operation are present.
 								##### Location
 								The location is encoded as the index of the location within the attribute table.
 								##### Attributes
 								If the operation has attribues, the index of the operation attribute dictionary
 								within the attribute table is encoded.
 								##### Results
 								If the operation has results, the number of results and the indexes of the
 								result types within the type table are encoded.
 								##### Operands
 								If the operation has operands, the number of operands and the value index of
 								each operand is encoded. This value index is the relative ordering of the
 								definition of that value from the start of the first ancestor isolated region.
 								##### Successors
 								If the operation has successors, the number of successors and the indexes of the
 								successor blocks within the parent region are encoded.
-												Preserve use-list orders in mlir bytecode

This patch implements a mechanism to read/write use-list orders from/to the mlir bytecode format. When producing bytecode, use-list orders are appended to each value of the IR. When reading bytecode, use-lists orders are loaded in memory and used at the end of parsing to sort the existing use-list chains.

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D149755

											
										
										
											2023-05-21 16:46:59 -07:00
+								##### Use-list orders
 								The reference use-list order is assumed to be the reverse of the global
 								enumeration of all the op operands that one would obtain with a pre-order walk
 								of the IR. This order is naturally obtained by building blocks of operations
 								op-by-op. However, some transformations may shuffle the use-lists with respect
 								to this reference ordering. If any of the results of the operation have a
 								use-list order that is not sorted with respect to the reference use-list order,
 								an encoding is emitted such that it is possible to reconstruct such order after
 								parsing the bytecode. The encoding represents an index map from the reference
 								operand order to the current use-list order. A bit flag is used to detect if
 								this encoding is of type index-pair or not. When the bit flag is set to zero,
 								the element at `i` represent the position of the use `i` of the reference list
 								into the current use-list. When the bit flag is set to `1`, the encoding
 								represent index pairs `(i, j)`, which indicate that the use at position `i` of
 								the reference list is mapped to position `j` in the current use-list. When only
 								less than half of the elements in the current use-list are shuffled with respect
 								to the reference use-list, the index-pair encoding is used to reduce the
 								bytecode memory requirements.
-												[mlir] Add initial support for a binary serialization format

This commit adds a new bytecode serialization format for MLIR.
The actual serialization of MLIR to binary is relatively straightforward,
given the very very general structure of MLIR. The underlying basis for
this format is a variable-length encoding for integers, which gets heavily
used for nearly all aspects of the encoding (given that most of the encoding
is just indexing into lists).

The format currently does not provide support for custom attribute/type
serialization, and thus always uses an assembly format fallback. It also
doesn't provide support for resources. These will be added in followups,
the intention for this patch is to provide something that supports the
basic cases, and can be built on top of.

https://discourse.llvm.org/t/rfc-a-binary-serialization-format-for-mlir/63518

Differential Revision: https://reviews.llvm.org/D131747

											
										
										
											2022-08-11 20:06:33 -07:00
+								##### Regions
 								If the operation has regions, the number of regions and if the regions are
 								isolated from above are encoded together in a single varint. Afterwards, each
 								region is encoded inline.
 								#### Region Encoding
 								```
 								region {
 								  numBlocks: varint,
 								  numValues: varint?,
 								  blocks: block[]
 								}
 								```
 								A region is encoded first with the number of blocks within. If the region is
 								non-empty, the number of values defined directly within the region are encoded,
 								followed by the blocks of the region.
 								#### Block Encoding
 								```
 								block {
 								  encoding: varint, // (numOps << 1) | (hasBlockArgs)
 								  arguments: block_arguments?, // Optional based on encoding
 								  ops : op[]
 								}
 								block_arguments {
 								  numArgs: varint?,
 								  args: block_argument[]
-												Preserve use-list orders in mlir bytecode

This patch implements a mechanism to read/write use-list orders from/to the mlir bytecode format. When producing bytecode, use-list orders are appended to each value of the IR. When reading bytecode, use-lists orders are loaded in memory and used at the end of parsing to sort the existing use-list chains.

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D149755

											
										
										
											2023-05-21 16:46:59 -07:00
+								  numUseListOrders: varint?,
 								  useListOrders: uselist[],
-												[mlir] Add initial support for a binary serialization format

This commit adds a new bytecode serialization format for MLIR.
The actual serialization of MLIR to binary is relatively straightforward,
given the very very general structure of MLIR. The underlying basis for
this format is a variable-length encoding for integers, which gets heavily
used for nearly all aspects of the encoding (given that most of the encoding
is just indexing into lists).

The format currently does not provide support for custom attribute/type
serialization, and thus always uses an assembly format fallback. It also
doesn't provide support for resources. These will be added in followups,
the intention for this patch is to provide something that supports the
basic cases, and can be built on top of.

https://discourse.llvm.org/t/rfc-a-binary-serialization-format-for-mlir/63518

Differential Revision: https://reviews.llvm.org/D131747

											
										
										
											2022-08-11 20:06:33 -07:00
+								}
 								block_argument {
-												[mlir][bytecode] Avoid recording null arglocs & realloc opnames.

For block arg locs a common case is no/uknown location (where the producer
signifies they don't care about blockarg location). Also avoid needing to
dynamically resize opnames during parsing.

Assumed to be post lazy loading change, so chose version 3.

Differential Revision: https://reviews.llvm.org/D151038

											
										
										
											2023-05-25 09:24:50 -07:00
+								  typeAndLocation: varint, // (type << 1) | (hasLocation)
 								  location: varint? // Optional, else unknown location
-												[mlir] Add initial support for a binary serialization format

This commit adds a new bytecode serialization format for MLIR.
The actual serialization of MLIR to binary is relatively straightforward,
given the very very general structure of MLIR. The underlying basis for
this format is a variable-length encoding for integers, which gets heavily
used for nearly all aspects of the encoding (given that most of the encoding
is just indexing into lists).

The format currently does not provide support for custom attribute/type
serialization, and thus always uses an assembly format fallback. It also
doesn't provide support for resources. These will be added in followups,
the intention for this patch is to provide something that supports the
basic cases, and can be built on top of.

https://discourse.llvm.org/t/rfc-a-binary-serialization-format-for-mlir/63518

Differential Revision: https://reviews.llvm.org/D131747

											
										
										
											2022-08-11 20:06:33 -07:00
+								}
 								```
 								A block is encoded with an array of operations and block arguments. The first
 								field is an encoding that combines the number of operations in the block, with a
 								flag indicating if the block has arguments.
-												Preserve use-list orders in mlir bytecode

This patch implements a mechanism to read/write use-list orders from/to the mlir bytecode format. When producing bytecode, use-list orders are appended to each value of the IR. When reading bytecode, use-lists orders are loaded in memory and used at the end of parsing to sort the existing use-list chains.

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D149755

											
										
										
											2023-05-21 16:46:59 -07:00
 								Use-list orders are attached to block arguments similarly to how they are
 								attached to operation results.