2022-08-11 20:06:33 -07:00
|
|
|
|
# MLIR Bytecode Format
|
|
|
|
|
|
2024-08-19 10:23:56 +02:00
|
|
|
|
This document describes the MLIR bytecode format and its encoding.
|
2024-02-16 13:25:03 -08:00
|
|
|
|
This format is versioned and stable: we don't plan to ever break
|
2024-08-19 10:23:56 +02:00
|
|
|
|
compatibility, that is a dialect should be able to deserialize any
|
|
|
|
|
older bytecode. Similarly, we support back-deployment so that an
|
|
|
|
|
older version of the format can be targetted.
|
2024-02-16 13:25:03 -08:00
|
|
|
|
|
|
|
|
|
That said, it is important to realize that the promises of the
|
|
|
|
|
bytecode format are made assuming immutable dialects: the format
|
|
|
|
|
allows backward and forward compatibility, but only when nothing
|
|
|
|
|
in a dialect changes (operations, types, attributes definitions).
|
|
|
|
|
|
|
|
|
|
A dialect can opt-in to handle its own versioning through the
|
|
|
|
|
`BytecodeDialectInterface`. Some hooks are exposed to the dialect
|
|
|
|
|
to allow managing a version encoded into the bytecode file. The
|
|
|
|
|
version is loaded lazily and allows to retrieve the version
|
|
|
|
|
information while decoding the input IR, and gives an opportunity
|
|
|
|
|
to each dialect for which a version is present to perform IR
|
|
|
|
|
upgrades post-parsing through the `upgradeFromVersion` method.
|
|
|
|
|
There is no restriction on what kind of information a dialect
|
2024-08-19 10:23:56 +02:00
|
|
|
|
is allowed to encode to model its versioning.
|
2022-08-11 20:06:33 -07:00
|
|
|
|
|
|
|
|
|
[TOC]
|
|
|
|
|
|
|
|
|
|
## Magic Number
|
|
|
|
|
|
2023-02-14 08:45:08 -08:00
|
|
|
|
MLIR uses the following four-byte magic number to
|
|
|
|
|
indicate bytecode files:
|
2022-08-11 20:06:33 -07:00
|
|
|
|
|
|
|
|
|
'\[‘M’<sub>8</sub>, ‘L’<sub>8</sub>, ‘ï’<sub>8</sub>, ‘R’<sub>8</sub>\]'
|
|
|
|
|
|
|
|
|
|
In hex:
|
|
|
|
|
|
|
|
|
|
'\[‘4D’<sub>8</sub>, ‘4C’<sub>8</sub>, ‘EF’<sub>8</sub>, ‘52’<sub>8</sub>\]'
|
|
|
|
|
|
|
|
|
|
## Format Overview
|
|
|
|
|
|
|
|
|
|
An MLIR Bytecode file is comprised of a byte stream, with a few simple
|
|
|
|
|
structural concepts layered on top.
|
|
|
|
|
|
|
|
|
|
### Primitives
|
|
|
|
|
|
|
|
|
|
#### Fixed-Width Integers
|
|
|
|
|
|
|
|
|
|
```
|
2022-10-07 09:05:49 -07:00
|
|
|
|
byte ::= `0x00`...`0xFF`
|
2022-08-11 20:06:33 -07:00
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
Fixed width integers are unsigned integers of a known byte size. The values are
|
|
|
|
|
stored in little-endian byte order.
|
|
|
|
|
|
|
|
|
|
TODO: Add larger fixed width integers as necessary.
|
|
|
|
|
|
|
|
|
|
#### Variable-Width Integers
|
|
|
|
|
|
|
|
|
|
Variable width integers, or `VarInt`s, provide a compact representation for
|
|
|
|
|
integers. Each encoded VarInt consists of one to nine bytes, which together
|
|
|
|
|
represent a single 64-bit value. The MLIR bytecode utilizes the "PrefixVarInt"
|
|
|
|
|
encoding for VarInts. This encoding is a variant of the
|
|
|
|
|
[LEB128 ("Little-Endian Base 128")](https://en.wikipedia.org/wiki/LEB128)
|
|
|
|
|
encoding, where each byte of the encoding provides up to 7 bits for the value,
|
|
|
|
|
with the remaining bit used to store a tag indicating the number of bytes used
|
|
|
|
|
for the encoding. This means that small unsigned integers (less than 2^7) may be
|
|
|
|
|
stored in one byte, unsigned integers up to 2^14 may be stored in two bytes,
|
|
|
|
|
etc.
|
|
|
|
|
|
|
|
|
|
The first byte of the encoding includes a length prefix in the low bits. This
|
|
|
|
|
prefix is a bit sequence of '0's followed by a terminal '1', or the end of the
|
|
|
|
|
byte. The number of '0' bits indicate the number of _additional_ bytes, not
|
|
|
|
|
including the prefix byte, used to encode the value. All of the remaining bits
|
|
|
|
|
in the first byte, along with all of the bits in the additional bytes, provide
|
|
|
|
|
the value of the integer. Below are the various possible encodings of the prefix
|
|
|
|
|
byte:
|
|
|
|
|
|
|
|
|
|
```
|
|
|
|
|
xxxxxxx1: 7 value bits, the encoding uses 1 byte
|
|
|
|
|
xxxxxx10: 14 value bits, the encoding uses 2 bytes
|
|
|
|
|
xxxxx100: 21 value bits, the encoding uses 3 bytes
|
|
|
|
|
xxxx1000: 28 value bits, the encoding uses 4 bytes
|
|
|
|
|
xxx10000: 35 value bits, the encoding uses 5 bytes
|
|
|
|
|
xx100000: 42 value bits, the encoding uses 6 bytes
|
|
|
|
|
x1000000: 49 value bits, the encoding uses 7 bytes
|
|
|
|
|
10000000: 56 value bits, the encoding uses 8 bytes
|
|
|
|
|
00000000: 64 value bits, the encoding uses 9 bytes
|
|
|
|
|
```
|
|
|
|
|
|
2022-08-23 23:39:18 -07:00
|
|
|
|
##### Signed Variable-Width Integers
|
|
|
|
|
|
|
|
|
|
Signed variable width integer values are encoded in a similar fashion to
|
|
|
|
|
[varints](#variable-width-integers), but employ
|
|
|
|
|
[zigzag encoding](https://en.wikipedia.org/wiki/Variable-length_quantity#Zigzag_encoding).
|
|
|
|
|
This encoding uses the low bit of the value to indicate the sign, which allows
|
|
|
|
|
for more efficiently encoding negative numbers. If a negative value were encoded
|
|
|
|
|
using a normal [varint](#variable-width-integers), it would be treated as an
|
|
|
|
|
extremely large unsigned value. Using zigzag encoding allows for a smaller
|
|
|
|
|
number of active bits in the value, leading to a smaller encoding. Below is the
|
|
|
|
|
basic computation for generating a zigzag encoding:
|
|
|
|
|
|
|
|
|
|
```
|
|
|
|
|
(value << 1) ^ (value >> 63)
|
|
|
|
|
```
|
|
|
|
|
|
2022-08-11 20:06:33 -07:00
|
|
|
|
#### Strings
|
|
|
|
|
|
|
|
|
|
Strings are blobs of characters with an associated length.
|
|
|
|
|
|
|
|
|
|
### Sections
|
|
|
|
|
|
|
|
|
|
```
|
|
|
|
|
section {
|
2022-09-06 20:47:57 -07:00
|
|
|
|
idAndIsAligned: byte // id | (hasAlign << 7)
|
|
|
|
|
length: varint,
|
|
|
|
|
|
|
|
|
|
alignment: varint?,
|
|
|
|
|
padding: byte[], // Padding bytes are always `0xCB`.
|
|
|
|
|
|
|
|
|
|
data: byte[]
|
2022-08-11 20:06:33 -07:00
|
|
|
|
}
|
|
|
|
|
```
|
|
|
|
|
|
2022-09-06 20:47:57 -07:00
|
|
|
|
Sections are a mechanism for grouping data within the bytecode. They enable
|
2022-08-11 20:06:33 -07:00
|
|
|
|
delayed processing, which is useful for out-of-order processing of data,
|
2022-09-06 20:47:57 -07:00
|
|
|
|
lazy-loading, and more. Each section contains a Section ID, whose high bit
|
|
|
|
|
indicates if the section has alignment requirements, a length (which allows for
|
|
|
|
|
skipping over the section), and an optional alignment. When an alignment is
|
|
|
|
|
present, a variable number of padding bytes (0xCB) may appear before the section
|
|
|
|
|
data. The alignment of a section must be a power of 2.
|
2022-08-11 20:06:33 -07:00
|
|
|
|
|
|
|
|
|
## MLIR Encoding
|
|
|
|
|
|
|
|
|
|
Given the generic structure of MLIR, the bytecode encoding is actually fairly
|
|
|
|
|
simplistic. It effectively maps to the core components of MLIR.
|
|
|
|
|
|
|
|
|
|
### Top Level Structure
|
|
|
|
|
|
|
|
|
|
The top-level structure of the bytecode contains the 4-byte "magic number", a
|
|
|
|
|
version number, a null-terminated producer string, and a list of sections. Each
|
|
|
|
|
section is currently only expected to appear once within a bytecode file.
|
|
|
|
|
|
|
|
|
|
```
|
|
|
|
|
bytecode {
|
|
|
|
|
magic: "MLïR",
|
|
|
|
|
version: varint,
|
|
|
|
|
producer: string,
|
|
|
|
|
sections: section[]
|
|
|
|
|
}
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
### String Section
|
|
|
|
|
|
|
|
|
|
```
|
|
|
|
|
strings {
|
|
|
|
|
numStrings: varint,
|
|
|
|
|
reverseStringLengths: varint[],
|
|
|
|
|
stringData: byte[]
|
|
|
|
|
}
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
The string section contains a table of strings referenced within the bytecode,
|
|
|
|
|
more easily enabling string sharing. This section is encoded first with the
|
|
|
|
|
total number of strings, followed by the sizes of each of the individual strings
|
|
|
|
|
in reverse order. The remaining encoding contains a single blob containing all
|
|
|
|
|
of the strings concatenated together.
|
|
|
|
|
|
|
|
|
|
### Dialect Section
|
|
|
|
|
|
|
|
|
|
The dialect section of the bytecode contains all of the dialects referenced
|
|
|
|
|
within the encoded IR, and some information about the components of those
|
|
|
|
|
dialects that were also referenced.
|
|
|
|
|
|
|
|
|
|
```
|
|
|
|
|
dialect_section {
|
|
|
|
|
numDialects: varint,
|
2024-08-19 10:23:56 +02:00
|
|
|
|
dialectNames: dialect_name_group[],
|
|
|
|
|
opNames: dialect_ops_group[] // ops grouped by dialect
|
2022-08-11 20:06:33 -07:00
|
|
|
|
}
|
|
|
|
|
|
2024-08-19 10:23:56 +02:00
|
|
|
|
dialect_name_group {
|
|
|
|
|
nameAndIsVersioned: varint // (dialectID << 1) | (hasVersion),
|
|
|
|
|
version: dialect_version_section // only if versioned
|
2022-08-11 20:06:33 -07:00
|
|
|
|
}
|
2023-02-14 08:45:08 -08:00
|
|
|
|
|
|
|
|
|
dialect_version_section {
|
|
|
|
|
size: varint,
|
|
|
|
|
version: byte[]
|
|
|
|
|
}
|
|
|
|
|
|
2024-08-19 10:23:56 +02:00
|
|
|
|
dialect_ops_group {
|
|
|
|
|
dialect: varint,
|
|
|
|
|
numOpNames: varint,
|
|
|
|
|
opNames: op_name_group[]
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
op_name_group {
|
|
|
|
|
nameAndIsRegistered: varint // (nameID << 1) | (isRegisteredOp)
|
|
|
|
|
}
|
2022-08-11 20:06:33 -07:00
|
|
|
|
```
|
|
|
|
|
|
2023-02-14 08:45:08 -08:00
|
|
|
|
Dialects are encoded as a `varint` containing the index to the name string
|
|
|
|
|
within the string section, plus a flag indicating whether the dialect is
|
|
|
|
|
versioned. Operation names are encoded in groups by dialect, with each group
|
|
|
|
|
containing the dialect, the number of operation names, and the array of indexes
|
|
|
|
|
to each name within the string section. The version is encoded as a nested
|
2024-08-19 10:23:56 +02:00
|
|
|
|
section for each dialect.
|
2022-08-11 20:06:33 -07:00
|
|
|
|
|
|
|
|
|
### Attribute/Type Sections
|
|
|
|
|
|
|
|
|
|
Attributes and types are encoded using two [sections](#sections), one section
|
|
|
|
|
(`attr_type_section`) containing the actual encoded representation, and another
|
|
|
|
|
section (`attr_type_offset_section`) containing the offsets of each encoded
|
|
|
|
|
attribute/type into the previous section. This structure allows for attributes
|
|
|
|
|
and types to always be lazily loaded on demand.
|
|
|
|
|
|
|
|
|
|
```
|
|
|
|
|
attr_type_section {
|
|
|
|
|
attrs: attribute[],
|
2022-10-07 09:05:49 -07:00
|
|
|
|
types: type[]
|
2022-08-11 20:06:33 -07:00
|
|
|
|
}
|
|
|
|
|
attr_type_offset_section {
|
|
|
|
|
numAttrs: varint,
|
|
|
|
|
numTypes: varint,
|
|
|
|
|
offsets: attr_type_offset_group[]
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
attr_type_offset_group {
|
|
|
|
|
dialect: varint,
|
|
|
|
|
numElements: varint,
|
2022-10-07 09:05:49 -07:00
|
|
|
|
offsets: varint[] // (offset << 1) | (hasCustomEncoding)
|
2022-08-11 20:06:33 -07:00
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
attribute {
|
|
|
|
|
encoding: ...
|
|
|
|
|
}
|
|
|
|
|
type {
|
|
|
|
|
encoding: ...
|
|
|
|
|
}
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
Each `offset` in the `attr_type_offset_section` above is the size of the
|
|
|
|
|
encoding for the attribute or type and a flag indicating if the encoding uses
|
|
|
|
|
the textual assembly format, or a custom bytecode encoding. We avoid using the
|
|
|
|
|
direct offset into the `attr_type_section`, as a smaller relative offsets
|
|
|
|
|
provides more effective compression. Attributes and types are grouped by
|
|
|
|
|
dialect, with each `attr_type_offset_group` in the offset section containing the
|
|
|
|
|
corresponding parent dialect, number of elements, and offsets for each element
|
|
|
|
|
within the group.
|
|
|
|
|
|
|
|
|
|
#### Attribute/Type Encodings
|
|
|
|
|
|
|
|
|
|
In the abstract, an attribute/type is encoded in one of two possible ways: via
|
|
|
|
|
its assembly format, or via a custom dialect defined encoding.
|
|
|
|
|
|
|
|
|
|
##### Assembly Format Fallback
|
|
|
|
|
|
|
|
|
|
In the case where a dialect does not define a method for encoding the attribute
|
|
|
|
|
or type, the textual assembly format of that attribute or type is used as a
|
2024-08-19 10:23:56 +02:00
|
|
|
|
fallback. For example, a type `!bytecode.type<42>` would be encoded as the null
|
|
|
|
|
terminated string "!bytecode.type<42>". This ensures that every attribute and
|
|
|
|
|
type can be encoded, even if the owning dialect has not yet opted in to a more
|
2022-08-11 20:06:33 -07:00
|
|
|
|
efficient serialization.
|
|
|
|
|
|
|
|
|
|
TODO: We shouldn't redundantly encode the dialect name here, we should use a
|
|
|
|
|
reference to the parent dialect instead.
|
|
|
|
|
|
|
|
|
|
##### Dialect Defined Encoding
|
|
|
|
|
|
2024-08-19 10:23:56 +02:00
|
|
|
|
As an alternative to the assembly format fallback, dialects may also provide a
|
|
|
|
|
custom encoding for their attributes and types. Custom encodings are very
|
|
|
|
|
beneficial in that they are significantly smaller and faster to read and write.
|
2022-08-23 12:56:02 -07:00
|
|
|
|
|
|
|
|
|
Dialects can opt-in to providing custom encodings by implementing the
|
|
|
|
|
`BytecodeDialectInterface`. This interface provides hooks, namely
|
|
|
|
|
`readAttribute`/`readType` and `writeAttribute`/`writeType`, that will be used
|
|
|
|
|
by the bytecode reader and writer. These hooks are provided a reader and writer
|
|
|
|
|
implementation that can be used to encode various constructs in the underlying
|
|
|
|
|
bytecode format. A unique feature of this interface is that dialects may choose
|
|
|
|
|
to only encode a subset of their attributes and types in a custom bytecode
|
|
|
|
|
format, which can simplify adding new or experimental components that aren't
|
|
|
|
|
fully baked.
|
|
|
|
|
|
|
|
|
|
When implementing the bytecode interface, dialects are responsible for all
|
|
|
|
|
aspects of the encoding. This includes the indicator for which kind of attribute
|
|
|
|
|
or type is being encoded; the bytecode reader will only know that it has
|
|
|
|
|
encountered an attribute or type of a given dialect, it doesn't encode any
|
|
|
|
|
further information. As such, a common encoding idiom is to use a leading
|
|
|
|
|
`varint` code to indicate how the attribute or type was encoded.
|
2022-08-11 20:06:33 -07:00
|
|
|
|
|
2022-09-06 20:47:57 -07:00
|
|
|
|
### Resource Section
|
|
|
|
|
|
|
|
|
|
Resources are encoded using two [sections](#sections), one section
|
|
|
|
|
(`resource_section`) containing the actual encoded representation, and another
|
|
|
|
|
section (`resource_offset_section`) containing the offsets of each encoded
|
|
|
|
|
resource into the previous section.
|
|
|
|
|
|
|
|
|
|
```
|
|
|
|
|
resource_section {
|
|
|
|
|
resources: resource[]
|
|
|
|
|
}
|
|
|
|
|
resource {
|
|
|
|
|
value: resource_bool | resource_string | resource_blob
|
|
|
|
|
}
|
|
|
|
|
resource_bool {
|
|
|
|
|
value: byte
|
|
|
|
|
}
|
|
|
|
|
resource_string {
|
|
|
|
|
value: varint
|
|
|
|
|
}
|
|
|
|
|
resource_blob {
|
|
|
|
|
alignment: varint,
|
|
|
|
|
size: varint,
|
|
|
|
|
padding: byte[],
|
|
|
|
|
blob: byte[]
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
resource_offset_section {
|
|
|
|
|
numExternalResourceGroups: varint,
|
|
|
|
|
resourceGroups: resource_group[]
|
|
|
|
|
}
|
|
|
|
|
resource_group {
|
|
|
|
|
key: varint,
|
|
|
|
|
numResources: varint,
|
|
|
|
|
resources: resource_info[]
|
|
|
|
|
}
|
|
|
|
|
resource_info {
|
|
|
|
|
key: varint,
|
|
|
|
|
size: varint
|
|
|
|
|
kind: byte,
|
|
|
|
|
}
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
Resources are grouped by the provider, either an external entity or a dialect,
|
|
|
|
|
with each `resource_group` in the offset section containing the corresponding
|
|
|
|
|
provider, number of elements, and info for each element within the group. For
|
|
|
|
|
each element, we record the key, the value kind, and the encoded size. We avoid
|
|
|
|
|
using the direct offset into the `resource_section`, as a smaller relative
|
|
|
|
|
offsets provides more effective compression.
|
|
|
|
|
|
2022-08-11 20:06:33 -07:00
|
|
|
|
### IR Section
|
|
|
|
|
|
|
|
|
|
The IR section contains the encoded form of operations within the bytecode.
|
|
|
|
|
|
2023-04-29 02:36:45 -07:00
|
|
|
|
```
|
|
|
|
|
ir_section {
|
|
|
|
|
block: block; // Single block without arguments.
|
|
|
|
|
}
|
|
|
|
|
```
|
|
|
|
|
|
2022-08-11 20:06:33 -07:00
|
|
|
|
#### Operation Encoding
|
|
|
|
|
|
|
|
|
|
```
|
|
|
|
|
op {
|
|
|
|
|
name: varint,
|
|
|
|
|
encodingMask: byte,
|
|
|
|
|
location: varint,
|
|
|
|
|
|
|
|
|
|
attrDict: varint?,
|
|
|
|
|
|
|
|
|
|
numResults: varint?,
|
|
|
|
|
resultTypes: varint[],
|
|
|
|
|
|
|
|
|
|
numOperands: varint?,
|
|
|
|
|
operands: varint[],
|
|
|
|
|
|
|
|
|
|
numSuccessors: varint?,
|
|
|
|
|
successors: varint[],
|
|
|
|
|
|
2023-05-21 16:46:59 -07:00
|
|
|
|
numUseListOrders: varint?,
|
|
|
|
|
useListOrders: uselist[],
|
|
|
|
|
|
2022-10-07 09:05:49 -07:00
|
|
|
|
regionEncoding: varint?, // (numRegions << 1) | (isIsolatedFromAbove)
|
2023-04-29 02:36:45 -07:00
|
|
|
|
|
|
|
|
|
// regions are stored in a section if isIsolatedFromAbove
|
|
|
|
|
regions: (region | region_section)[]
|
2022-08-11 20:06:33 -07:00
|
|
|
|
}
|
2023-05-21 16:46:59 -07:00
|
|
|
|
|
|
|
|
|
uselist {
|
|
|
|
|
indexInRange: varint?,
|
|
|
|
|
useListEncoding: varint, // (numIndices << 1) | (isIndexPairEncoding)
|
|
|
|
|
indices: varint[]
|
|
|
|
|
}
|
2022-08-11 20:06:33 -07:00
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
The encoding of an operation is important because this is generally the most
|
|
|
|
|
commonly appearing structure in the bytecode. A single encoding is used for
|
2024-08-19 10:23:56 +02:00
|
|
|
|
every type of operation. Given this prevalence, many of the fields of an
|
2022-08-11 20:06:33 -07:00
|
|
|
|
operation are optional. The `encodingMask` field is a bitmask which indicates
|
|
|
|
|
which of the components of the operation are present.
|
|
|
|
|
|
|
|
|
|
##### Location
|
|
|
|
|
|
|
|
|
|
The location is encoded as the index of the location within the attribute table.
|
|
|
|
|
|
|
|
|
|
##### Attributes
|
|
|
|
|
|
|
|
|
|
If the operation has attribues, the index of the operation attribute dictionary
|
|
|
|
|
within the attribute table is encoded.
|
|
|
|
|
|
|
|
|
|
##### Results
|
|
|
|
|
|
|
|
|
|
If the operation has results, the number of results and the indexes of the
|
|
|
|
|
result types within the type table are encoded.
|
|
|
|
|
|
|
|
|
|
##### Operands
|
|
|
|
|
|
|
|
|
|
If the operation has operands, the number of operands and the value index of
|
|
|
|
|
each operand is encoded. This value index is the relative ordering of the
|
|
|
|
|
definition of that value from the start of the first ancestor isolated region.
|
|
|
|
|
|
|
|
|
|
##### Successors
|
|
|
|
|
|
|
|
|
|
If the operation has successors, the number of successors and the indexes of the
|
|
|
|
|
successor blocks within the parent region are encoded.
|
|
|
|
|
|
2023-05-21 16:46:59 -07:00
|
|
|
|
##### Use-list orders
|
|
|
|
|
|
|
|
|
|
The reference use-list order is assumed to be the reverse of the global
|
|
|
|
|
enumeration of all the op operands that one would obtain with a pre-order walk
|
|
|
|
|
of the IR. This order is naturally obtained by building blocks of operations
|
|
|
|
|
op-by-op. However, some transformations may shuffle the use-lists with respect
|
|
|
|
|
to this reference ordering. If any of the results of the operation have a
|
|
|
|
|
use-list order that is not sorted with respect to the reference use-list order,
|
|
|
|
|
an encoding is emitted such that it is possible to reconstruct such order after
|
|
|
|
|
parsing the bytecode. The encoding represents an index map from the reference
|
|
|
|
|
operand order to the current use-list order. A bit flag is used to detect if
|
|
|
|
|
this encoding is of type index-pair or not. When the bit flag is set to zero,
|
|
|
|
|
the element at `i` represent the position of the use `i` of the reference list
|
|
|
|
|
into the current use-list. When the bit flag is set to `1`, the encoding
|
|
|
|
|
represent index pairs `(i, j)`, which indicate that the use at position `i` of
|
|
|
|
|
the reference list is mapped to position `j` in the current use-list. When only
|
|
|
|
|
less than half of the elements in the current use-list are shuffled with respect
|
|
|
|
|
to the reference use-list, the index-pair encoding is used to reduce the
|
|
|
|
|
bytecode memory requirements.
|
|
|
|
|
|
2022-08-11 20:06:33 -07:00
|
|
|
|
##### Regions
|
|
|
|
|
|
|
|
|
|
If the operation has regions, the number of regions and if the regions are
|
|
|
|
|
isolated from above are encoded together in a single varint. Afterwards, each
|
|
|
|
|
region is encoded inline.
|
|
|
|
|
|
|
|
|
|
#### Region Encoding
|
|
|
|
|
|
|
|
|
|
```
|
|
|
|
|
region {
|
|
|
|
|
numBlocks: varint,
|
|
|
|
|
|
|
|
|
|
numValues: varint?,
|
|
|
|
|
blocks: block[]
|
|
|
|
|
}
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
A region is encoded first with the number of blocks within. If the region is
|
|
|
|
|
non-empty, the number of values defined directly within the region are encoded,
|
|
|
|
|
followed by the blocks of the region.
|
|
|
|
|
|
|
|
|
|
#### Block Encoding
|
|
|
|
|
|
|
|
|
|
```
|
|
|
|
|
block {
|
|
|
|
|
encoding: varint, // (numOps << 1) | (hasBlockArgs)
|
|
|
|
|
arguments: block_arguments?, // Optional based on encoding
|
|
|
|
|
ops : op[]
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
block_arguments {
|
|
|
|
|
numArgs: varint?,
|
|
|
|
|
args: block_argument[]
|
2023-05-21 16:46:59 -07:00
|
|
|
|
numUseListOrders: varint?,
|
|
|
|
|
useListOrders: uselist[],
|
2022-08-11 20:06:33 -07:00
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
block_argument {
|
2023-05-25 09:24:50 -07:00
|
|
|
|
typeAndLocation: varint, // (type << 1) | (hasLocation)
|
|
|
|
|
location: varint? // Optional, else unknown location
|
2022-08-11 20:06:33 -07:00
|
|
|
|
}
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
A block is encoded with an array of operations and block arguments. The first
|
|
|
|
|
field is an encoding that combines the number of operations in the block, with a
|
|
|
|
|
flag indicating if the block has arguments.
|
2023-05-21 16:46:59 -07:00
|
|
|
|
|
|
|
|
|
Use-list orders are attached to block arguments similarly to how they are
|
|
|
|
|
attached to operation results.
|