20 Commits

Author SHA1 Message Date
Yinying Li
a10d67f9fb
[mlir][sparse] Enable explicit and implicit value in sparse encoding (#88975)
1. Explicit value means the non-zero value in a sparse tensor. If
explicitVal is set, then all the non-zero values in the tensor have the
same explicit value. The default value Attribute() indicates that it is
not set.

2. Implicit value means the "zero" value in a sparse tensor. If
implicitVal is set, then the "zero" value in the tensor is equal to the
implicit value. For now, we only support `0` as the implicit value but
it could be extended in the future. The default value Attribute()
indicates that the implicit value is `0` (same type as the tensor
element type).

Example:

```
#CSR = #sparse_tensor.encoding<{
  map = (d0, d1) -> (d0 : dense, d1 : compressed),
  posWidth = 64,
  crdWidth = 64,
  explicitVal = 1 : i64,
  implicitVal = 0 : i64
}>
```

Note: this PR tests that implicitVal could be set to other values as
well. The following PR will add verifier and reject any value that's not
zero for implicitVal.
2024-04-24 16:20:25 -07:00
Peiming Liu
56d58295dd
[mlir][sparse] Introduce batch level format. (#83082) 2024-02-26 16:08:28 -08:00
Yinying Li
e5924d6499
[mlir][sparse] Implement parsing n out of m (#79935)
1. Add parsing methods for block[n, m].
2. Encode n and m with the newly extended 64-bit LevelType enum.
3. Update 2:4 methods names/comments to n:m.
2024-02-08 14:38:42 -05:00
Jie Fu
87ff65b07c [mlir][test] Fix -Wformat in sparse_tensor.c (NFC)
llvm-project/mlir/test/CAPI/sparse_tensor.c:50:43:
error: format specifies type 'unsigned long long' but the argument has type 'MlirSparseTensorLevelType' (aka 'unsigned long') [-Werror,-Wformat]
    fprintf(stderr, "level_type: %llu\n", lvlTypes[l]);
                                 ~~~~     ^~~~~~~~~~~
                                 %lu
1 error generated.
2024-02-06 10:50:34 +08:00
Jie Fu
62838b872f [mlir][test] Fix -Wformat in sparse_tensor.c (NFC)
llvm-project/mlir/test/CAPI/sparse_tensor.c:50:42:
error: format specifies type 'unsigned long' but the argument has type 'MlirSparseTensorLevelType' (aka 'unsigned long long') [-Werror,-Wformat]
   50 |     fprintf(stderr, "level_type: %lu\n", lvlTypes[l]);
      |                                  ~~~     ^~~~~~~~~~~
      |                                  %llu
1 error generated.
2024-02-06 10:11:10 +08:00
Yinying Li
cd481fa827
[mlir][sparse] Change LevelType enum to 64 bit (#80501)
1. C++ enum is set through enum class LevelType : uint_64.
2. C enum is set through typedef uint_64 level_type. It is due to the
limitations in Windows build: setting enum width to ui64 is not
supported in C.
2024-02-05 17:00:52 -05:00
Aart Bik
1944c4f76b
[mlir][sparse] rename DimLevelType to LevelType (#73561)
The "Dim" prefix is a legacy left-over that no longer makes sense, since
we have a very strict "Dimension" vs. "Level" definition for sparse
tensor types and their storage.
2023-11-27 14:27:52 -08:00
Yinying Li
d4088e7d5f
[mlir][sparse] Populate lvlToDim (#68937)
Updates:
1. Infer lvlToDim from dimToLvl
2. Add more tests for block sparsity
3. Finish TODOs related to lvlToDim, including adding lvlToDim to python
binding

Verification of lvlToDim that user provides will be implemented in the
next PR.
2023-10-17 16:09:39 -04:00
Yinying Li
256ac4619b
[mlir][sparse] Change tests to use new syntax for ELL and slice (#67569)
Examples:

1. `#ELL = #sparse_tensor.encoding<{ lvlTypes = [ "dense", "dense",
"compressed" ], dimToLvl = affine_map<(i,j)[c] -> (c*4*i, i, j)>
}>`
to
`#ELL = #sparse_tensor.encoding<{ map = [s0](d0, d1) -> (d0 * (s0 * 4) :
dense, d0 : dense, d1 : compressed)
}>`

2. `#CSR_SLICE = #sparse_tensor.encoding<{ lvlTypes = [ "dense",
"compressed" ], dimSlices = [ (1, 4, 1), (1, 4, 2) ]
}>`
to
`#CSR_SLICE = #sparse_tensor.encoding<{ map = (d0 :
#sparse_tensor<slice(1, 4, 1)>, d1 : #sparse_tensor<slice(1, 4, 2)>) ->
(d0 : dense, d1 : compressed)
}>`
2023-09-27 19:40:52 -04:00
Aart Bik
836411b99f
[mlir][sparse] add lvlToDim field to sparse tensor encoding (#67194)
Note the new surface syntax allows for defining a dimToLvl and lvlToDim
map at once (where usually the latter can be inferred from the former,
but not always). This revision adds storage for the latter, together
with some intial boilerplate. The actual support (inference, validation,
printing, etc.) is still TBD of course.
2023-09-22 15:51:25 -07:00
wren romano
76647fce13 [mlir][sparse] Combining dimOrdering+higherOrdering fields into dimToLvl
This is a major step along the way towards the new STEA design.  While a great deal of this patch is simple renaming, there are several significant changes as well.  I've done my best to ensure that this patch retains the previous behavior and error-conditions, even though those are at odds with the eventual intended semantics of the `dimToLvl` mapping.  Since the majority of the compiler does not yet support non-permutations, I've also added explicit assertions in places that previously had implicitly assumed it was dealing with permutations.

Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D151505
2023-05-30 15:19:50 -07:00
wren romano
a0615d020a [mlir][sparse] Renaming the STEA field dimLevelType to lvlTypes
This commit is part of the migration of towards the new STEA syntax/design.  In particular, this commit includes the following changes:
* Renaming compiler-internal functions/methods:
  * `SparseTensorEncodingAttr::{getDimLevelType => getLvlTypes}`
  * `Merger::{getDimLevelType => getLvlType}` (for consistency)
  * `sparse_tensor::{getDimLevelType => buildLevelType}` (to help reduce confusion vs actual getter methods)
* Renaming external facets to match:
  * the STEA parser and printer
  * the C and Python bindings
  * PyTACO

However, the actual renaming of the `DimLevelType` itself (along with all the "dlt" names) will be handled in a separate commit.

Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D150330
2023-05-17 14:24:09 -07:00
wren romano
84cd51bb97 [mlir][sparse] Renaming "pointer/index" to "position/coordinate"
The old "pointer/index" names often cause confusion since these names clash with names of unrelated things in MLIR; so this change rectifies this by changing everything to use "position/coordinate" terminology instead.

In addition to the basic terminology, there have also been various conventions for making certain distinctions like: (1) the overall storage for coordinates in the sparse-tensor, vs the particular collection of coordinates of a given element; and (2) particular coordinates given as a `Value` or `TypedValue<MemRefType>`, vs particular coordinates given as `ValueRange` or similar.  I have striven to maintain these distinctions
as follows:

  * "p/c" are used for individual position/coordinate values, when there is no risk of confusion.  (Just like we use "d/l" to abbreviate "dim/lvl".)

  * "pos/crd" are used for individual position/coordinate values, when a longer name is helpful to avoid ambiguity or to form compound names (e.g., "parentPos").  (Just like we use "dim/lvl" when we need a longer form of "d/l".)

    I have also used these forms for a handful of compound names where the old name had been using a three-letter form previously, even though a longer form would be more appropriate.  I've avoided renaming these to use a longer form purely for expediency sake, since changing them would require a cascade of other renamings.  They should be updated to follow the new naming scheme, but that can be done in future patches.

  * "coords" is used for the complete collection of crd values associated with a single element.  In the runtime library this includes both `std::vector` and raw pointer representations.  In the compiler, this is used specifically for buffer variables with C++ type `Value`, `TypedValue<MemRefType>`, etc.

    The bare form "coords" is discouraged, since it fails to make the dim/lvl distinction; so the compound names "dimCoords/lvlCoords" should be used instead.  (Though there may exist a rare few cases where is is appropriate to be intentionally ambiguous about what coordinate-space the coords live in; in which case the bare "coords" is appropriate.)

    There is seldom the need for the pos variant of this notion.  In most circumstances we use the term "cursor", since the same buffer is reused for a 'moving' pos-collection.

  * "dcvs/lcvs" is used in the compiler as the `ValueRange` analogue of "dimCoords/lvlCoords".  (The "vs" stands for "`Value`s".)  I haven't found the need for it, but "pvs" would be the obvious name for a pos-`ValueRange`.

    The old "ind"-vs-"ivs" naming scheme does not seem to have been sustained in more recent code, which instead prefers other mnemonics (e.g., adding "Buf" to the end of the names for `TypeValue<MemRefType>`).  I have cleaned up a lot of these to follow the "coords"-vs-"cvs" naming scheme, though haven't done an exhaustive cleanup.

  * "positions/coordinates" are used for larger collections of pos/crd values; in particular, these are used when referring to the complete sparse-tensor storage components.

    I also prefer to use these unabbreviated names in the documentation, unless there is some specific reason why using the abbreviated forms helps resolve ambiguity.

In addition to making this terminology change, this change also does some cleanup along the way:
  * correcting the dim/lvl terminology in certain places.
  * adding `const` when it requires no other code changes.
  * miscellaneous cleanup that was entailed in order to make the proper distinctions.  Most of these are in CodegenUtils.{h,cpp}

Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D144773
2023-03-06 12:23:33 -08:00
Tom Eccles
5d91f79fce [mlir] Fix -Wstrict-prototypes warning
These warnings prevent compilation using clang and
-DLLVM_ENABLE_WERROR=On.

Differential revision: https://reviews.llvm.org/D139322
2022-12-12 12:04:58 +00:00
wren romano
933fefb6a8 [mlir][sparse] Adjusting DimLevelType numeric values for faster predicates
This differential adjusts the numeric values for DimLevelType values: using the low-order two bits for recording the "No" and "Nu" properties, and the high-order bits for the formats per se.  (The choice of encoding may seem a bit peculiar, since the bits are mapped to negative properties rather than positive properties.  But this was done in order to preserve the collation order of DimLevelType values.  If we don't care about collation order, then we may prefer to flip the semantics of the property bits, so that they're less surprising to readers.)

Using distinguished bits for the properties and formats enables faster implementation for the predicates detecting those properties/formats, which matters because this is in the runtime library itself (rather than on the codegen side of things).  This differential pushes through the changes to the enum values, and optimizes the basic predicates.  However it does not optimize all the places where we check compound predicates (e.g., "is compressed or singleton"), to help reduce rebasing conflict with D134933.  Those optimizations will be done after this differential and D134933 are landed.

Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D135004
2022-10-05 17:40:38 -07:00
Aart Bik
c48e90877f [mlir][sparse] introduce a higher-order tensor mapping
This extension to the sparse tensor type system in MLIR
opens up a whole new set of sparse storage schemes, such as
block sparse storage (e.g. BCSR) and ELL (aka jagged diagonals).

This revision merely introduces the type extension and
initial documentation. The actual interpretation of the type
(reading in tensors, lowering to code, etc.) will follow.

Reviewed By: Peiming

Differential Revision: https://reviews.llvm.org/D135206
2022-10-05 09:40:51 -07:00
Aart Bik
9921ef73c8 [mlir][sparse] remove singleton dimension level type (for now)
Although we have plans to support this, and many other, dimension level type(s), currently the tag is not supported. It will be easy to add this back once support is added.

NOTE: based on discussion in https://discourse.llvm.org/t/overcoming-sparsification-limitation-on-level-types/62585

https://github.com/llvm/llvm-project/issues/51658

Reviewed By: Peiming

Differential Revision: https://reviews.llvm.org/D131002
2022-08-02 11:48:49 -07:00
Stella Laurenzo
5e83a5b475 [mlir] Overhaul C/Python registration APIs to properly scope registration/loading activities.
Since the very first commits, the Python and C MLIR APIs have had mis-placed registration/load functionality for dialects, extensions, etc. This was done pragmatically in order to get bootstrapped and then just grew in. Downstreams largely bypass and do their own thing by providing various APIs to register things they need. Meanwhile, the C++ APIs have stabilized around this and it would make sense to follow suit.

The thing we have observed in canonical usage by downstreams is that each downstream tends to have native entry points that configure its installation to its preferences with one-stop APIs. This patch leans in to this approach with `RegisterEverything.h` and `mlir._mlir_libs._mlirRegisterEverything` being the one-stop entry points for the "upstream packages". The `_mlir_libs.__init__.py` now allows customization of the environment and Context by adding "initialization modules" to the `_mlir_libs` package. If present, `_mlirRegisterEverything` is treated as such a module. Others can be added by downstreams by adding a `_site_initialize_{i}.py` module, where '{i}' is a number starting with zero. The number will be incremented and corresponding module loaded until one is not found. Initialization modules can:

* Perform load time customization to the global environment (i.e. registering passes, hooks, etc).
* Define a `register_dialects(registry: DialectRegistry)` function that can extend the `DialectRegistry` that will be used to bootstrap the `Context`.
* Define a `context_init_hook(context: Context)` function that will be added to a list of callbacks which will be invoked after dialect registration during `Context` initialization.

Note that the `MLIRPythonExtension.RegisterEverything` is not included by default when building a downstream (its corresponding behavior was prior). For downstreams which need the default MLIR initialization to take place, they must add this back in to their Python CMake build just like they add their own components (i.e. to `add_mlir_python_common_capi_library` and `add_mlir_python_modules`). It is perfectly valid to not do this, in which case, only the things explicitly depended on and initialized by downstreams will be built/packaged. If the downstream has not been set up for this, it is recommended to simply add this back for the time being and pay the build time/package size cost.

CMake changes:
* `MLIRCAPIRegistration` -> `MLIRCAPIRegisterEverything` (renamed to signify what it does and force an evaluation: a number of places were incidentally linking this very expensive target)
* `MLIRPythonSoure.Passes` removed (without replacement: just drop)
* `MLIRPythonExtension.AllPassesRegistration` removed (without replacement: just drop)
* `MLIRPythonExtension.Conversions` removed (without replacement: just drop)
* `MLIRPythonExtension.Transforms` removed (without replacement: just drop)

Header changes:
* `mlir-c/Registration.h` is deleted. Dialect registration functionality is now in `IR.h`. Registration of upstream features are in `mlir-c/RegisterEverything.h`. When updating MLIR and a couple of downstreams, I found that proper usage was commingled so required making a choice vs just blind S&R.

Python APIs removed:
  * mlir.transforms and mlir.conversions (previously only had an __init__.py which indirectly triggered `mlirRegisterTransformsPasses()` and `mlirRegisterConversionPasses()` respectively). Downstream impact: Remove these imports if present (they now happen as part of default initialization).
  * mlir._mlir_libs._all_passes_registration, mlir._mlir_libs._mlirTransforms, mlir._mlir_libs._mlirConversions. Downstream impact: None expected (these were internally used).

C-APIs changed:
  * mlirRegisterAllDialects(MlirContext) now takes an MlirDialectRegistry instead. It also used to trigger loading of all dialects, which was already marked with a TODO to remove -- it no longer does, and for direct use, dialects must be explicitly loaded. Downstream impact: Direct C-API users must ensure that needed dialects are loaded or call `mlirContextLoadAllAvailableDialects(MlirContext)` to emulate the prior behavior. Also see the `ir.c` test case (e.g. `  mlirContextGetOrLoadDialect(ctx, mlirStringRefCreateFromCString("func"));`).
  * mlirDialectHandle* APIs were moved from Registration.h (which now is restricted to just global/upstream registration) to IR.h, arguably where it should have been. Downstream impact: include correct header (likely already doing so).

C-APIs added:
  * mlirContextLoadAllAvailableDialects(MlirContext): Corresponds to C++ API with the same purpose.

Python APIs added:
  * mlir.ir.DialectRegistry: Mapping for an MlirDialectRegistry.
  * mlir.ir.Context.append_dialect_registry(MlirDialectRegistry)
  * mlir.ir.Context.load_all_available_dialects()
  * mlir._mlir_libs._mlirAllRegistration: New native extension that exposes a `register_dialects(MlirDialectRegistry)` entry point and performs all upstream pass/conversion/transforms registration on init. In this first step, we eagerly load this as part of the __init__.py and use it to monkey patch the Context to emulate prior behavior.
  * Type caster and capsule support for MlirDialectRegistry

This should make it possible to build downstream Python dialects that only depend on a subset of MLIR. See: https://github.com/llvm/llvm-project/issues/56037

Here is an example PR, minimally adapting IREE to these changes: https://github.com/iree-org/iree/pull/9638/files In this situation, IREE is opting to not link everything, since it is already configuring the Context to its liking. For projects that would just like to not think about it and pull in everything, add `MLIRPythonExtension.RegisterEverything` to the list of Python sources getting built, and the old behavior will continue.

Reviewed By: mehdi_amini, ftynse

Differential Revision: https://reviews.llvm.org/D128593
2022-07-16 17:27:50 -07:00
Stella Laurenzo
295087644a [mlir] Fix windows build bot break due to use of alloca in a test.
Differential Revision: https://reviews.llvm.org/D102189
2021-05-10 20:39:16 +00:00
Stella Laurenzo
bcfa7baec8 [mlir][CAPI] Add CAPI bindings for the sparse_tensor dialect.
* Adds dialect registration, hand coded 'encoding' attribute and test.
* An MLIR CAPI tablegen backend for attributes does not exist, and this is a relatively complicated case. I opted to hand code it in a canonical way for now, which will provide a reasonable blueprint for building out the tablegen version in the future.
* Also added a (local) CMake function for declaring new CAPI tests, since it was getting repetitive/buggy.

Differential Revision: https://reviews.llvm.org/D102141
2021-05-10 16:54:56 +00:00