Jevin Jiang
d8b9211359
[Mosaic TPU] Support dynamic gather along axis 0 or 1 for 32-bit vreg-sized vector.
...
PiperOrigin-RevId: 721980453
2025-01-31 18:47:25 -08:00
Jevin Jiang
785a63ad0f
[Mosaic TPU] Support non-32 bit mask relayout
...
PiperOrigin-RevId: 721552594
2025-01-30 16:13:23 -08:00
Tzu-Wei Sung
d4758b6d5e
[Mosaic][NFC] Factor out xla-array related utils in a separate file.
...
Also added tests.
PiperOrigin-RevId: 721424194
2025-01-30 09:49:41 -08:00
Benjamin Chetioui
d8f3b33ae4
[Mosaic GPU] Eliminate the `arrive` attribute from `mosaic_gpu.async_load`.
...
We plan to explicitly issue an `expect_tx` operation all the time when using
the dialect.
PiperOrigin-RevId: 721411949
2025-01-30 09:08:45 -08:00
Dimitar (Mitko) Asenov
6214c25a6d
[Mosaic GPU] Add ArriveExpect and Wait ops on dialect barriers with explicit handling of parities
...
This makes dialect tests in mgpu_test.py truly express the entire computation at the warpgroup level.
PiperOrigin-RevId: 721371327
2025-01-30 06:44:32 -08:00
Adam Paszke
29b658b358
[Mosaic TPU] Optimize clipping implementation in arith.fptosi
...
We can use maxf/minf to avoid extra comparisons
PiperOrigin-RevId: 720601304
2025-01-28 09:20:16 -08:00
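The clipping idea in the entry above can be illustrated with a minimal sketch, assuming a 32-bit signed target type. This is not the Mosaic lowering itself: `saturating_fptosi` and the NaN policy are hypothetical names and choices made only for illustration. The point is that clamping with a max followed by a min replaces a pair of compare-and-select steps before the truncating conversion.

```python
import math

# Bounds of a signed 32-bit integer (assumed target type for this sketch).
INT32_MIN = -2**31
INT32_MAX = 2**31 - 1


def saturating_fptosi(x: float) -> int:
    """Convert a float to int32 with saturation (hypothetical helper).

    The clamp is expressed as max(lo, min(hi, x)) rather than two explicit
    comparisons followed by selects.
    """
    if math.isnan(x):
        # Assumption for this sketch only: map NaN to 0.
        return 0
    clamped = max(float(INT32_MIN), min(float(INT32_MAX), x))
    # int() truncates toward zero, matching fptosi rounding.
    return int(clamped)
```

Values inside the range are truncated toward zero, and values outside it (including infinities) saturate at the nearest bound.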
Dimitar (Mitko) Asenov
a3a285dddc
[Mosaic GPU] Handle the `swizzle` attribute in the lowering of `async_store` and `async_load`
...
PiperOrigin-RevId: 720129408
2025-01-27 05:18:16 -08:00
Sergei Lebedev
9ee7123c39
[mosaic_gpu] Fixed mosaic_gpu-serde pass registration
...
We previously registered the pass in :_mosaic_gpu_ext, which didn't work
because the extension has its own pass registry. The fix is to move
the registration to :register_jax_dialects in jaxlib.
PiperOrigin-RevId: 719280601
2025-01-24 06:35:54 -08:00
Adam Paszke
7043b852ec
[Mosaic GPU] Add basic support for TMA with sub-byte types
...
PiperOrigin-RevId: 719240287
2025-01-24 03:54:12 -08:00
Jevin Jiang
8e1f956804
[Mosaic TPU] Use vmask pack if possible for mask's bitwidth change and introduce relayout op.
...
PiperOrigin-RevId: 719089676
2025-01-23 18:15:08 -08:00
Dimitar (Mitko) Asenov
f57d603c45
[Mosaic GPU] Simplify enums in the MLIR Mosaic GPU dialect.
...
This enables us to use them more simply in the current and upcoming Python code. The Python bindings for enum and enum attributes leave much to be desired.
PiperOrigin-RevId: 718795667
2025-01-23 03:38:26 -08:00
Dimitar (Mitko) Asenov
6b747b4109
[Mosaic GPU] Add a result to the WGMMA op definition in the MLIR dialect
...
PiperOrigin-RevId: 718788390
2025-01-23 03:10:07 -08:00
jax authors
6c76cc4e36
Integrate LLVM at llvm/llvm-project@d33e33fde7
...
Updates LLVM usage to match
[d33e33fde770](https://github.com/llvm/llvm-project/commit/d33e33fde770)
PiperOrigin-RevId: 718414171
2025-01-22 09:22:07 -08:00
jax authors
54bb7f5ddb
Remove meaningless template keywords.
...
This will fix -Wmissing-template-arg-list-after-template-kw warnings.
This warning is error-by-default in Clang.
PiperOrigin-RevId: 718133601
2025-01-21 17:22:04 -08:00
Dimitar (Mitko) Asenov
f89accc56a
[Mosaic GPU] Add support for converting all fragmented layouts to ir and back.
...
This will be used in the layout inference and lowering of the dialect WGMMA op
PiperOrigin-RevId: 717836648
2025-01-21 03:27:03 -08:00
Adam Paszke
543dd94762
[Mosaic TPU] Add a faster implementation for packing b16 to s8 in TPUv6
...
PiperOrigin-RevId: 717583425
2025-01-20 11:18:22 -08:00
Peter Hawkins
034e967e11
Remove CUDA rpaths from jaxlib build.
...
These are also set in the TSL build rules as part of the CUDA stub libraries, which these libraries depend on, so these copies of the rpath settings are redundant.
PiperOrigin-RevId: 716844265
2025-01-17 17:09:30 -08:00
jax authors
a527aba646
Reverts f1b894d14a28ac22a037fb79177b991275c75a18
...
PiperOrigin-RevId: 716653711
2025-01-17 07:00:31 -08:00
Benjamin Chetioui
d3be190efb
[Mosaic GPU] Delete unused declarations of `mosaic_gpu_memcpy_async_h2d`.
...
PiperOrigin-RevId: 716616807
2025-01-17 04:34:48 -08:00
Sergei Lebedev
d34c40f6b6
[mosaic_gpu] Added a serialization pass
...
The pass adds versioning to the Mosaic GPU IR in the lowered custom calls
and can apply forward/backward migration rules. Currently, no rules are
necessary since we are at version 1.
PiperOrigin-RevId: 716596848
2025-01-17 03:12:51 -08:00
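The versioning scheme described in the entry above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual pass: the real pass operates on Mosaic GPU IR inside custom calls, whereas this sketch uses plain dictionaries, and every name here (`CURRENT_VERSION`, `UPGRADE_RULES`, `DOWNGRADE_RULES`, `serialize`, `deserialize`) is hypothetical. Both rule tables are empty, mirroring the statement that no rules are needed at version 1.

```python
from typing import Callable, Dict

# Hypothetical current version; the commit message says the IR is at version 1.
CURRENT_VERSION = 1

# Hypothetical rule tables. UPGRADE_RULES[v] rewrites version v into v + 1,
# DOWNGRADE_RULES[v] rewrites version v into v - 1. Both are empty at version 1.
UPGRADE_RULES: Dict[int, Callable[[dict], dict]] = {}
DOWNGRADE_RULES: Dict[int, Callable[[dict], dict]] = {}


def serialize(module: dict, target_version: int = CURRENT_VERSION) -> dict:
    """Tag a payload with a version, migrating backward if an older one is requested."""
    payload = dict(module)
    version = CURRENT_VERSION
    while version > target_version:
        payload = DOWNGRADE_RULES[version](payload)
        version -= 1
    return {"version": version, "payload": payload}


def deserialize(blob: dict) -> dict:
    """Read a versioned payload, migrating forward to the current version."""
    version = blob["version"]
    if version > CURRENT_VERSION:
        raise ValueError(f"unsupported future version: {version}")
    payload = dict(blob["payload"])
    while version < CURRENT_VERSION:
        payload = UPGRADE_RULES[version](payload)
        version += 1
    return payload
```

With empty rule tables a round trip is the identity, and a payload tagged with a newer version than the reader understands is rejected.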
Adam Paszke
bd22bfef71
[Mosaic TPU] Use large to compact 2nd minor retiling for conversions going both ways
...
This specific retiling is its own inverse and is faster than the alternatives.
PiperOrigin-RevId: 716360070
2025-01-16 13:35:26 -08:00
Tzu-Wei Sung
5c020ee317
[Mosaic] Fix infer/apply extensions.
...
1. For apply, llvm::StringMap()::insert(MapEntryTy*) will cause a dangling reference if mlir::tpu::extensions::rules() is not constructed with const-reference. However, if we do construct it with const-reference, the signature is not const-qualified and it fails to compile. Hence, change it to llvm::StringMap()::insert(std::pair<...>) and get the extension rules by const-reference.
2. Pass the default tiling to the infer rule; we need it to infer a single op. See the infer rule of tpu::MatmulOp.
PiperOrigin-RevId: 716274818
2025-01-16 09:57:14 -08:00
Sergei Lebedev
4221f109d1
[mosaic] Extracted serialization pass traversal logic into a reusable function
...
I will use it to implement the Mosaic GPU serialization pass in a follow-up.
PiperOrigin-RevId: 716156650
2025-01-16 02:58:06 -08:00
Tzu-Wei Sung
4a9cc9ffc1
[Mosaic] Allow passing `ApplyVectorLayoutCtx` to tpu.apply_layout_op.
...
This makes it consistent with the C++ API. While I'm here, fix a bug in test_concatenate.
PiperOrigin-RevId: 716016244
2025-01-15 17:47:36 -08:00
Naums Mogers
d3ba1eb339
[Mosaic] Add a macro to convert abseil StatusOr to LLVM FailureOr
...
PiperOrigin-RevId: 715943314
2025-01-15 14:19:29 -08:00
George Necula
f1b894d14a
Reverts 391bad8ff59c07c8fad7b8ce05cd0e29dee4cf1a
...
PiperOrigin-RevId: 715435319
2025-01-14 10:31:59 -08:00
Ayaka
9ba1fd2801
[Pallas TPU] Add vector support to pl.debug_print
...
PiperOrigin-RevId: 715085454
2025-01-13 13:22:21 -08:00
Adam Paszke
391bad8ff5
[Mosaic TPU] Add support for arith.fptosi with non-32bit source and target types
...
This effectively moves some of the Pallas logic to the layer below.
PiperOrigin-RevId: 714965374
2025-01-13 07:49:13 -08:00
Tomás Longeri
7852045582
[Mosaic TPU] Enable non-sublane-aligned bf16 2D load/stores for earlier TPU gens
...
It is still not efficiently implemented; this is mostly to clean up some logic. We may be able to fuse the creation of masks for different tiles into the creation of a single one. But this is also a problem for the later gens.
This also cleans up an unreachable return statement.
PiperOrigin-RevId: 714847066
2025-01-12 23:58:40 -08:00
Tomás Longeri
0930289997
[Mosaic TPU][NFC] Remove redundant num_subelems attribute from CreateSubelementMaskOp
...
PiperOrigin-RevId: 714795856
2025-01-12 19:34:25 -08:00
jax authors
a16fbffc13
[Mosaic][TPU] Add a compatibility mode to Mosaic's canonicalization pass, skipping over elementwise and matmul op insertions and/or type compat casts.
...
PiperOrigin-RevId: 714132282
2025-01-10 12:12:54 -08:00
Adam Paszke
d2a5e8d072
[Mosaic TPU] Add support for integer truncation from packed types
...
PiperOrigin-RevId: 714048232
2025-01-10 07:40:55 -08:00
Adam Paszke
07f4fd3e51
[Mosaic TPU] Fix a bug in the impl of sublane broadcasts for int8 and int4
...
PiperOrigin-RevId: 713675029
2025-01-09 08:05:25 -08:00
Adam Paszke
f96339be1e
[Mosaic TPU] Be much more aggressive in inferring large 2nd minor layouts for 16-bit types on v6
...
This often lets us avoid ambiguities between selecting the (8, 128) and (16, 128) tiling,
by biasing the layout inference to prefer the latter.
PiperOrigin-RevId: 713270421
2025-01-08 06:30:36 -08:00
Adam Paszke
5fd1b2f825
[Mosaic TPU] Add support for second minor broadcasts with packed types
...
PiperOrigin-RevId: 713259707
2025-01-08 05:45:02 -08:00
Adam Paszke
e954930eaf
[Mosaic TPU] Add support for true divide in bf16 on TPUv6
...
PiperOrigin-RevId: 713247480
2025-01-08 04:49:22 -08:00
Tzu-Wei Sung
bf94389b08
[Mosaic] Use tpu::CreateMask for getX32VmaskByPaddingEnd.
...
It was cmp + iota before.
PiperOrigin-RevId: 713240888
2025-01-08 04:18:53 -08:00
Sharad Vikram
4caa263a94
[Mosaic TPU] Add some elementwise canonicalizations
...
PiperOrigin-RevId: 712671502
2025-01-06 15:10:02 -08:00
Peter Hawkins
90d8f37863
Rename pybind_extension to nanobind_extension.
...
We have no remaining uses of pybind11 outside a GPU custom call example.
PiperOrigin-RevId: 712608834
2025-01-06 11:53:44 -08:00
Jevin Jiang
9f842909ce
[Mosaic TPU] Validate inserted layout in relayout-insertion pass.
...
PiperOrigin-RevId: 712595778
2025-01-06 11:15:47 -08:00
Tzu-Wei Sung
57b21541a2
[Mosaic] NFC: Pull out vreg related functions to util.
...
These functions are related to vreg manipulation and are used in different rules.
PiperOrigin-RevId: 711484002
2025-01-02 11:50:19 -08:00
jax authors
68483b8ed6
Merge pull request #25710 from apaszke:mgpu_dialect_fix
...
PiperOrigin-RevId: 711430610
2025-01-02 08:23:28 -08:00
Adam Paszke
64433435ff
Fix OSS build for the Mosaic GPU dialect
2025-01-02 15:55:03 +00:00
Tomás Longeri
ac817b48ca
[Mosaic:TPU][NFC] Clean up unused variable
...
PiperOrigin-RevId: 711412888
2025-01-02 06:57:38 -08:00
Tomás Longeri
4452960947
[Mosaic:TPU] In infer ext rule, avoid assigning offsets outside of dst first tile
...
Note that offsets outside of the first tile are still disabled (for both infer and apply), and once we support them we will want to assign offsets differently. This change is mostly to avoid assigning invalid layouts (which may be not just outside the first tile, but outside the vreg slice).
PiperOrigin-RevId: 709168368
2024-12-23 15:49:39 -08:00
Tomás Longeri
3c79b98cd9
[Mosaic:TPU] Vreg-slice-aligned offset changes with scratch retiling
...
PiperOrigin-RevId: 709133729
2024-12-23 13:05:14 -08:00
Sergei Lebedev
68ec202d45
Use the right include for gmock and gtest
...
PiperOrigin-RevId: 709058082
2024-12-23 07:34:36 -08:00
Sergei Lebedev
8987867faa
[mosaic_gpu] Include Mosaic GPU dialect files into jaxlib
2024-12-23 13:46:25 +00:00
Tomás Longeri
7ecc947184
[Mosaic:TPU] Roll forward of cl/708011538 (expanded trunc support), minus changes in infer-vector-layout
...
We can enable them later but at least this way the support is available to build on
(e.g. in the new insert relayouts pass)
Reverts 05f3a701e769748ff1ec51d50324a3595c4aff0d
PiperOrigin-RevId: 708397219
2024-12-20 12:33:30 -08:00
Tomás Longeri
05f3a701e7
[Mosaic:TPU] Roll back cl/708011538 and cl/708112341
...
Reverts 307c8d3af81f16142fd4c64f501b05a5b69f815e
PiperOrigin-RevId: 708173083
2024-12-19 21:51:44 -08:00