1281 Commits

Author SHA1 Message Date
Tomás Longeri
8163e74e45 [Mosaic:TPU] Add relayout for adding minor implicit dim and relax some offset restrictions on similar shape cast
This factors out some logic from the apply-vector-layout shape cast rule where we insert a minor dimension, relaxes some offset restrictions on it, and uses it for the relayout.

PiperOrigin-RevId: 702993092
2024-12-04 23:13:26 -08:00
Tomás Longeri
101168740e [Mosaic:TPU] Lift offset restrictions on single-row (1, 128) -> (8, 128) 32-bit replicated retiling
PiperOrigin-RevId: 702966495
2024-12-04 21:08:17 -08:00
Naums Mogers
3990e05af7 [Mosaic] Add extra memref_slice verification and a memory space check helper
PiperOrigin-RevId: 702883469
2024-12-04 15:35:02 -08:00
jax authors
fa6585dea1 Merge pull request #25006 from andportnoy:aportnoy/mosaic-gpu-kernel-custom-name
PiperOrigin-RevId: 702772768
2024-12-04 10:26:21 -08:00
Jevin Jiang
9e5edb7015 [Mosaic TPU] Support packed type matmul with arbitrary shapes.
This CL removes all the shape constraints in matmul for all types.

We only need to mask out sub-elements on the contracting dim. Instead of unpacking the data and applying masks, we create a VREG-sized i32 "mask" that contains the sub-element mask info and logical-and it with the target vreg. This way, masking sub-elements takes only 1 op (logical_and) per target vreg instead of 3 ops (unpacking + select + packing).

PiperOrigin-RevId: 702480077
2024-12-03 14:58:42 -08:00
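As a rough illustration of the masking idea in commit 9e5edb7015, the sketch below zeroes trailing sub-elements of packed 32-bit words with a single bitwise AND. It is not the Mosaic implementation: the helper names, the choice of four 8-bit sub-elements per word, and the assumption that sub-element 0 sits in the low bits are all hypothetical, and the real vreg layout may differ.

```python
def subelement_mask(valid, packing=4, bits=8):
    """Return a word whose lowest `valid` sub-elements are all ones.

    Assumes (hypothetically) that sub-element 0 occupies the low bits.
    """
    if not 0 <= valid <= packing:
        raise ValueError("valid must be between 0 and packing")
    return (1 << (valid * bits)) - 1


def mask_packed(words, valid, packing=4, bits=8):
    """Zero every sub-element at index >= `valid` in each packed word.

    One AND per word replaces an unpack + select + pack sequence.
    """
    mask = subelement_mask(valid, packing, bits)
    return [w & mask for w in words]


# Keep the three lowest 8-bit sub-elements of each word.
masked = mask_packed([0x44332211, 0xFFFFFFFF], valid=3)
```

The mask is computed once and reused for every word, which mirrors the point of the commit message: the per-vreg cost is a single logical op.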
Goran Flegar
c4d19ca83c Integrate Triton up to [9732c047](9732c04701)
PiperOrigin-RevId: 702397897
2024-12-03 10:55:56 -08:00
Andrey Portnoy
7bd81dbe0d [Mosaic GPU] Improve default kernel name and add option to customize
This allows users to distinguish Mosaic GPU kernels from other kernels
when using profiling programs such as Nsight Systems.

The new default behavior is to use `mosaic_gpu_<def_name>_kernel` as
the kernel name, where `<def_name>` is the name of the Mosaic GPU
Python kernel function passed to `as_gpu_kernel` or
`as_torch_gpu_kernel`.

We also add a new `kernel_name` optional argument to `as_gpu_kernel`
and `as_torch_gpu_kernel`. If `kernel_name` is not `None`, the
resulting kernel name is `mosaic_gpu_<kernel_name>_kernel`. This is
useful when the Mosaic GPU Python kernel function is constructed
through metaprogramming so that the final specialized kernel can have
different meaningful names depending on the metaparameters.

Previously the kernel name was always `main_kernel`.
2024-12-02 22:22:11 -05:00
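The naming rule described in commit 7bd81dbe0d can be sketched as a small helper. `resolve_kernel_name` is a hypothetical name used only for illustration; it is not a function of `as_gpu_kernel` or `as_torch_gpu_kernel`, and the real code may derive the name differently.

```python
def resolve_kernel_name(body, kernel_name=None):
    """Build the kernel name from an explicit name or the function name.

    Hypothetical helper: mirrors the rule in the commit message only.
    """
    name = kernel_name if kernel_name is not None else body.__name__
    return f"mosaic_gpu_{name}_kernel"


def my_matmul():
    """Stand-in for a Mosaic GPU Python kernel function."""


default_name = resolve_kernel_name(my_matmul)
custom_name = resolve_kernel_name(my_matmul, kernel_name="matmul_128x128")
```

Passing an explicit name is what makes metaprogrammed kernels distinguishable in a profiler, since every specialization would otherwise share the same `__name__`.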
Benjamin Kramer
03b6945ee7 Integrate LLVM at llvm/llvm-project@b214ca82da
Updates LLVM usage to match
[b214ca82daee](https://github.com/llvm/llvm-project/commit/b214ca82daee)

PiperOrigin-RevId: 700689999
2024-11-27 07:08:09 -08:00
Tomás Longeri
7a2070e7da [Mosaic:TPU] Enable broadcast from 1-D vectors
PiperOrigin-RevId: 700592669
2024-11-27 00:27:59 -08:00
David Dunleavy
10fdee34d6 Move tsl/platform/{build_config,build_config_root,rules_cc}.bzl to xla/tsl/platform
PiperOrigin-RevId: 700472724
2024-11-26 15:08:58 -08:00
jax authors
aa05dc0b5c Automated Code Change
PiperOrigin-RevId: 699991540
2024-11-25 08:31:06 -08:00
Jevin Jiang
f899d51535 [Mosaic TPU] Fold sublane offset to indices when storing to untiled ref.
This optimization avoids unnecessary retiling when storing to an untiled ref, but adds at most one extra store op for the sublane offset (since the sublane offset is limited to < VregSlice[0]).

PiperOrigin-RevId: 698896373
2024-11-21 13:29:06 -08:00
Kyle Lucke
f3e7e6829a Remove unneeded dependency from rocm_plugin_extension.
PiperOrigin-RevId: 698872849
2024-11-21 12:18:11 -08:00
Naums Mogers
e72b449089 Reverts c04aec9d525dd2e767495e41b98e82dd79315f37
PiperOrigin-RevId: 698654038
2024-11-20 22:45:46 -08:00
Jevin Jiang
869a53345d [Mosaic TPU] Add bound check for general vector store op.
PiperOrigin-RevId: 698577015
2024-11-20 17:28:04 -08:00
Jake VanderPlas
a4266b5e31 Mention python 3.13 in docs & package metadata
2024-11-20 08:23:19 -08:00
Naums Mogers
6c291d67b7 [Mosaic] Add tpu.log verification on SC
Guards against using formatting and targeting vector subcores on SC.

PiperOrigin-RevId: 698222100
2024-11-19 19:04:29 -08:00
Naums Mogers
c04aec9d52 [Mosaic] Extend tpu.sem_signal with subcore_id
This change:
- Bumps up the version of Mosaic to 4 in `serde.cc`.

- Adds optional `subcore_id` parameter to `tpu.sem_signal` for signalling specific subcores.

- Extends deserialization to correctly parse the older versions of Mosaic without the new parameter `subcore_id` of `tpu.sem_signal`.

PiperOrigin-RevId: 698163836
2024-11-19 15:22:59 -08:00
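The backward-compatible deserialization described in commit c04aec9d52 can be pictured with a loose model in which an op is a dictionary of named parameters. Everything here is an assumption made for illustration: `load_sem_signal`, the dictionary representation, and the use of `None` for an absent parameter do not reflect how `serde.cc` actually works.

```python
CURRENT_VERSION = 4  # version the commit message says serde.cc is bumped to


def load_sem_signal(params, version):
    """Upgrade the parameters of a serialized tpu.sem_signal op.

    Hypothetical model: modules older than version 4 predate `subcore_id`,
    so the parameter is filled in as absent (None) when loading them.
    """
    if version > CURRENT_VERSION:
        raise ValueError(f"unsupported version {version}")
    upgraded = dict(params)  # leave the caller's dictionary untouched
    if version < 4:
        upgraded.setdefault("subcore_id", None)
    return upgraded


old_op = load_sem_signal({"device_id": 1}, version=3)
new_op = load_sem_signal({"device_id": 1, "subcore_id": 2}, version=4)
```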
Naums Mogers
0d36b0b433 [Mosaic] Add target core type parameter to tpu.sem_signal
Adds the optional core type parameter to `tpu.sem_signal` for cross-core signalling.
If the target core type is not provided, the target core type is assumed to be that of the core issuing the signal.
The issuing core type is determined based on the core type annotation of the parent function; if the annotation is not provided, the issuing core type is assumed to be TensorCore.

PiperOrigin-RevId: 698129842
2024-11-19 13:40:13 -08:00
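The defaulting rules in commit 0d36b0b433 amount to a two-step fallback, sketched below. The `CoreType` enum and both function names are hypothetical stand-ins; only the fallback order comes from the commit message.

```python
from enum import Enum
from typing import Optional


class CoreType(Enum):
    """Hypothetical stand-in for the Mosaic core type attribute."""
    TENSOR_CORE = "tc"
    SPARSE_CORE = "sc"


def issuing_core_type(func_annotation: Optional[CoreType]) -> CoreType:
    """Core type of the signalling core: the parent function's annotation,
    or TensorCore when the function is not annotated."""
    if func_annotation is not None:
        return func_annotation
    return CoreType.TENSOR_CORE


def target_core_type(explicit_target: Optional[CoreType],
                     func_annotation: Optional[CoreType]) -> CoreType:
    """Core type being signalled: the explicit parameter if given,
    otherwise the issuing core's type."""
    if explicit_target is not None:
        return explicit_target
    return issuing_core_type(func_annotation)
```

With no explicit target and no annotation, the result is TensorCore, which matches the behavior described for signals that do not cross core types.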
Jevin Jiang
6c31efa3f3 [Mosaic TPU] Add general tpu.vector_store and support masked store.
This CL introduces a general store op called tpu.vector_store, which aims to unify vector::store, tpu::strided_load, and vector::masked_store. tpu.vector_store should also provide a general interface for lowering on both TensorCore and SparseCore.

This CL also adds support for (dynamic) masked store.

PiperOrigin-RevId: 698067741
2024-11-19 10:33:09 -08:00
jax authors
d397dd9684 Implement lax.pad in Pallas.
PiperOrigin-RevId: 697897093
2024-11-18 23:59:20 -08:00
Jevin Jiang
0fe77bc9f0 [Mosaic TPU] Support relayout for mask vector
We cast i1 vector (mask) to i32 vector before relayout and then cast back to i1 vector (mask) after relayout is finished.

PiperOrigin-RevId: 697823543
2024-11-18 18:07:15 -08:00
Dan Foreman-Mackey
ccb331707e Add a GPU implementation of lax.linalg.eig.
This feature has been in the queue for a long time (see https://github.com/jax-ml/jax/issues/1259), and some folks have found that they can use `pure_callback` to call the CPU version as a workaround. It has recently come up that there can be issues when using `pure_callback` with JAX calls in the body (https://github.com/jax-ml/jax/issues/24255; this should be investigated separately).

This change adds a native solution for computing `lax.linalg.eig` on GPU. By default, this is implemented by calling LAPACK on host directly because this has good performance for small to moderately sized problems (less than about 2048^2). For larger matrices, a GPU-backed implementation based on [MAGMA](https://icl.utk.edu/magma/) can have significantly better performance. (I should note that I haven't done a huge amount of benchmarking yet, but this was the breakeven point used by PyTorch, and I find roughly similar behavior so far.)

We don't want to add MAGMA as a required dependency, but if a user has installed it, JAX can use it when the `jax_gpu_use_magma` configuration variable is set to `"on"`. By default, we try to dlopen `libmagma.so`, but the path to a non-standard installation location can be specified using the `JAX_GPU_MAGMA_PATH` environment variable.

PiperOrigin-RevId: 697631402
2024-11-18 08:11:57 -08:00
jax authors
1471702adc [Mosaic TPU] Support 1D concat: set implicit_dim to kSecondMinor to treat 1D (N,) as (1, N) and then tile it as (1, 128)
PiperOrigin-RevId: 696870258
2024-11-15 06:41:57 -08:00
jax authors
a8464ce761 [Mosaic][TPU] Omit short circuiting of relayout (we should always relayout!) and implement product mismatch case for where we relayout from replicated to offset, and the number of vregs changes.
PiperOrigin-RevId: 696557463
2024-11-14 09:53:25 -08:00
Naums Mogers
c32db46e6c [Mosaic] Add parameter names to tpu.sem_signal and add tests
This CL adds parameter names to the optional parameters of `tpu.sem_signal` -- `device_id`, `core_id` -- to remove the ambiguity upon deserialization.
Adds LIT tests of signalling on TC with parameter names.

PiperOrigin-RevId: 695875037
2024-11-12 14:37:47 -08:00
Sergei Lebedev
d304025a41 [mosaic_gpu] The profiler now uses FFI calls for creating events and computing elapsed time
PiperOrigin-RevId: 695798787
2024-11-12 11:01:59 -08:00
jax authors
1221da8467 [Mosaic] Fix mask creation for packed sublanes
Unaligned concat used to be f32 only, but was implicitly protected by unimplemented support for multi-row shifts in sub-32-bit types. When that support was added, we started invoking the unaligned concat flow with sub-32-bit types, but the masking code, which assumed full rows (unpacked types), was no longer sufficient: we need finer granularity for these cases. This only affects sublanes, since that is where we pack; we don't have partial lanes.

This CL, as a small benefit, also adds better error messages to the ops involved in lower_to_llo.cc.

PiperOrigin-RevId: 695796095
2024-11-12 10:55:19 -08:00
Dan Foreman-Mackey
21e98b5ce4 Fix overflow error in GPU batched linear algebra kernels.
As reported in https://github.com/jax-ml/jax/issues/24843, our LU decomposition on GPU hits overflow errors when the batch size approaches int32 max. This was caused by an issue in how we were constructing the batched pointers used by cuBLAS.

PiperOrigin-RevId: 695694648
2024-11-12 05:33:49 -08:00
Jevin Jiang
38d062dbee [Mosaic TPU] Support dynamic DMA and ref slice on the 2nd minor when memref is untiled
* Generalize any untiled memref to have tiling (packing, 128)
* Support dynamic index on 2nd minor.
* Support dynamic shape on 2nd minor.

PiperOrigin-RevId: 695516124
2024-11-11 16:14:27 -08:00
Benjamin Chetioui
da89c9e38c [Mosaic GPU] Add base_pointer argument to InitializeBarrierOp.
This corresponds to what's implemented in `BarrierRef`, and ultimately makes it
easier to allocate barriers at a specific address in dynamic shared memory.

PiperOrigin-RevId: 695308297
2024-11-11 06:18:26 -08:00
Dimitar (Mitko) Asenov
d833066a1f [MOSAIC:GPU] Add async_load, async_store, and supporting attributes to the MLIR Mosaic GPU Dialect.
PiperOrigin-RevId: 694643777
2024-11-08 14:34:23 -08:00
jax authors
4d1a1264f0 Merge pull request #24778 from cainmagi:fix-pr-23852
PiperOrigin-RevId: 694565904
2024-11-08 11:00:19 -08:00
Adam Paszke
ce3826d098 [Mosaic GPU] Make sure to free the cloned MLIR module when debugging
We only recently started using this in tests and it has caused ASAN
to report a bunch of leaks.

PiperOrigin-RevId: 694510867
2024-11-08 08:35:10 -08:00
Yuchen Jin
218f763255 (follow-up of PR #23852) add missing typename keyword to work with gcc
This update is a follow-up of PR #23852. The previous PR missed one place where the `typename` keyword was needed.
2024-11-07 23:55:38 -06:00
Tomás Longeri
04a6652243 [Mosaic] Fix handling of i1 splat constants
PiperOrigin-RevId: 694248723
2024-11-07 14:28:59 -08:00
Tzu-Wei Sung
8b7bcadebe [Mosaic] Fix canonicalize_extract op name.
PiperOrigin-RevId: 694236671
2024-11-07 13:51:52 -08:00
Naums Mogers
3df204a457 [Mosaic] Verify that tpu.sem_wait semaphore rank is zero
Since we only wait on one semaphore, we should enforce this in the verifier.

PiperOrigin-RevId: 693770055
2024-11-06 10:10:15 -08:00
Peter Hawkins
ea1e879577 Include mpmath as a bazel dependency of lax_test.
This test has additional test cases that require mpmath.

PiperOrigin-RevId: 693464078
2024-11-05 13:43:06 -08:00
Sergei Lebedev
34b4787e2e [mosaic_gpu] Check the return code of gpuEventCreate and gpuEventDestroy
PiperOrigin-RevId: 693260326
2024-11-05 01:59:58 -08:00
Benjamin Chetioui
63e59c5fd7 [Mosaic GPU] Ensure that the dialect module can be loaded successfully.
This requires that the file providing the bindings has the same name as the
dialect it defines, since dialect search looks for a module path of the form
`<prefix>.<dialect namespace>`.

PiperOrigin-RevId: 693241875
2024-11-05 00:47:21 -08:00
Praveen Batra
8296f6e0ba [Mosaic] Add extension files for infer/apply vector layout.
PiperOrigin-RevId: 691868278
2024-10-31 11:08:37 -07:00
Praveen Batra
7d9f565647 [Mosaic] Fix some imports.
PiperOrigin-RevId: 691830491
2024-10-31 09:25:34 -07:00
Benjamin Chetioui
c708a04c6e [Mosaic GPU] Add Python bindings for the Mosaic GPU MLIR dialect.
Also start moving the existing C++ tests to Python.

PiperOrigin-RevId: 691729887
2024-10-31 02:47:30 -07:00
Dimitar (Mitko) Asenov
7d504cd95a [MOSAIC:GPU] Extend the mosaic mlir dialect with fragmented layouts.
PiperOrigin-RevId: 691712579
2024-10-31 01:29:22 -07:00
jax authors
5aeffde707 [Mosaic] Extend tpu matmulop to have dimension dims. Add support for batching and simple transposition.
PiperOrigin-RevId: 691706218
2024-10-31 00:59:13 -07:00
Naums Mogers
242e6634ff [Mosaic] Add the core type enum
The new attribute allows differentiating compilation by target core.

PiperOrigin-RevId: 691531726
2024-10-30 13:23:34 -07:00
jax authors
99ea4c1a4a [Fix] Put * packing into reshape no-op condition (Bug in my original CL)
PiperOrigin-RevId: 691476663
2024-10-30 10:47:23 -07:00
jax authors
5ad066eeaa [TPU][Mosaic] Replace tpu lowering (at canonicalization) for repeat with concat (which handles far more cases)
PiperOrigin-RevId: 691192121
2024-10-29 15:57:44 -07:00
Peter Hawkins
bee2bc443a Remove some dead code from gpu_prng.py
2024-10-29 09:29:56 -04:00