This change factors out the logic in the apply-vector-layout shape-cast rule that inserts a minor dimension, relaxes some of its offset restrictions, and reuses it for the relayout.
PiperOrigin-RevId: 702993092
This CL removes all the shape constraints in matmul for all types.
We only need to mask out sub-elements on the contracting dimension. Instead of unpacking the data and applying masks, we create a VREG-sized i32 "mask" that encodes the sub-element mask and logically AND it with the target vreg. This way, masking the sub-elements of each target vreg takes 1 op (logical_and) instead of 3 (unpack + select + pack).
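A minimal NumPy sketch of the idea (illustrative only, not the Mosaic implementation): two 16-bit sub-elements packed into each 32-bit lane of a tiny "vreg", masked with a single bitwise AND.

```python
import numpy as np

# Two 16-bit sub-elements packed per 32-bit lane of a (tiny) "vreg".
vreg = np.array([0xAAAABBBB, 0xCCCCDDDD], dtype=np.uint32)

# Precomputed i32 mask: keep both sub-elements in lane 0, but zero the
# high sub-element in lane 1 (e.g. it lies past the contracting dim).
mask = np.array([0xFFFFFFFF, 0x0000FFFF], dtype=np.uint32)

# One logical AND per vreg, instead of unpack + select + pack.
masked = vreg & mask
assert masked[1] == 0x0000DDDD
```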
PiperOrigin-RevId: 702480077
This allows users to distinguish Mosaic GPU kernels from other kernels
when using profiling tools such as Nsight Systems.
The new default behavior is to use `mosaic_gpu_<def_name>_kernel` as
the kernel name, where `<def_name>` is the name of the Mosaic GPU
Python kernel function passed to `as_gpu_kernel` or
`as_torch_gpu_kernel`.
We also add a new `kernel_name` optional argument to `as_gpu_kernel`
and `as_torch_gpu_kernel`. If `kernel_name` is not `None`, the
resulting kernel name is `mosaic_gpu_<kernel_name>_kernel`. This is
useful when the Mosaic GPU Python kernel function is constructed
through metaprogramming, so that the final specialized kernel can be
given a meaningful name that depends on the metaparameters.
Previously the kernel name was always `main_kernel`.
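For illustration, here is a sketch of the naming rule described above (the helper name is hypothetical; the rule itself is from this change):

```python
def _mosaic_gpu_kernel_name(def_name: str, kernel_name: str | None) -> str:
    # An explicit `kernel_name` overrides the Python function's name.
    base = def_name if kernel_name is None else kernel_name
    return f"mosaic_gpu_{base}_kernel"

assert _mosaic_gpu_kernel_name("attention", None) == "mosaic_gpu_attention_kernel"
assert _mosaic_gpu_kernel_name("attention", "attention_bf16") == "mosaic_gpu_attention_bf16_kernel"
```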
This optimization avoids unnecessary retiling when storing to an untiled ref, but adds at most one extra store op for the sublane offset (since the sublane offset is limited to < VregSlice[0]).
PiperOrigin-RevId: 698896373
This change:
- Bumps up the version of Mosaic to 4 in `serde.cc`.
- Adds optional `subcore_id` parameter to `tpu.sem_signal` for signalling specific subcores.
- Extends deserialization to correctly parse older versions of Mosaic that lack the new `subcore_id` parameter of `tpu.sem_signal` (see the sketch below).
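A hypothetical sketch of the version-gated parse (the real logic lives in `serde.cc`; the field names here are illustrative):

```python
def read_sem_signal(version: int, fields: dict) -> dict:
    op = {"semaphore": fields["semaphore"], "amount": fields["amount"]}
    if version >= 4:
        # Only version-4+ payloads may carry the optional subcore_id;
        # older payloads must be parsed without looking for it.
        op["subcore_id"] = fields.get("subcore_id")
    return op
```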
PiperOrigin-RevId: 698163836
Adds an optional core type parameter to `tpu.sem_signal` for cross-core signalling.
If the target core type is not provided, it is assumed to be that of the core issuing the signal.
The issuing core type is determined based on the core type annotation of the parent function; if the annotation is not provided, the issuing core type is assumed to be TensorCore.
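A small sketch of the resolution rule described above (the names and string values are illustrative):

```python
def resolve_core_types(target_core_type, parent_func_annotation):
    # The issuing core type comes from the parent function's annotation,
    # defaulting to TensorCore when the annotation is absent.
    issuing = parent_func_annotation or "TensorCore"
    # The target core type defaults to the issuing core type.
    target = target_core_type or issuing
    return issuing, target

assert resolve_core_types(None, None) == ("TensorCore", "TensorCore")
assert resolve_core_types("SparseCore", "TensorCore") == ("TensorCore", "SparseCore")
```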
PiperOrigin-RevId: 698129842
This CL introduces a general store op, `tpu.vector_stores`, which aims to unify `vector::store`, `tpu::strided_store`, and `vector::masked_store`. It should also provide a general lowering interface for both TensorCore and SparseCore.
This CL also adds support for (dynamically) masked stores.
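Illustrative semantics of a masked store, in NumPy terms (a sketch of the behavior, not the op's implementation): masked-off elements leave memory untouched.

```python
import numpy as np

def masked_store(ref: np.ndarray, value: np.ndarray, mask: np.ndarray) -> None:
    # Elements where mask is False keep the previous contents of `ref`;
    # a plain unmasked store is the special case of an all-True mask.
    ref[...] = np.where(mask, value, ref)

ref = np.zeros(8, dtype=np.int32)
masked_store(ref, np.arange(8, dtype=np.int32), np.arange(8) % 2 == 0)
assert list(ref) == [0, 0, 2, 0, 4, 0, 6, 0]
```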
PiperOrigin-RevId: 698067741
This feature has been in the queue for a long time (see https://github.com/jax-ml/jax/issues/1259), and some folks have found that they can use `pure_callback` to call the CPU version as a workaround. It has recently come up that there can be issues when using `pure_callback` with JAX calls in the body (https://github.com/jax-ml/jax/issues/24255; this should be investigated separately).
This change adds a native solution for computing `lax.linalg.eig` on GPU. By default, this is implemented by calling LAPACK on host directly because this has good performance for small to moderately sized problems (less than about 2048^2). For larger matrices, a GPU-backed implementation based on [MAGMA](https://icl.utk.edu/magma/) can have significantly better performance. (I should note that I haven't done a huge amount of benchmarking yet, but this was the breakeven point used by PyTorch, and I find roughly similar behavior so far.)
We don't want to add MAGMA as a required dependency, but if a user has installed it, JAX can use it when the `jax_gpu_use_magma` configuration variable is set to `"on"`. By default, we try to dlopen `libmagma.so`, but the path to a non-standard installation location can be specified using the `JAX_GPU_MAGMA_PATH` environment variable.
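A hypothetical usage sketch (assumes a CUDA build of jaxlib and a discoverable MAGMA installation; only the `jax_gpu_use_magma` flag name comes from this change):

```python
import jax
import jax.numpy as jnp

# Opt in to the MAGMA backend; by default JAX calls LAPACK on the host,
# which is faster for problems below roughly 2048x2048.
jax.config.update("jax_gpu_use_magma", "on")

a = jnp.ones((4096, 4096)) / 4096.0  # large enough for MAGMA to pay off
w, v = jnp.linalg.eig(a)
```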
PiperOrigin-RevId: 697631402
This CL adds parameter names to the optional parameters of `tpu.sem_signal` -- `device_id`, `core_id` -- to remove ambiguity upon deserialization.
Adds LIT tests of signalling on TensorCore with parameter names.
PiperOrigin-RevId: 695875037
Unaligned concat used to be f32-only, but it was implicitly protected by the lack of multi-row-shift support for sub-32-bit types. Once that support was added, we started invoking the unaligned-concat flow with sub-32-bit types, but the masking code, which assumed full rows (unpacked types), was no longer sufficient - these cases need finer-grained masking. This only affects sublanes, since that is where we pack; there are no partial lanes.
As a small side benefit, this CL also adds better error messages to the ops involved in lower_to_llo.cc.
PiperOrigin-RevId: 695796095
As reported in https://github.com/jax-ml/jax/issues/24843, our LU decomposition on GPU hits overflow errors when the batch size approaches int32 max. This was caused by an issue in how we were constructing the batched pointers used by cuBLAS.
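A synthetic illustration of the failure mode (the numbers are made up; the actual fix is in how JAX builds the pointer array for cuBLAS):

```python
# cuBLAS batched APIs take an array of per-matrix device pointers.
# Computing the per-batch byte offsets in 32-bit arithmetic wraps around
# once batch_index * matrix_bytes exceeds 2**31 - 1.
matrix_bytes = 128 * 128 * 4          # one f32 128x128 matrix
batch_index = 1_000_000               # large batch, as in issue #24843
correct = batch_index * matrix_bytes  # Python ints don't overflow
wrapped = correct % 2**32             # what 32-bit arithmetic would yield
assert wrapped != correct             # the overflow that corrupted pointers
```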
PiperOrigin-RevId: 695694648
* Generalize any untiled memref to have tiling (packing, 128); see the sketch after this list.
* Support dynamic indices on the 2nd-minor dimension.
* Support dynamic shapes on the 2nd-minor dimension.
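A sketch of the first item's rule, under the assumption that `packing` means the number of elements per 32-bit word (standard Mosaic usage, but stated here as an assumption):

```python
def default_tiling(bitwidth: int) -> tuple[int, int]:
    # An untiled memref of a packed dtype is assigned the tiling
    # (packing, 128), where packing = 32 / bitwidth.
    packing = 32 // bitwidth
    return (packing, 128)

assert default_tiling(32) == (1, 128)
assert default_tiling(16) == (2, 128)
assert default_tiling(8) == (4, 128)
```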
PiperOrigin-RevId: 695516124
This corresponds to what's implemented in `BarrierRef`, and ultimately makes it
easier to allocate barriers at a specific address in dynamic shared memory.
PiperOrigin-RevId: 695308297
This requires that the file providing the bindings have the same name as
the dialect it defines, since the dialect search looks for a module path
of the form `<prefix>.<dialect namespace>`.
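A sketch of the lookup this implies (the function name, prefix, and namespace below are all illustrative):

```python
import importlib

def find_dialect_module(prefix: str, dialect_namespace: str):
    # The dialect search imports `<prefix>.<dialect namespace>`, so the
    # bindings file must be named after the dialect's namespace.
    return importlib.import_module(f"{prefix}.{dialect_namespace}")

# e.g. a dialect with namespace "tpu" under a hypothetical prefix:
# find_dialect_module("my_project.dialects", "tpu")
```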
PiperOrigin-RevId: 693241875