This factors out some logic from the apply-vector-layout shape cast rule where we insert a minor dimension, relaxes some offset restrictions on it, and uses it for the relayout.
PiperOrigin-RevId: 702993092
This CL removes all shape constraints in matmul for all types.
We only need to mask out subelements on the contracting dim. Instead of unpacking the data and applying masks, we create a VREG-sized i32 "mask" that encodes the subelement mask info and logically AND it with the target vreg. This way, masking sub-elements costs each target vreg only 1 op (logical AND) instead of 3 ops (unpacking + select + packing).
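The 1-op-vs-3-ops trade-off above can be illustrated with a minimal plain-Python sketch. The packing layout here (two 16-bit subelements per 32-bit word) and the helper names are hypothetical, chosen only to mirror the idea of ANDing a precomputed i32 mask against a packed vreg:

```python
def pack(lo, hi):
    """Pack two 16-bit subelements into one 32-bit word (hypothetical layout)."""
    return (hi << 16) | lo

words = [pack(0x1111, 0x2222), pack(0x3333, 0x4444)]

# 1-op path: AND each word with a mask that keeps only the low subelement.
mask = 0x0000FFFF
masked = [w & mask for w in words]

# Equivalent 3-op path: unpack, select, repack.
unpacked = [(w & 0xFFFF, w >> 16) for w in words]
selected = [(lo, 0) for lo, hi in unpacked]
repacked = [pack(lo, hi) for lo, hi in selected]

assert masked == repacked
```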
PiperOrigin-RevId: 702480077
This allows users to distinguish Mosaic GPU kernels from other kernels
when using profiling programs such as Nsight Systems.
The new default behavior is to use `mosaic_gpu_<def_name>_kernel` as
the kernel name, where `<def_name>` is the name of the Mosaic GPU
Python kernel function passed to `as_gpu_kernel` or
`as_torch_gpu_kernel`.
We also add a new `kernel_name` optional argument to `as_gpu_kernel`
and `as_torch_gpu_kernel`. If `kernel_name` is not `None`, the
resulting kernel name is `mosaic_gpu_<kernel_name>_kernel`. This is
useful when the Mosaic GPU Python kernel function is constructed
through metaprogramming so that the final specialized kernel can have
different meaningful names depending on the metaparameters.
Previously the kernel name was always `main_kernel`.
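The naming scheme described above can be sketched as a small helper. This is an illustration of the stated rule, not the actual Mosaic GPU implementation, and the function name is hypothetical:

```python
def mosaic_kernel_name(def_name, kernel_name=None):
    """Return the generated GPU kernel's name per the scheme above.

    `def_name` is the Python kernel function's name; `kernel_name`, if
    given, overrides it (useful for metaprogrammed kernels).
    """
    base = kernel_name if kernel_name is not None else def_name
    return f"mosaic_gpu_{base}_kernel"

# Default: derived from the Python function name.
assert mosaic_kernel_name("attention") == "mosaic_gpu_attention_kernel"
# Explicit kernel_name wins, e.g. for a metaprogrammed specialization.
assert (mosaic_kernel_name("attention", kernel_name="attention_bf16")
        == "mosaic_gpu_attention_bf16_kernel")
```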
This optimization avoids unnecessary retiling when storing to an untiled ref, but adds at most one extra store op for the sublane offset (since the sublane offset is limited to < VregSlice[0]).
PiperOrigin-RevId: 698896373
This change:
- Bumps up the version of Mosaic to 4 in `serde.cc`.
- Adds an optional `subcore_id` parameter to `tpu.sem_signal` for signalling specific subcores.
- Extends deserialization to correctly parse older versions of Mosaic that lack the new `subcore_id` parameter of `tpu.sem_signal`.
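The version-gated deserialization described above can be sketched as follows. The structure and names here are hypothetical stand-ins for the real serde logic; the point is only that a parser for versions below 4 must not expect the new field:

```python
SERDE_VERSION = 4  # version bumped in serde.cc

def parse_sem_signal(fields, version):
    """Parse a serialized tpu.sem_signal, honoring the serde version.

    Hypothetical field layout: [device_id, core_id, subcore_id?], where
    `subcore_id` only exists from version 4 onward.
    """
    op = {"device_id": fields[0], "core_id": fields[1]}
    if version >= SERDE_VERSION:
        op["subcore_id"] = fields[2]
    return op

# Older payloads (version 3) carry no subcore_id and must still parse.
assert "subcore_id" not in parse_sem_signal([0, 1], 3)
assert parse_sem_signal([0, 1, 2], 4)["subcore_id"] == 2
```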
PiperOrigin-RevId: 698163836
Adds an optional core type parameter to `tpu.sem_signal` for cross-core signalling.
If the target core type is not provided, it defaults to that of the core issuing the signal.
The issuing core type is determined from the core type annotation of the parent function; if no annotation is present, it is assumed to be TensorCore.
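The two-level defaulting rule above can be captured in a short sketch. The function and the string core-type values are hypothetical illustrations of the stated fallback chain, not the actual MLIR attribute handling:

```python
def resolve_target_core(target_core=None, parent_annotation=None):
    """Resolve the target core type for a signal op.

    Fallback chain per the rule above: explicit target core type, else the
    issuing core's type (from the parent function's annotation), else
    TensorCore.
    """
    issuing_core = (parent_annotation if parent_annotation is not None
                    else "tensorcore")
    return target_core if target_core is not None else issuing_core

assert resolve_target_core() == "tensorcore"
assert resolve_target_core(parent_annotation="sparsecore") == "sparsecore"
assert resolve_target_core(target_core="sparsecore",
                           parent_annotation=None) == "sparsecore"
```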
PiperOrigin-RevId: 698129842
This CL introduces a general store op, `tpu.vector_store`, which aims to unify `vector::store`, `tpu::strided_store`, and `vector::masked_store`. It should also provide a general lowering interface for both TensorCore and SparseCore.
This CL also adds support for (dynamic) masked stores.
PiperOrigin-RevId: 698067741
This CL adds parameter names to the optional parameters of `tpu.sem_signal` -- `device_id`, `core_id` -- to remove ambiguity upon deserialization.
Adds LIT tests of signalling on TC with parameter names.
PiperOrigin-RevId: 695875037
Unaligned concat used to be f32-only, implicitly protected by the unimplemented multi-row shift for sub-32-bit types. Once that support was added, we started invoking the unaligned concat flow with sub-32-bit types, but the masking code, which assumed full rows (unpacked types), was no longer sufficient -- we need finer granularity for these cases. This only affects sublanes, since that is where we pack; we don't have partial lanes.
As a small side benefit, this CL also adds better error messages to the ops involved in lower_to_llo.cc.
PiperOrigin-RevId: 695796095
* Generalize any untiled memref to have tiling (packing, 128)
* Support dynamic index on 2nd minor.
* Support dynamic shape on 2nd minor.
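The "(packing, 128)" tiling in the first bullet can be sketched as a tiny helper. This is an illustrative assumption that "packing" means the number of elements per 32-bit word for the element bitwidth, not the actual Mosaic code:

```python
def default_tiling(bitwidth):
    """Default tiling assigned to an untiled memref (illustrative).

    Assumes `packing` = elements per 32-bit word for the element type,
    yielding the (packing, 128) tiling mentioned above.
    """
    packing = 32 // bitwidth
    return (packing, 128)

assert default_tiling(32) == (1, 128)   # f32/i32: unpacked
assert default_tiling(16) == (2, 128)   # bf16: 2 per word
assert default_tiling(8) == (4, 128)    # i8/fp8: 4 per word
```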
PiperOrigin-RevId: 695516124
This corresponds to what's implemented in `BarrierRef`, and ultimately makes it
easier to allocate barriers at a specific address in dynamic shared memory.
PiperOrigin-RevId: 695308297
This requires that the file providing the bindings has the same name as the
dialect it defines, since dialect search looks for a module path of the form
`<prefix>.<dialect namespace>`.
PiperOrigin-RevId: 693241875
Turns out that waiting for the kernel to finish is not enough, since the
prints also need to be processed by the CUDA runtime. Using a test-only
function that synchronizes all devices seems to suffice.
PiperOrigin-RevId: 690624999
Repeated string addition is a bit of an anti-pattern. Not that it matters
much here, but we might as well do it properly.
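For context, the anti-pattern and the idiomatic fix look like this: repeated `+=` on a str copies the accumulated prefix on every iteration (quadratic in total length), while `str.join` builds the result in one pass. The variable names are illustrative only:

```python
parts = [f"arg{i}" for i in range(5)]

# Anti-pattern: O(n^2) copying across iterations.
s = ""
for p in parts:
    s += p + ", "
s = s[:-2]  # drop the trailing separator

# Idiomatic: single pass via join.
joined = ", ".join(parts)

assert s == joined == "arg0, arg1, arg2, arg3, arg4"
```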
PiperOrigin-RevId: 689416587
Originally proposed in #24021. Slightly rewritten to make testing with internal LLVM toolchains easier.
Use CUDA driver API to query major and minor compute capabilities, thus arriving at a "base" SM string (e.g. `sm_90`).
Then use LLVM to see if we can "upgrade" the base SM string to one that enables architecture-specific capabilities (e.g. `sm_90a`).
Then use LLVM to map the SM string to a PTX ISA version that supports the SM.
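The three steps above can be sketched with stand-in tables in place of the real CUDA driver and LLVM queries. Everything here (the table contents, the function name) is a hypothetical illustration of the flow, not the actual implementation:

```python
# Stand-in for LLVM's knowledge of architecture-specific targets.
ARCH_SPECIFIC = {"sm_90": "sm_90a"}
# Stand-in for LLVM's SM -> minimum supporting PTX ISA mapping.
MIN_PTX_ISA = {"sm_80": "7.0", "sm_90": "8.0", "sm_90a": "8.0"}

def target_for(major, minor):
    """Derive (SM string, PTX ISA version) from compute capability."""
    base = f"sm_{major}{minor}"            # 1) base SM from driver query
    sm = ARCH_SPECIFIC.get(base, base)     # 2) upgrade to arch-specific if possible
    return sm, MIN_PTX_ISA[sm]             # 3) PTX ISA that supports the SM

assert target_for(9, 0) == ("sm_90a", "8.0")
assert target_for(8, 0) == ("sm_80", "7.0")
```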
Co-authored-by: Andrey Portnoy <aportnoy@nvidia.com>
PiperOrigin-RevId: 689286774