Note that
* we have no easy way of testing multi-reductions at the moment;
* `reduce_max` assumes the WGMMA_ROW layout, which, AFAICT, is not currently
  supported by the dialect lowering.
PiperOrigin-RevId: 736138554
Key features:
* ***Support mixed prefill and decode*** to increase inference throughput (e.g., a ***5x*** speedup compared to the padded Multi-Queries Paged Attention implementation for llama-3-8b).
* ***No explicit `swapaxes`*** for `seq_len` and `num_head` in the pre/post kernel. The kernel takes `num_head` in the 2nd minor dimension, as it naturally is; we fold the `swapaxes` into strided loads/stores in the kernel and apply the transpose on the fly.
* ***No GMM (Grouped Matmul) metadata required!*** We calculate the metadata on the fly in the kernel, which can yield a ***10%*** speedup!
* ***Increase MXU utilization 8x in GQA*** by grouping the shared Q heads into a single matmul for the MXU in decode (see the sketch after this list).
* ***Minimize recompilation:*** the only factors that can cause recompilation are the model specs, `max_num_batched_tokens`, and `max_num_seqs` in the mixed-engine setting.
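A minimal sketch of the GQA head-grouping idea (illustrative only; the function name, shapes, and the use of `einsum` are assumptions, not the kernel's actual code):

```python
import jax.numpy as jnp

def grouped_gqa_scores(q, k):
  """Group Q heads that share a KV head into one matmul per KV head.

  q: [num_q_heads, head_dim]            -- one decode token
  k: [num_kv_heads, seq_len, head_dim]
  Returns scores: [num_q_heads, seq_len]
  """
  num_q_heads, head_dim = q.shape
  num_kv_heads = k.shape[0]
  q_heads_per_kv = num_q_heads // num_kv_heads
  # Fold the shared Q heads into the row dimension of the matmul, so the MXU
  # sees a (q_heads_per_kv, head_dim) x (head_dim, seq_len) block per KV head
  # instead of a single-row matvec per Q head.
  q_grouped = q.reshape(num_kv_heads, q_heads_per_kv, head_dim)
  scores = jnp.einsum("ghd,gsd->ghs", q_grouped, k)
  return scores.reshape(num_q_heads, -1)
```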
PiperOrigin-RevId: 734269519
Triton seems to support both 2D and 3D operands now, the latter case being a
batched matmul. We need more changes in the lowering to support 3D, so I will
leave it out of scope here.
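For illustration (shapes are arbitrary), the distinction between the two cases looks like this at the JAX level:

```python
import jax.numpy as jnp

x2 = jnp.ones((128, 64), dtype=jnp.float32)
y2 = jnp.ones((64, 128), dtype=jnp.float32)
z2 = jnp.dot(x2, y2)        # 2D x 2D matmul: the case handled here

xb = jnp.ones((8, 128, 64), dtype=jnp.float32)
yb = jnp.ones((8, 64, 128), dtype=jnp.float32)
zb = jnp.matmul(xb, yb)     # 3D x 3D batched matmul: left out of scope
```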
Fixes #26013.
PiperOrigin-RevId: 733293299
In general, this is a good feature, but it assumed that the packing type utilized here was exclusively for backcompat, and so always applied the adjustment.
PiperOrigin-RevId: 731954456
The existing `int4` loading code is very generic. When reading contiguous data, it will read with offsets like `0, 0, 1, 1, ...`. Triton doesn't consider these to be contiguous in memory and emits much less efficient code than when reading contiguous blocks.
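As a rough illustration of the contiguous-read approach (the packing order and function name here are assumptions, not the actual lowering):

```python
import jax.numpy as jnp

def unpack_int4(packed):
  """Unpack int4 values stored two-per-byte from a contiguous int8 read.

  Reading the packed bytes contiguously and splitting them with shifts avoids
  the duplicated element offsets `0, 0, 1, 1, ...` that Triton cannot
  recognize as a contiguous access.

  packed: [n] int8 array; assumes the low nibble holds the first value.
  Returns: [2 * n] int8 array of sign-extended int4 values.
  """
  low = jnp.right_shift(jnp.left_shift(packed, 4), 4)  # sign-extend low nibble
  high = jnp.right_shift(packed, 4)                    # arithmetic shift sign-extends
  return jnp.stack([low, high], axis=-1).reshape(-1)
```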
PiperOrigin-RevId: 731635736
When `dma_execution_mode='on_wait'`, we wait to execute DMAs until we are interpreting a `dma_wait` instruction. In particular, while a device is waiting on a DMA semaphore, we will (partially) execute DMAs that signal that semaphore until the wait operation can succeed.
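A minimal sketch of that scheduling rule (the names and data structures are assumptions, not the interpreter's actual internals):

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Semaphore:
  count: int = 0

@dataclass
class Dma:
  execute: Callable[[], None]  # performs the actual copy
  signals: Semaphore           # semaphore signalled on completion
  signal_amount: int = 1

pending_dmas: List[Dma] = []

def dma_start(dma: Dma):
  # In 'on_wait' mode, issuing a DMA only records it; the copy is deferred.
  pending_dmas.append(dma)

def dma_wait(sem: Semaphore, value: int):
  # While the awaited semaphore has not reached the target value, execute
  # deferred DMAs that signal it, so the wait can eventually succeed.
  while sem.count < value:
    dma = next(d for d in pending_dmas if d.signals is sem)
    pending_dmas.remove(dma)
    dma.execute()
    sem.count += dma.signal_amount
```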
PiperOrigin-RevId: 731103569
This shaves off a lot of complexity from our lowering code, while retaining
all of the functionality, except the arrive_tx optimization: `emit_pipeline`
arrives once per buffer, whereas the pipelining in the lowering used to
arrive once for all buffers.
PiperOrigin-RevId: 730824239
We don't support Windows GPU builds right now and skip all the tests,
but at the moment they cannot even be skipped because of the import failure.
PiperOrigin-RevId: 726917651
The corresponding Triton op is restricted to `jnp.stack([x, y], axis=-1)`,
so the lowering only supports that case for now.
See #25321.
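For illustration (array shapes are arbitrary), the case the lowering accepts is:

```python
import jax.numpy as jnp

x = jnp.ones((128,), dtype=jnp.float32)
y = jnp.zeros((128,), dtype=jnp.float32)

stacked = jnp.stack([x, y], axis=-1)  # exactly two operands, stacked on the last axis
# Other axes or more than two operands are not handled by this lowering yet.
```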
PiperOrigin-RevId: 726881284