rocm_jax

mirror of https://github.com/ROCm/jax.git synced 2025-04-14 10:56:06 +00:00

Author	SHA1	Message	Date
jax authors	e9ce8fb92d	Merge pull request #27227 from jburnim:jburnim_pallas_interpret_mode4 PiperOrigin-RevId: 738235363	2025-03-18 20:22:27 -07:00
Sharad Vikram	e949effcda	[Pallas/Fuser] DCE fusion jaxprs before pulling (to avoid unnecessary computations being staged out in block functions) PiperOrigin-RevId: 738218113	2025-03-18 19:00:41 -07:00
Sharad Vikram	4d715753c4	Make sure to DCE read effects PiperOrigin-RevId: 738215055	2025-03-18 18:42:14 -07:00
Yash Katariya	663ef7ae01	Check the type of mesh in `use_abstract_mesh` and `use_concrete_mesh` PiperOrigin-RevId: 738190879	2025-03-18 16:57:40 -07:00
Peter Hawkins	3f91b4b43a	Move jaxlib/{cuda,rocm}_plugin_extension into jaxlib/{cuda/rocm}/ Move the common jaxlib/gpu_plugin_extension into jaxlib/gpu/ Cleanup only, no functional changes intended. PiperOrigin-RevId: 738183402	2025-03-18 16:29:37 -07:00
jax authors	01a110c4c9	Better mosaic lowering for dynamic shapes, extend an interpreter into shape_poly dimexpr and lower them alongside the graph if we are in a dynamic export regime. PiperOrigin-RevId: 738171437	2025-03-18 15:51:15 -07:00
Parker Schuh	0fb59747f0	Support tuples in custom_partitioning. PiperOrigin-RevId: 738154413	2025-03-18 14:57:08 -07:00
Gleb Pobudzey	54691b125a	[Mosaic GPU] Support reads/writes from SMEM to WGMMARowFragLayout arrays. PiperOrigin-RevId: 738121106	2025-03-18 13:23:07 -07:00
Matthew Johnson	942ff38e36	fix to ragged_all_to_all transpose PiperOrigin-RevId: 738110447	2025-03-18 12:51:21 -07:00
Jacob Burnim	47e8effdce	Adds option to initialize buffers to NaNs or zeros in TPU interpret mode.	2025-03-18 12:24:45 -07:00
Benjamin Chetioui	875099b25d	[Mosaic GPU] Enable the new transform inference pass in the warpgroup lowering. A couple of dummy transform inference rules needed to be added in order to contend with parts of the lowering that do not use the dialect yet, along with a transform inference rule for `memref.view`. PiperOrigin-RevId: 738089782	2025-03-18 11:51:43 -07:00
Yash Katariya	a5c0f200e7	`set_mesh` should return the prev_mesh instead of nothing. Users can choose to use the return value or ignore it. PiperOrigin-RevId: 738039559	2025-03-18 09:43:25 -07:00
jax authors	7c5871f464	[Pallas TPU] Hoist prologue and epilogue outside of pipeline loop PiperOrigin-RevId: 738038138	2025-03-18 09:40:43 -07:00
jax authors	30941480a1	Merge pull request #27198 from jakevdp:lax-docs PiperOrigin-RevId: 738038116	2025-03-18 09:38:58 -07:00
jax authors	13541e9f12	Make blocked_fold_in consistent when the block sizes induce padding Add coverage for padded shapes to unit tests. PiperOrigin-RevId: 738029476	2025-03-18 09:12:11 -07:00
Jake VanderPlas	8b46e53a4f	jax.lax: improve docs for several APIs	2025-03-18 08:55:38 -07:00
Benjamin Chetioui	1e36cbe597	[Mosaic GPU] Raise a `NotImplementedError` if `swizzle=16`. Unswizzled MMAs don't lower correctly, and are not currently intended to be supported. PiperOrigin-RevId: 737981373	2025-03-18 06:29:13 -07:00
Adam Paszke	8da93249d2	[Mosaic GPU] Fuse slicing into s4 -> bf16 upcasts This allows us to significantly simplify the generated PTX/SASS, which is currently cluttered with LLVM trying to align slices to start at bit 0 and failing to CSE the right shifts. PiperOrigin-RevId: 737967890	2025-03-18 05:38:49 -07:00
Benjamin Chetioui	ba2f7c9ad9	[Mosaic GPU] Add transform inference rule for `mgpu.slice_smem`. PiperOrigin-RevId: 737957778	2025-03-18 04:53:54 -07:00
Adam Paszke	d4bd2570ae	[Mosaic GPU] Add a specialized layout for loading 4-bit inputs in WGMMA friendly layouts PiperOrigin-RevId: 737956598	2025-03-18 04:47:51 -07:00
Adam Paszke	34cd5b0d74	[Mosaic GPU] Remove sub-byte conversion restriction XLA:GPU recently changed its endianness to little endian to better match LLVM and the rest of the CUDA ecosystem, so we can lift the earlier restrictions. PiperOrigin-RevId: 737934373	2025-03-18 03:13:21 -07:00
Yash Katariya	549973dec6	Allow pspec to be passed to device_put if there is a mesh in the surrounding context PiperOrigin-RevId: 737812111	2025-03-17 17:47:56 -07:00
Emily Fertig	8c35191725	Enable `jax.device_put` to a sharding with no local devices. PiperOrigin-RevId: 737797815	2025-03-17 16:49:46 -07:00
Sergei Lebedev	051687dc4c	[pallas] `pallas_call_p` is now parameterized by a mesh The mesh is necessary to add support for clusters to the Mosaic GPU backend. PiperOrigin-RevId: 737792129	2025-03-17 16:30:40 -07:00
jax authors	b4966130a3	Compute tile index using tile-based coordinates This reduces the chances of overflowing a 32-bit integer when computing tile indices. Add unit test to reproduce the overflow with the previous implementation of `blocked_fold_in`. PiperOrigin-RevId: 737778853	2025-03-17 15:46:27 -07:00
Peter Hawkins	20658fabb3	Replace cached function get_replicated_hlo_sharding() with a constant. Small cleanup, no functional changes intended. PiperOrigin-RevId: 737727727	2025-03-17 13:17:33 -07:00
jax authors	ebcae0d30a	Merge pull request #26980 from carlosgmartin:categorical_replace PiperOrigin-RevId: 737720590	2025-03-17 12:58:01 -07:00
Peter Hawkins	be5d13af77	Remove code that preserved _original_py_fns on C++ classes. This no longer appears to be used. PiperOrigin-RevId: 737715578	2025-03-17 12:43:04 -07:00
Benjamin Chetioui	9a686e0bf3	[Mosaic GPU] Add initial transform inference rules for `vector.{load,store}`. PiperOrigin-RevId: 737703568	2025-03-17 12:08:07 -07:00
carlosgmartin	3f59fa6888	Add replace option to random.categorical to enable sampling without replacement.	2025-03-17 13:41:46 -04:00
jax authors	de9ad6bad9	Merge pull request #27157 from mar-muel:improve-random-choice-performance PiperOrigin-RevId: 737665351	2025-03-17 10:30:15 -07:00
Adam Paszke	3649da56fc	[Mosaic GPU] Make the s4 -> bf16 upcast more flexible when it comes to vector length We can now perform the conversion in groups of 2, 4 or even 8 elements at a time. PiperOrigin-RevId: 737626600	2025-03-17 08:37:17 -07:00
Sergei Lebedev	0ff234049b	Removed trivial docstrings from JAX tests These docstrings do not make the tests any more clear and typically just duplicate the test module name. PiperOrigin-RevId: 737611977	2025-03-17 07:49:37 -07:00
Sergei Lebedev	a7e5eaee56	[pallas:mosaic_gpu] `jnp.reduce_sum` now works for >1D arrays PiperOrigin-RevId: 737578598	2025-03-17 05:32:07 -07:00
Adam Paszke	89b21de62a	[Mosaic GPU] Add support for changing the layout before the upcast This lets us save on 2 ALU instructions (3x select becomes 1x prmt). PiperOrigin-RevId: 737550598	2025-03-17 03:26:48 -07:00
Adam Paszke	2bdd9c8797	[Mosaic GPU] Add support for fast WGMMA layout changes after 8- to 16-bit upcast PiperOrigin-RevId: 737542885	2025-03-17 02:52:16 -07:00
jax authors	761b35c59e	Merge pull request #27176 from jakevdp:lax-docs PiperOrigin-RevId: 737338493	2025-03-16 05:39:55 -07:00
Joan Puigcerver	466ef6a132	Change the way that batching.spec_types is updated. There's no reason why two custom vmappable types cannot share the same spec_type. However, spec_types was a set, which can cause bugs / exceptions. Suppose that I register two vmappable data_types sharing the same spec_type, and then unregister one of the two. Then, the spec_type is no longer in the set to support the second data_type. Also, an exception will be raised if I try to unregister the two vmappable types (the second call to spec_types.remove). When unregistering a data type, instead of removing its spec_type from the set, we regenerate the set from the remaining vmappable types. PiperOrigin-RevId: 737280270	2025-03-15 22:58:44 -07:00
Jake VanderPlas	de8b0564ce	Better docs for jax.lax add/sub/mul/div	2025-03-15 11:49:51 -07:00
Ayaka	9b0ace4a11	Support error checking in explicit mode PiperOrigin-RevId: 737051146	2025-03-14 18:58:26 -07:00
jax authors	7db59cdcca	Merge pull request #27174 from mattjj:opt-barrier-ad-rules PiperOrigin-RevId: 737040381	2025-03-14 17:59:07 -07:00
Peter Hawkins	14cb7453f0	Add a C++ implementation of a topological sort. This is an exact port of the current Python implementation to C++ for speed. I am being careful not to change the topological order we return in any way in this change, although we may do so in a future change. PiperOrigin-RevId: 737014989	2025-03-14 16:04:25 -07:00
Matthew Johnson	dadc68b6c1	add experimental lax.optimization_barrier autodiff rules	2025-03-14 22:40:55 +00:00
jax authors	b00a3a1986	Merge pull request #27015 from mattjj:direct-linearize-fixes-4 PiperOrigin-RevId: 737003323	2025-03-14 15:24:11 -07:00
Sergei Lebedev	64230d1c93	[pallas:mosaic_gpu] WG lowering now supports `while_p` PiperOrigin-RevId: 736996154	2025-03-14 14:59:29 -07:00
Matthew Johnson	174dcc771a	[direct-linearize] shmap fixes	2025-03-14 21:38:50 +00:00
Daniel Suo	39e8ee93b0	Add `experimental/serialize_executable.py` to `BUILD`. PiperOrigin-RevId: 736975882	2025-03-14 13:54:39 -07:00
Yash Katariya	aa9480a441	Expose `get_abstract_mesh` via the `jax.sharding` namespace PiperOrigin-RevId: 736972976	2025-03-14 13:45:32 -07:00
Justin Fu	dbd8d92075	[Pallas] Add legacy PRNG key support to Pallas PRNG PiperOrigin-RevId: 736949584	2025-03-14 12:30:04 -07:00
Zac Mustin	0c8e601f90	Support convolution in roofline. So far we support only `unfused_hbm_bytes` and don't account for `{feature, batch}_group_count`s due to complexity. PiperOrigin-RevId: 736948528	2025-03-14 12:26:20 -07:00

16415 Commits