rocm_jax

mirror of https://github.com/ROCm/jax.git synced 2025-04-24 01:56:05 +00:00

Author	SHA1	Message	Date
Dan Foreman-Mackey	8b1b039e0d	Improve error messages when input argument resolution fails in custom_* APIs.	2025-03-04 10:31:35 -05:00
Sergei Lebedev	155839bb4d	[pallas:triton] Emit a better error message for matmul with non-2D operands Triton seems to support both 2D and 3D operands now, the latter case being a batched matmul. We need more changes in the lowering to support 3D, so I will leave it out of scope here. Fixes #26013. PiperOrigin-RevId: 733293299	2025-03-04 05:46:29 -08:00
Dan Foreman-Mackey	6c5ef1a404	Update jnp.unique to support upstream interface changes.	2025-03-04 05:24:52 -05:00
Ayaka	ea53c7616b	Fix thread safety of JAX error checking Fix thread safety of JAX error checking by making the global states thread local PiperOrigin-RevId: 733164878	2025-03-03 20:56:01 -08:00
Sharad Vikram	00d9f4529d	[Pallas/Fuser] Add support for custom_call_jvp/pjit to push_block_spec PiperOrigin-RevId: 733122108	2025-03-03 17:43:13 -08:00
Sharad Vikram	d32e282ff9	Add fuser to jax.experimental.pallas Note that fuser is considered experimental within Pallas and APIs are subject to change PiperOrigin-RevId: 733117882	2025-03-03 17:26:44 -08:00
Sharad Vikram	0b6c355083	[Pallas] Add experimental (private for now) API for manual fusion into Pallas kernels PiperOrigin-RevId: 733112191	2025-03-03 17:05:51 -08:00
jax authors	2c7043f63d	Merge pull request #26865 from jakevdp:fix-indexing-error PiperOrigin-RevId: 733085471	2025-03-03 15:38:20 -08:00
jax authors	f9f47217df	Merge pull request #26862 from jakevdp:logsumexp-docs PiperOrigin-RevId: 733080943	2025-03-03 15:24:10 -08:00
jax authors	4944dcb977	Merge pull request #26897 from jakevdp:cond-doc PiperOrigin-RevId: 733077065	2025-03-03 15:13:23 -08:00
jax authors	07d1cd0290	Merge pull request #26876 from carlosgmartin:fix_matrix_norm_empty_matrix PiperOrigin-RevId: 733077011	2025-03-03 15:11:31 -08:00
Jake VanderPlas	84ca80d215	doc: in lax.cond, note that both branches will be traced	2025-03-03 13:05:24 -08:00
Peter Hawkins	7f05b74bca	Fix wrong results in multidimensional pad. When there are multiple dimensions, NumPy's semantics are as if the padding is applied to each dimension in order. We lacked test coverage for this case because constant values ((0, 2),) and (0, 2) were handled by different code paths. Fixes https://github.com/jax-ml/jax/issues/26888	2025-03-03 15:25:08 -05:00
carlosgmartin	897e1a1310	Fix linalg.norm to return zero for proper norms of empty matrices.	2025-03-03 15:02:34 -05:00
Adam Paszke	e9f95cc3a7	[Mosaic GPU] Make the small WGMMA tile independent of transpose flags Now the small tiling is always `(8, swizzle // bytewidth(dtype))`, no matter whether the input is transposed or not. This should simplify the follow-up refactoring of the code and make it easier to enable small tiling for LHS too. PiperOrigin-RevId: 732933005	2025-03-03 08:30:57 -08:00
Bart Chrzaszcz	ed4a7bbab1	#sdy Add JAX backwards compatibility test. This tests saving a module with one set of axis names, but loading it with another set of axis names. This does also test the custom calls: - `@Sharding` - `@xla.sdy.GlobalToLocalShape` - `@xla.sdy.LocalToGlobalShape` But note that there are a bunch of other custom calls that will be tested in the Shardy and XLA codebases. The way the testing utils is tested here doesn't allow me to set `out_shardings` for example. So JAX can rely on the existence of those tests as stability guarantees just like for StableHLO. PiperOrigin-RevId: 732893432	2025-03-03 06:01:34 -08:00
Bart Chrzaszcz	ac493655bf	#sdy support JAX export tests when Shardy is enabled. This CL only supports lowering a module with the exact same mesh, and loading it with either the exact same mesh or different meshes. Note that we will be introducing some restrictions under Shardy for JAX export: - You can only lower/save the module with meshes all of the same shape, but different axis names (this PR is right now only allowing the same axis names, but this will be relaxed in a follow-up) - When loading the module, just like with GSPMD, you can use a different mesh with a different mesh shape and axis names. However, like with the restriction in the previous point, all shardings must use the same axis shapes, but can use different axis names (again this will be relaxed in a follow-up) We may remove the restriction of having to use the exact same mesh shapes during export saving time and exact same mesh shapes during export loading time in the future. But for now we will keep this restriction while no one is using Shardy with JAX export. PiperOrigin-RevId: 732878916	2025-03-03 04:57:06 -08:00
Christos Perivolaropoulos	b9ebd9188f	[mgpu] Foreach in tiled layout. PiperOrigin-RevId: 732872906	2025-03-03 04:31:59 -08:00
Adam Paszke	11e6cfbc6a	[Mosaic GPU][NFC] Move the calculation of group strides into _validate_mma This allows us to unify this logic between Hopper and Blackwell. PiperOrigin-RevId: 732862875	2025-03-03 03:51:20 -08:00
jax authors	bbadf99054	Merge pull request #26697 from gnecula:pp_aliased_var_names PiperOrigin-RevId: 732860010	2025-03-03 03:36:50 -08:00
Adam Paszke	3038348f23	[Mosaic GPU][NFC] Clean up the computation of group strides PiperOrigin-RevId: 732849235	2025-03-03 02:50:48 -08:00
George Necula	a6c47d6f36	Use the same name for aliased Vars when pretty-printing Jaxprs. Add a mechanism for using the same Var names for Vars that are aliased. In this PR, we use this for `pjit`, such that the following `print(jax.make_jaxpr(lambda a: jax.jit(lambda a: a + 1)(a))(0.))` prints: ``` { lambda ; a:f32[]. let b:f32[] = pjit[ name=<lambda> jaxpr={ lambda ; a:f32[]. let b:f32[] = add a 1.0 in (b,) } ] a in (b,) } ``` instead of the previous: ``` { lambda ; a:f32[]. let b:f32[] = pjit[ name=<lambda> jaxpr={ lambda ; c:f32[]. let d:f32[] = add c 1.0 in (d,) } ] a in (b,) } ``` The same mechanism could be used for other higher-order primitives, e.g., cond, and others. Also add some typing declarations and rename APIs to use "shared jaxpr" in lieu of "top-level jaxpr" for those Jaxprs that are used multiple times and are printed first. I presume that the term "top-level jaxpr" was picked because these are printed first at top-level. But this is confusing, because they are really subjaxprs. In fact, there was already a function `core.pp_toplevel_jaxpr` for printing the top-level Jaxpr, and there was also `core.pp_top_level_jaxpr` (which now is named `core.pp_shared_jaxpr`).	2025-03-03 11:38:51 +01:00
Parker Schuh	b8b690e594	Add use_high_dynamic_range_gumbel flag which allows sampling gumbel such that it more closely matches the CDF for low probability events (less than 2**-nmant). Because -log(-log(x)) is more sensitive close to 1 than 0, we must use -log(-log1p(-x)) instead to make better use of the extra range around 0. PiperOrigin-RevId: 732757388	2025-03-02 19:42:40 -08:00
Dimitar (Mitko) Asenov	3b305c6617	[Mosaic GPU] Infer layouts (transforms) on memrefs that directly feed into the dialect wgmma op. This change detects a situation where a gmem_memref is read via `async_load` and directly used in a wgmma. In such cases, we insert a cast before the load to add tile, transpose, and swizzle transformations. PiperOrigin-RevId: 732618760	2025-03-02 03:17:13 -08:00
Dimitar (Mitko) Asenov	c60ef5a2a1	[Mosaic GPU] Wire up the `slice_lengths` and `indices` operands in lowering of the MLIR dialect. This enables slicing via TMA and is needed for pipelining. PiperOrigin-RevId: 732613803	2025-03-02 02:43:47 -08:00
Yash Katariya	53494ade2d	`PRNGKeyArray.aval` should have the correct logical sharding. This required refactoring code so that we don't hit recursion errors. PiperOrigin-RevId: 732536521	2025-03-01 18:18:19 -08:00
jax authors	2a1eeb0ce8	Changes for kernel export PiperOrigin-RevId: 732383028	2025-03-01 00:32:39 -08:00
Anton Osokin	1f3176636d	Reverts 10f6edeb496a2eec2a09c2c5cecbe4f8f02452ab PiperOrigin-RevId: 732315349	2025-02-28 18:04:27 -08:00
Jake VanderPlas	b2c45b8eb9	Improved errors when indexing with floats	2025-02-28 15:04:07 -08:00
Jake VanderPlas	c56e794a66	doc: fix description of logsumexp axis	2025-02-28 12:53:33 -08:00
Yash Katariya	da1cc0a50e	[sharding_in_types] `out_sharding` argument on einsum should only apply to the last einsum and not intermediate einsums. For example: Consider this einsum: `jnp.einsum('bthD, bthi, bthj->ijD', dy, i, j, out_sharding=P('data', None, None))` This will decompose into 2 einsums where the intermediate einsum output will be of rank `5`: * `'bthj,bthD->bthjD'` * `'bthjD,bthi->ijD'` The out_sharding specified (`P('data', None, None)`) is not compatible with the intermediate einsum: `'bthj,bthD->bthjD'` since the `length of spec (3) != out_aval.ndim (5)`. This change makes it so that out_sharding is only applied to the contraction that leads to the final output. If there are conflicts in intermediate einsums, then the user has to reshard the input or split into multiple einsums (and maybe provide out_sharding) so that conflicts don't exist. Note: We won't drop into auto mode for intermediate einsums. The user will have to split the einsum if any conflict is detected. PiperOrigin-RevId: 732205849	2025-02-28 11:39:14 -08:00
Dan Foreman-Mackey	bb9aed5eec	Reimplement custom_vjp.optimize_remat using custom_dce.	2025-02-28 10:00:28 -05:00
Adam Paszke	bb96226dd8	[Mosaic GPU] Add support for small RHS tile sizes in WGMMA This is useful for more fine-grained autotuning and can help avoid wave quantization effects. PiperOrigin-RevId: 732105219	2025-02-28 05:41:30 -08:00
Benjamin Chetioui	1bc36e623b	[Mosaic GPU][NFC] Delete workaround for dialect bindings before jaxlib 0.5.1. PiperOrigin-RevId: 732102282	2025-02-28 05:25:53 -08:00
Benjamin Chetioui	7c46480eab	[Mosaic GPU] Fix `as_dialect_barrier_memref` to take into account `BarrierRef`'s offset. PiperOrigin-RevId: 732098299	2025-02-28 05:06:57 -08:00
Benjamin Chetioui	a9ab614123	[Pallas/Mosaic GPU] Add an abstraction to obtain a slice of dynamic shared memory when using warpgroup semantics. Explicitly make the assumption that `runtime_smem` starts at `0` in the Pallas module context---which should be enforced by Mosaic GPU. This is in preparation of changes implementing transform inference. PiperOrigin-RevId: 732091266	2025-02-28 04:38:25 -08:00
Benjamin Chetioui	abfe2d080e	[Mosaic GPU][NFC] Move some functions to a new file called `inference_utils.py`. The intent is to move utils that are useful for both layout inference and transform inference to a shared location. PiperOrigin-RevId: 732067659	2025-02-28 03:02:59 -08:00
Adam Paszke	092ea35301	[Mosaic GPU][NFC] Start refactoring the MMA parameter inference The CUDA 12.8 release significantly improved the MMA docs, letting us improve upon the previously used "magic number" scheme. Sadly, the docs are still incorrect, but at least I can begin to make some sense of those parameters. PiperOrigin-RevId: 732033585	2025-02-28 00:50:20 -08:00
Yash Katariya	dda62f576f	Make sure default layout is None for input and output layout in all codepaths PiperOrigin-RevId: 731865511	2025-02-27 14:26:25 -08:00
jax authors	c7ca35fe32	Merge pull request #26345 from wenscarl:scaled_matmul PiperOrigin-RevId: 731865430	2025-02-27 14:24:48 -08:00
jax authors	6a7736754f	Reverts 0f0d5e90ef1c3d60f35020141710ea350d17816b PiperOrigin-RevId: 731844119	2025-02-27 13:27:32 -08:00
Sharad Vikram	6f57410e12	[Pallas TPU] Use grid_env for pipeline body so we can query num_programs/program_id inside the block spec PiperOrigin-RevId: 731831543	2025-02-27 12:53:02 -08:00
Yash Katariya	07f192cd48	Merge _check_mesh_resource_axis and _check_axis_type_consistency into 1 function. PiperOrigin-RevId: 731830347	2025-02-27 12:51:25 -08:00
Yash Katariya	c265568530	Remove parsed_pspec from NamedSharding constructor PiperOrigin-RevId: 731820173	2025-02-27 12:24:17 -08:00
Peter Hawkins	1e5d9a9158	Add an allow_negative_indices option to lax.dynamic_slice and lax.dynamic_update_slice. The goal of this change is to avoid generating code to wrap negative indices back into range in cases where we know it doesn't matter. Change scan to pass allow_negative_indices=False to avoid emitting index wrapping code for each scan argument. PiperOrigin-RevId: 731812827	2025-02-27 12:04:28 -08:00
Yash Katariya	c94ec0eb0d	Use batched_device_put for token shard_arg handler PiperOrigin-RevId: 731800613	2025-02-27 11:30:22 -08:00
jax authors	da39b6f3d4	Comment change PiperOrigin-RevId: 731792151	2025-02-27 11:07:59 -08:00
Yash Katariya	d69da3b012	More cleanups around ParsedPartitionSpec. In a follow up CL, I can remove it from NamedSharding constructor. Deleting ParsedPartitionSpec is remaining but that's after 0.5.2 release. PiperOrigin-RevId: 731785005	2025-02-27 10:51:04 -08:00
Yash Katariya	034a827a4d	Remove `_parsed_pspec` from everywhere in JAX except for NamedSharding constructor. I'll do that in the next CL since that has a dependency on C++ so needs guards. PiperOrigin-RevId: 731772222	2025-02-27 10:17:06 -08:00
Yash Katariya	177e1f6ed9	Canonicalize PartitionSpec so that we can delete ParsedPartitionSpec. We need to do this after sharding-in-types to speed up NamedSharding construction and remove a lot of tech debt and unnecessary complexity. * `_partitions` is now canonicalized and only contains `tuples`, `singular strings`, `None` or `UNCONSTRAINED`. No more empty tuples (`P((), 'x')`) and singleton tuples. * Cache the creating of sharding on ShapedArray since it's expensive to do it a lot of times * Change the `__hash__` and `__eq__` of `NamedSharding` to depend on `self.spec` instead of `self._parsed_pspec`. PiperOrigin-RevId: 731745062	2025-02-27 08:59:25 -08:00
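
Note on commit b8b690e594 (use_high_dynamic_range_gumbel): the message contrasts `-log(-log(x))` with `-log(-log1p(-x))`. The following is a minimal plain-Python sketch of why the second form keeps more of the tail, not the JAX implementation. The function names `gumbel_naive` and `gumbel_log1p` are hypothetical, and float64 from the standard `math` module stands in for whatever dtype JAX samples in. The idea is that floats near 1 are spaced roughly `2**-nmant` apart, so `u = 1 - x` cannot represent a tiny tail probability `x`, while `x` itself near 0 is representable with full relative precision.

```python
import math

def gumbel_naive(u):
    """Inverse-CDF Gumbel transform -log(-log(u)) for u in (0, 1]."""
    inner = -math.log(u)
    if inner == 0.0:
        # u rounded to exactly 1.0, so the tail information is already lost.
        return math.inf
    return -math.log(inner)

def gumbel_log1p(x):
    """Same transform written in terms of x = 1 - u, for x in (0, 1)."""
    return -math.log(-math.log1p(-x))

x = 1e-20                 # a tail probability far below float64 spacing near 1
u = 1.0 - x               # rounds to exactly 1.0
naive_tail = gumbel_naive(u)    # saturates
log1p_tail = gumbel_log1p(x)    # finite, close to -log(x)

# Away from the tail the two forms agree up to rounding.
moderate_gap = abs(gumbel_naive(1.0 - 0.3) - gumbel_log1p(0.3))
```

In this sketch the naive form saturates as soon as `1 - x` rounds to 1, whereas the `log1p` form still returns a finite value close to `-log(x)`.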


16383 Commits
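
Note on commit 1e5d9a9158 (allow_negative_indices): the message describes skipping the code that wraps negative indices back into range. Below is a rough plain-Python sketch of that wrapping step for a 1D slice, under stated assumptions. `normalize_start_index` and `dynamic_slice_1d` are hypothetical helpers, not JAX functions, and the behaviour shown for `allow_negative_indices=False` (the index is only clamped) is an assumption, since the commit message does not say what happens to a negative index in that mode.

```python
def normalize_start_index(start, dim_size, slice_size, allow_negative_indices=True):
    """Hypothetical helper: bring a 1D slice start index into range."""
    if allow_negative_indices and start < 0:
        # Wrap a negative index, as in NumPy-style indexing.
        start += dim_size
    # Clamp so the slice [start, start + slice_size) stays in bounds.
    return max(0, min(start, dim_size - slice_size))

def dynamic_slice_1d(values, start, slice_size, allow_negative_indices=True):
    """Hypothetical 1D analogue of a dynamic slice over a Python list."""
    begin = normalize_start_index(start, len(values), slice_size,
                                  allow_negative_indices)
    return values[begin:begin + slice_size]

data = list(range(10))
wrapped = dynamic_slice_1d(data, -3, 2)    # -3 wraps to 7
unwrapped = dynamic_slice_1d(data, -3, 2, allow_negative_indices=False)  # assumed: clamps to 0
clamped = dynamic_slice_1d(data, 9, 2)     # 9 clamps to 8
```

When the caller already knows the index is non-negative, as the commit says of scan, the wrap branch is dead code, which is the work the option is meant to avoid emitting.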