rocm_jax

mirror of https://github.com/ROCm/jax.git synced 2025-04-25 00:56:04 +00:00

Author	SHA1	Message	Date
Yash Katariya	549973dec6	Allow pspec to be passed to device_put if there is a mesh in the surrounding context PiperOrigin-RevId: 737812111	2025-03-17 17:47:56 -07:00
Yash Katariya	88d4bc3d45	Rename AxisTypes enum to AxisType PiperOrigin-RevId: 736935746	2025-03-14 11:48:21 -07:00
Yash Katariya	e615e2acb3	Raise a better error with more info when we see a duplicate axis in a PartitionSpec resulting from a sharding rule. Previously it was: `ValueError: A single NamedSharding spec specification can map every mesh axis to at most one positional dimension, but PartitionSpec('x', 'x') has duplicate entries for x` Now it is: `TypeError: dot_general operation with inputs: i64[8@x,2], i64[2,8@x] produces an illegally sharded result: i64[8@x,8@x]` PiperOrigin-RevId: 736657644	2025-03-13 15:24:10 -07:00
Yash Katariya	14b9f48535	Allow late binding `out_shardings` and `in_shardings` in `auto_axes` and `explicit_axes` API PiperOrigin-RevId: 736535562	2025-03-13 09:37:24 -07:00
Yash Katariya	a4ca0dbc6c	Change the signature of AbstractMesh to `AbstractMesh(axis_size: tuple[int, ...], axis_name: tuple[str, ...], *, axis_types)` instead of `AbstractMesh(shape_tuple: tuple[tuple[str, int], ...], *, axis_types)` so that we are consistent across all Mesh APIs: `Mesh`, `AbstractMesh` and `make_mesh`. PiperOrigin-RevId: 736371111	2025-03-12 21:32:31 -07:00
Yash Katariya	c6dcbb6759	[sharding_in_types] Rework the `axis_types` argument in the Mesh and AbstractMesh APIs. The changes are: 1. axis_types now takes an `AxisTypes | tuple[AxisTypes, ...] | None`. It doesn't take a dictionary anymore. 2. `jax.make_mesh` also takes the same `axis_types` tuple as in point 1. PiperOrigin-RevId: 736360041	2025-03-12 20:41:50 -07:00
Yash Katariya	47480b4493	Add a set_mesh API to `jax.sharding`. `set_mesh` sets the sharding and never unsets it, i.e., this is just the `__enter__` of a context manager without the `__exit__`. PiperOrigin-RevId: 736261724	2025-03-12 14:12:47 -07:00
Yash Katariya	8674495fd7	[sharding_in_types] Make `reshard` work with np.array. PiperOrigin-RevId: 736250504	2025-03-12 13:41:42 -07:00
Yash Katariya	3a26804c68	Rename `get_ty` to `typeof`, which is an alias of `get_aval`. PiperOrigin-RevId: 735946640	2025-03-11 17:34:44 -07:00
Yash Katariya	76dec38286	Under pjit the `with mesh:` context will use `use_mesh(mesh): jit` instead of tracking separately using `resource_env`. This would also make it easier to deprecate the `with mesh: pjit` path in the future from user code since the new path would be completely tested. This will also allow us to remove `resource_env` from JAX and the internal API access of `resource_env.physical_mesh` spread throughout codebases internally and externally. PiperOrigin-RevId: 735602187	2025-03-10 20:21:02 -07:00
Yash Katariya	9f37b5197f	[sharding_in_types] Fix a bug where `empty_array` in scan was created with the wrong spec when `unroll > 1`. PiperOrigin-RevId: 734591110	2025-03-07 09:47:32 -08:00
Yash Katariya	f8b98993b8	Add a divisibility check to ensure that the sharding evenly divides the shape (until this restriction is lifted), so that we don't create bad shardings. Also improve the dynamic_update_slice sharding error by printing `aval.str_short()` instead of the full sharding, because it's concise and gives more info than the current error (i.e., it also adds the shape to the error message). Also make some formatting changes in scan lowering to make it easier to debug. PiperOrigin-RevId: 734542862	2025-03-07 07:01:34 -08:00
Yash Katariya	e9486920e8	Auto-complete specs in a sharding with `None` if `aval.ndim > len(sharding.spec)`, so that for a 2D input, `P('data')` continues to work. PiperOrigin-RevId: 734325209	2025-03-06 16:10:14 -08:00
Yash Katariya	a67ab9fade	Just use `jit` as the string in error messages instead of `jit` and `pjit` based on resource_env. This is to start deprecating the need for `with mesh` and replace it with `use_mesh(mesh)`. PiperOrigin-RevId: 733959962	2025-03-05 20:09:30 -08:00
Yash Katariya	766315f791	Make sure concat + vmap of a sharded input and a replicated input works properly. In this case, the example boils down to: ``` inp1 = f32[16@x, 4] inp2 = f32[4] def f(x: f32[4], y: f32[4]): return jnp.concat([x, y], axis=-1) vmap(f, in_axes=(0, None))(inp1, inp2) ``` This example was breaking in the concat batching rule because we didn't broadcast with the right sharding. PiperOrigin-RevId: 733536944	2025-03-04 18:35:13 -08:00
George Necula	a6c47d6f36	Use the same name for aliased Vars when pretty-printing Jaxprs. Add a mechanism for using the same Var names for Vars that are aliased. In this PR, we use this for `pjit`, such that the following `print(jax.make_jaxpr(lambda a: jax.jit(lambda a: a + 1)(a))(0.))` prints: ``` { lambda ; a:f32[]. let b:f32[] = pjit[ name=<lambda> jaxpr={ lambda ; a:f32[]. let b:f32[] = add a 1.0 in (b,) } ] a in (b,) } ``` instead of the previous: ``` { lambda ; a:f32[]. let b:f32[] = pjit[ name=<lambda> jaxpr={ lambda ; c:f32[]. let d:f32[] = add c 1.0 in (d,) } ] a in (b,) } ``` The same mechanism could be used for other higher-order primitives, e.g., cond. Also add some typing declarations and rename APIs to use "shared jaxpr" in lieu of "top-level jaxpr" for those Jaxprs that are used multiple times and are printed first. I presume that the term "top-level jaxpr" was picked because these are printed first at top level. But this is confusing, because they are really subjaxprs. In fact, there was already a function `core.pp_toplevel_jaxpr` for printing the top-level Jaxpr, and there was also `core.pp_top_level_jaxpr` (which is now named `core.pp_shared_jaxpr`).	2025-03-03 11:38:51 +01:00
Yash Katariya	53494ade2d	`PRNGKeyArray.aval` should have the correct logical sharding. This required refactoring code so that we don't hit recursion errors. PiperOrigin-RevId: 732536521	2025-03-01 18:18:19 -08:00
Yash Katariya	da1cc0a50e	[sharding_in_types] `out_sharding` argument on einsum should only apply to the last einsum and not intermediate einsums. For example: Consider this einsum: `jnp.einsum('bthD, bthi, bthj->ijD', dy, i, j, out_sharding=P('data', None, None))` This will decompose into 2 einsums where the intermediate einsum output will be of rank `5`: * `'bthj,bthD->bthjD'` * `'bthjD,bthi->ijD'` The out_sharding specified (`P('data', None, None)`) is not compatible with the intermediate einsum: `'bthj,bthD->bthjD'` since the `length of spec (3) != out_aval.ndim (5)`. This change makes it so that out_sharding is only applied to the contraction that leads to the final output. If there are conflicts in intermediate einsums, then the user has to reshard the input or split into multiple einsums (and maybe provide out_sharding) so that conflicts don't exist. Note: We won't drop into auto mode for intermediate einsums. The user will have to split the einsum if any conflict is detected. PiperOrigin-RevId: 732205849	2025-02-28 11:39:14 -08:00
Yash Katariya	d69da3b012	More cleanups around ParsedPartitionSpec. In a follow up CL, I can remove it from NamedSharding constructor. Deleting ParsedPartitionSpec is remaining but that's after 0.5.2 release. PiperOrigin-RevId: 731785005	2025-02-27 10:51:04 -08:00
Yash Katariya	034a827a4d	Remove `_parsed_pspec` from everywhere in JAX except for NamedSharding constructor. I'll do that in the next CL since that has a dependency on C++ so needs guards. PiperOrigin-RevId: 731772222	2025-02-27 10:17:06 -08:00
Yash Katariya	177e1f6ed9	Canonicalize PartitionSpec so that we can delete ParsedPartitionSpec. We need to do this after sharding-in-types to speed up NamedSharding construction and remove a lot of tech debt and unnecessary complexity. * `_partitions` is now canonicalized and only contains `tuples`, `singular strings`, `None` or `UNCONSTRAINED`. No more empty tuples (e.g. `P((), 'x')`) or singleton tuples. * Cache the creation of the sharding on ShapedArray, since it's expensive to do it many times. * Change the `__hash__` and `__eq__` of `NamedSharding` to depend on `self.spec` instead of `self._parsed_pspec`. PiperOrigin-RevId: 731745062	2025-02-27 08:59:25 -08:00
Peter Hawkins	6e73637888	Fix a test failure under multi-threading. Remove a tsan suppression for a CPython race that is fixed.	2025-02-27 06:07:05 -05:00
Peter Hawkins	66293d8897	Remove code present to support jaxlib < 0.5.1. The new minimum xla_extension_version is 317 and the new mlir_api_version is 58.	2025-02-26 07:40:40 -05:00
Bixia Zheng	30348e90e7	[jax:custom_partitioning] Propagate static arguments to sharding_rule callback. PiperOrigin-RevId: 730885306	2025-02-25 07:55:00 -08:00
Yash Katariya	b707f0bdbb	[sharding_in_types] Error out when using the `auto_axes` or `explicit_axes` API when there is no context mesh. Those APIs don't support that right now anyway, and they raise an ugly KeyError. Instead, we raise a better error here. I have added a TODO to get the mesh from args so that computation-follows-data works, but we can decide to do that in the future if a lot of users request it and don't want to use `use_mesh`. PiperOrigin-RevId: 730687231	2025-02-24 19:19:49 -08:00
Yash Katariya	6f8bab3c92	Add sharding mismatch to explain_tracing_cache_miss PiperOrigin-RevId: 730645598	2025-02-24 16:49:49 -08:00
jax authors	c17ea805f3	Merge pull request #26569 from gnecula:debug_info_arg_names PiperOrigin-RevId: 730432019	2025-02-24 06:48:41 -08:00
Yash Katariya	7d3c63eded	[sharding_in_types] Add more reshape sharding support * Allow merging and splitting only if major most dim is sharded since that involves no data movement. This only happens if `dimensions` is None i.e. if the input array is in row-major order. * Merging: If only the major most dim is sharded of the merge block then that sharding is propagated to the merge block output * Splitting: If the dimension being split is sharded, then the sharding is propagated to the major most dimension post split only if the spec divides the new shape exactly. PiperOrigin-RevId: 730291595	2025-02-23 21:39:23 -08:00
George Necula	1be801bac8	[better_errors] Cleanup use of DebugInfo.arg_names and result_paths Previously, we represented a missing arg name with `None`, and a missing result path with the empty string. We now adopt the same convention for arg names and use empty strings. This simplifies the typing, and prevents the string "None" from appearing in error messages. I changed how we encode the result paths. Previously for a function that returns a single array the path was the empty string (the same as for an unknown path). And for a function that returns a pair of arrays it was `([0], [1])`. Now we add the "result" prefix: `("result",)` for a function returning a single array and `(result[0], result[1])` for a function returning a pair of arrays. Finally, in debug_info_test, I removed the `check_tracer_arg_name` so that all spied tracers are printed with the argument name they depend on.	2025-02-23 08:27:56 +02:00
Yash Katariya	d695aa4c63	[sharding_in_types] Add sharding rules for the following primitives: * `bitcast_convert_element_type` * `cumsum` * `cumlogsumexp` * `cumprod` * `cummax` * `cummin` * `reduce_window` * `reduce_window_sum` * `reduce_window_max` * `reduce_window_min` * `select_and_gather_add` For `reduce_window_...` primitives only trivial windowing is supported along non-replicated dimensions. We can relax the other NotImplemented case in the future. PiperOrigin-RevId: 729910108	2025-02-22 10:45:58 -08:00
Yash Katariya	7c4fe2a7cc	[sharding_in_types] Allow `auto_axes` and `explicit_axes` to take numpy arrays, python scalars. PiperOrigin-RevId: 729729215	2025-02-21 18:49:02 -08:00
Yash Katariya	80f18ded23	[sharding_in_types] Make slice and ellipsis work with `.at[...].get(out_sharding=P(...))` PiperOrigin-RevId: 729723470	2025-02-21 18:25:11 -08:00
Yash Katariya	262aab74f0	Canonicalize closed-over values if at least 1 mesh axis is `Manual` and all other mesh axes are `Manual` or `Auto`. This makes the canonicalization work properly with shmap partial-auto. If a mesh axis is `Explicit`, we don't canonicalize closed-over values yet, since that may require shape changes. The workaround is for users to pass those arrays as arguments instead of closing over them in a shard_map. PiperOrigin-RevId: 728956512	2025-02-19 22:18:56 -08:00
Yash Katariya	66d04f85e6	Error out if going from `Manual` -> `Auto/Explicit` AxisTypes in the `auto_axes` and `explicit_axes` API that do `mesh_cast` implicitly. Also, improve the error raised by canonicalize_sharding to include the api name and current source location. PiperOrigin-RevId: 728701237	2025-02-19 09:21:53 -08:00
Yash Katariya	b35083331c	Expose `get_ty` aka get_aval from jax namespace PiperOrigin-RevId: 728490205	2025-02-18 21:22:19 -08:00
Yash Katariya	1079dc4477	Let users pass in pspecs to with_sharding_constraint when `use_mesh` is set. This is in line with other APIs that allow pspecs, like einsum, reshape, etc. PiperOrigin-RevId: 728392216	2025-02-18 15:47:03 -08:00
Yash Katariya	8bcbf585df	Make device_put resharding on single device array input work under use_mesh. Fixes https://github.com/jax-ml/jax/issues/26552 PiperOrigin-RevId: 728382461	2025-02-18 15:22:39 -08:00
Yash Katariya	00d8297071	[sharding_in_types] Set the `sharding_in_types` config to True. This is a purely internal change and shouldn't affect any public APIs. Some caveats of enabling sharding-in-types by default are that we'll see tracing cache misses, which lead to lowering and compilation cache misses, in the following cases (the persistent compilation cache is not affected, so we'll see a cache hit there): 1. Call `jitted_f(arr_ns)` with an array on `NamedSharding` and again `jitted_f(arr_ps)` with an array of the same shape and dtype but now with `PositionalSharding`. * This leads to a tracing cache miss because on the second call, the aval has no sharding since it's PositionalSharding. This applies to calling with any sharding other than NamedSharding. 2. `jitted_f = jit(f, in_shardings=ns)`. Call `jitted_f(sharded_arr)` and then on the second call pass a numpy array `jitted_f(numpy_arr)`. * This also leads to a cache miss because the avals currently don't look at in_shardings; the semantics of in_shardings are complicated and I don't think we should change the aval based on in_shardings. The solution in both cases is to pass the array sharded on the same mesh during both calls to jit. PiperOrigin-RevId: 728361493	2025-02-18 14:35:14 -08:00
Yash Katariya	1dc58b79bf	Error unconditionally for jit, pjit and with_sharding_constraint if `use_mesh` and `with mesh` are used together. PiperOrigin-RevId: 728310200	2025-02-18 12:16:25 -08:00
Yash Katariya	15cd83ae00	[sharding_in_types] Error out when PartitionSpec is passed to APIs that take `out_sharding` like einsum when context_mesh is unset. This change is raising a better error because doing `NamedSharding(empty_mesh, P('x'))` will raise an error on construction but it is uglier than the current error added in this change. PiperOrigin-RevId: 726253654	2025-02-12 17:13:14 -08:00
Yash Katariya	1a62df1ac0	Rename `sharding` argument to `out_sharding` for `lax.reshape`, `lax.broadcast_in_dim`, `lax.broadcast` and `lax.broadcasted_iota`. `.bind` of these APIs still take `sharding` as a parameter though (but that's fine since it's internal and not public facing) PiperOrigin-RevId: 726187934	2025-02-12 13:59:23 -08:00
Yash Katariya	2d01df760b	[sharding_in_types] Make the typing checks and sharding rule checks a little bit less strict when the current or aval mesh is empty/unset. Also some more changes as listed below: * get_aval is not context dependent * canonicalization does not happen for avals on an empty mesh * jax.jit does not set abstract mesh context anymore before tracing * sharding checks have been relaxed for all modes (`Auto`, `Explicit` and `Manual`). This means that `f = lambda x, y: x * y; f(explicit_sharded_arr, np_array)` will be allowed without inserting any mesh_casts even in `Explicit` sharding mode * Even if use_mesh is not used in explicit sharding mode, computation follows data works! * Higher order primitives skip canonicalization (pjit_p, while_p, cond_p, for_loop_p, scan_p) * Check in partial_eval which compares jaxpr_known.outvars == jaxpr.out_avals has been relaxed to not check shardings if any one of the aval has an empty mesh. As mentioned in https://github.com/jax-ml/jax/issues/26474 we need to relax the typing and sharding rule checks because if we insert `mesh_cast`s, those lead to creation of unnecessary residuals (for literals, numpy arrays, basically anything that has an empty mesh) which is not good. PiperOrigin-RevId: 726097292	2025-02-12 10:03:01 -08:00
Yash Katariya	005c14b4da	[sharding_in_types] Error out if the sharding's specs passed to with_sharding_constraint don't refer to Auto axes. PiperOrigin-RevId: 725679220	2025-02-11 10:16:52 -08:00
Parker Schuh	da0827b7f1	Compute buffer aliasing on a per buffer basis. PiperOrigin-RevId: 723561674	2025-02-05 10:25:04 -08:00
Parker Schuh	cb188a0cb1	Reject invalid None in jax.NamedSharding(spec=None). PiperOrigin-RevId: 722500631	2025-02-02 21:29:33 -08:00
Yash Katariya	9107ee4a22	Do automatic casting from auto -> manual when the context mesh is manual and avals are in auto mode. This happens when values are being closed over in a shard_map. The casting is happening at lax level but we can move this to a different place later on. PiperOrigin-RevId: 721495804	2025-01-30 13:14:04 -08:00
Yash Katariya	f4e2c6c34c	Try to match out_spec with in_spec if both shardings are full auto and they are equivalent to each other. This is because of backwards compatibility reasons where tests expect the in and out shardings to match. PiperOrigin-RevId: 721470917	2025-01-30 11:59:57 -08:00
Yash Katariya	dcb28f1218	[sharding_in_types] Add vmap + explicit sharding support. The main changes are: * Track `explicit_mesh_axis` on `AxisData`. * Modify `unmapped_aval` to take the above explicit mesh axis and insert it into the right place in the sharding so out_shardings are correct. * Make `matchaxis` also handle shardings correctly. * All mapped dimensions should be sharded the same way. * spmd_axis_name and explicitly sharded arrays cannot be used together. * The `out_shardings` parameter on `dot_general`, `broadcast_in_dim`, `reshape`, `reshard` and `mesh_cast` is handled correctly in the presence of vmap. This should eventually help us get rid of `spmd_axis_name` from `vmap`. PiperOrigin-RevId: 721007659	2025-01-29 09:34:27 -08:00
Yash Katariya	8f248fe626	[sharding_in_types] Upstream changes from the experiment that defaults the sharding_in_types config to True. There aren't a lot of failures in TGP, but we can at least upstream these changes until we work on the failures. PiperOrigin-RevId: 720639755	2025-01-28 11:04:42 -08:00
Yash Katariya	ae705fef9c	[sharding_in_types] Add support for svd_p PiperOrigin-RevId: 720409750	2025-01-27 20:31:54 -08:00
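
Commit e615e2acb3 above replaces the old duplicate-axis `ValueError` with a `TypeError` that names the operation and prints each array as `dtype[size@axis,...]`. As a rough illustration only, the plain-Python toy below (no JAX dependency) models a sharding rule for a 2D matmul `(M, K) @ (K, N)` and reproduces the quoted message for the commit's example. `short_type` and `matmul_sharding_rule` are hypothetical names, not JAX APIs, and the rule (output keeps the mesh axis of `M` and of `N`) is a simplifying assumption.

```python
from collections import Counter
from typing import Optional, Sequence, Tuple

Spec = Sequence[Optional[str]]  # one mesh axis name (or None) per dimension


def short_type(dtype: str, shape: Sequence[int], spec: Spec) -> str:
    """Render an array type like i64[8@x,2]: size@axis for sharded dims."""
    dims = [str(n) if ax is None else f"{n}@{ax}" for n, ax in zip(shape, spec)]
    return f"{dtype}[{','.join(dims)}]"


def matmul_sharding_rule(lhs_shape, lhs_spec, rhs_shape, rhs_spec, dtype="i64"):
    """Toy rule for (M, K) @ (K, N): the output keeps M's axis and N's axis."""
    out_shape: Tuple[int, int] = (lhs_shape[0], rhs_shape[1])
    out_spec = (lhs_spec[0], rhs_spec[1])
    # A mesh axis may shard at most one output dimension.
    counts = Counter(ax for ax in out_spec if ax is not None)
    if any(c > 1 for c in counts.values()):
        inputs = ", ".join([short_type(dtype, lhs_shape, lhs_spec),
                            short_type(dtype, rhs_shape, rhs_spec)])
        raise TypeError(
            f"dot_general operation with inputs: {inputs} produces an "
            f"illegally sharded result: {short_type(dtype, out_shape, out_spec)}")
    return out_shape, out_spec
```

With `rhs_spec=(None, "y")` instead of `(None, "x")` the toy rule returns `((8, 8), ("x", "y"))`, since each mesh axis then shards only one output dimension.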
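
Commit 47480b4493 above describes `set_mesh` as the `__enter__` half of a context manager with no `__exit__`. The sketch below models that difference with a hypothetical process-wide "context mesh" (a plain string stands in for a real `Mesh`); `toy_use_mesh`, `toy_set_mesh` and `current_mesh` are illustrative names, not the real `jax.sharding` implementation.

```python
import contextlib

# Hypothetical process-wide "context mesh"; a string stands in for a real Mesh.
_CONTEXT_MESH = None


def current_mesh():
    return _CONTEXT_MESH


@contextlib.contextmanager
def toy_use_mesh(mesh):
    """Install `mesh` for the duration of a `with` block, then restore."""
    global _CONTEXT_MESH
    previous = _CONTEXT_MESH
    _CONTEXT_MESH = mesh           # the __enter__ half
    try:
        yield mesh
    finally:
        _CONTEXT_MESH = previous   # the __exit__ half


def toy_set_mesh(mesh):
    """Only the __enter__ half: install `mesh` and never restore the old one.

    We assign directly instead of calling toy_use_mesh(mesh).__enter__(),
    because an abandoned generator-based context manager runs its `finally`
    clause when it is garbage-collected, which would silently undo the change.
    """
    global _CONTEXT_MESH
    _CONTEXT_MESH = mesh
```

The design point is that `toy_use_mesh` is scoped and restores whatever mesh was active before, while `toy_set_mesh` leaves the new mesh in place for the rest of the program.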
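
Commits e9486920e8 and f8b98993b8 above pad a short spec with `None` (so `P('data')` works on a 2D input) and reject shardings that do not evenly divide the shape. A minimal plain-Python sketch of both checks follows, assuming a spec entry is a mesh axis name, a tuple of names (whose mesh sizes multiply), or `None`; `complete_spec` and `check_divisible` are hypothetical helpers, not JAX functions.

```python
from typing import Mapping, Sequence, Tuple, Union

# One entry per dimension: a mesh axis name, a tuple of names, or None.
Entry = Union[str, Tuple[str, ...], None]


def complete_spec(spec: Sequence[Entry], ndim: int) -> Tuple[Entry, ...]:
    """Pad `spec` with trailing None so it has exactly `ndim` entries."""
    if len(spec) > ndim:
        raise ValueError(f"spec {tuple(spec)!r} has more entries than ndim={ndim}")
    return tuple(spec) + (None,) * (ndim - len(spec))


def check_divisible(shape: Sequence[int], spec: Sequence[Entry],
                    mesh_axis_sizes: Mapping[str, int]) -> None:
    """Raise ValueError if a sharded dimension is not split into equal shards."""
    for dim, (size, entry) in enumerate(zip(shape, complete_spec(spec, len(shape)))):
        if entry is None:
            continue  # replicated dimension: nothing to check
        axes = (entry,) if isinstance(entry, str) else entry
        num_shards = 1
        for axis in axes:
            num_shards *= mesh_axis_sizes[axis]
        if size % num_shards:
            raise ValueError(f"dimension {dim} of size {size} is not divisible by "
                             f"{num_shards} shards over mesh axes {axes}")
```

For example, `complete_spec(("data",), 2)` gives `("data", None)`, and a shape of `(6, 4)` with `("data",)` on a 4-way `data` axis is rejected because 6 is not a multiple of 4.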
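
Commit 177e1f6ed9 above canonicalizes PartitionSpec entries so that empty and singleton tuples disappear and `__eq__`/`__hash__` can compare specs directly. The toy below sketches that idea in plain Python under two stated assumptions: an empty tuple means "no mesh axes" (equivalent to `None`), and a singleton tuple `('x',)` means the same as `'x'`. `ToySpec`, `canonicalize_entry` and the `UNCONSTRAINED` sentinel are hypothetical stand-ins, not JAX's `PartitionSpec`.

```python
from typing import Any, Tuple

# Hypothetical sentinel standing in for PartitionSpec.UNCONSTRAINED.
UNCONSTRAINED = object()


def canonicalize_entry(entry: Any) -> Any:
    """Map an entry to one of: None, UNCONSTRAINED, a str, or a tuple of 2+ strs."""
    if entry is None or entry is UNCONSTRAINED or isinstance(entry, str):
        return entry
    axes = tuple(entry)
    if not axes:
        return None          # () shards over no axes, i.e. replicated
    if len(axes) == 1:
        return axes[0]       # ('x',) means the same as 'x'
    return axes


class ToySpec:
    """Stores canonical partitions so __eq__/__hash__ can compare them directly."""

    def __init__(self, *partitions: Any) -> None:
        self._partitions: Tuple[Any, ...] = tuple(
            canonicalize_entry(p) for p in partitions)

    def __eq__(self, other: object) -> bool:
        return isinstance(other, ToySpec) and self._partitions == other._partitions

    def __hash__(self) -> int:
        return hash(self._partitions)

    def __repr__(self) -> str:
        return f"ToySpec{self._partitions!r}"
```

Canonicalizing once at construction keeps equality and hashing cheap: `ToySpec((), 'x')` and `ToySpec(None, 'x')` store identical tuples, so they compare equal and land in the same hash bucket.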


640 Commits