rocm_jax

mirror of https://github.com/ROCm/jax.git synced 2025-04-17 20:36:05 +00:00

Author	SHA1	Message	Date
Yash Katariya	abcc7fdf4c	[sharding_in_types] Initial commit to add `varying_manual_axes: frozenset[AxisName]` to ShapedArray. Also add `jax_varying_axes_in_types` config to hide this option under while we develop it. PiperOrigin-RevId: 736141670	2025-03-12 08:29:16 -07:00
Yash Katariya	3a26804c68	Rename `get_ty` to `typeof` which is an alias of `get_aval` PiperOrigin-RevId: 735946640	2025-03-11 17:34:44 -07:00
Matthew Johnson	7c2f842353	shard_map and other fixes to direct-linearize Co-authored-by: Dougal Maclaurin <dougalm@google.com>	2025-03-07 21:02:40 +00:00
Yash Katariya	f8b98993b8	Add a divisibility check so that we make sure that sharding evenly divides the shape (until this restriction is lifted) to make sure we don't create bad shardings. Also improve dynamic_update_slice sharding error by printing `aval.str_short()` instead of full sharding because it's concise and gives more info than the current error (i.e. it adds shape too to the error message) Also make some formatting changes in scan lowering to make it easier to debug. PiperOrigin-RevId: 734542862	2025-03-07 07:01:34 -08:00
Yash Katariya	e9486920e8	Auto complete specs in a sharding if aval.ndim > len(sharding.spec) with `None`. So that for a 2D input, P('data') continues to work. PiperOrigin-RevId: 734325209	2025-03-06 16:10:14 -08:00
George Necula	a6c47d6f36	Use the same name for aliased Vars when pretty-printing Jaxprs. Add a mechanism for using the same Var names for Vars that are aliased. In this PR, we use this for `pjit`, such that the following `print(jax.make_jaxpr(lambda a: jax.jit(lambda a: a + 1)(a))(0.))` prints: ``` { lambda ; a:f32[]. let b:f32[] = pjit[ name=<lambda> jaxpr={ lambda ; a:f32[]. let b:f32[] = add a 1.0 in (b,) } ] a in (b,) } ``` instead of the previous: ``` { lambda ; a:f32[]. let b:f32[] = pjit[ name=<lambda> jaxpr={ lambda ; c:f32[]. let d:f32[] = add c 1.0 in (d,) } ] a in (b,) } ``` The same mechanism could be used for other higher-order primitives, e.g., cond, and others. Also add some typing declarations and rename APIs to use "shared jaxpr" in lieu of "top-level jaxpr" for those Jaxprs that are used multiple times and are printed first. I presume that the term "top-level jaxpr" was picked because these are printed first at top-level. But this is confusing, because they are really subjaxprs. In fact, there was already a function `core.pp_toplevel_jaxpr` for printing the top-level Jaxpr, and there was also `core.pp_top_level_jaxpr` (which now is named `core.pp_shared_jaxpr`).	2025-03-03 11:38:51 +01:00
Yash Katariya	53494ade2d	`PRNGKeyArray.aval` should have the correct logical sharding. This required refactoring code so that we don't hit recursion errors. PiperOrigin-RevId: 732536521	2025-03-01 18:18:19 -08:00
Yash Katariya	177e1f6ed9	Canonicalize PartitionSpec so that we can delete ParsedPartitionSpec. We need to do this after sharding-in-types to speed up NamedSharding construction and remove a lot of tech debt and unnecessary complexity. * `_partitions` is now canonicalized and only contains `tuples`, `singular strings`, `None` or `UNCONSTRAINED`. No more empty tuples (`P((), 'x')`) and singleton tuples. * Cache the creating of sharding on ShapedArray since it's expensive to do it a lot of times * Change the `__hash__` and `__eq__` of `NamedSharding` to depend on `self.spec` instead of `self._parsed_pspec`. PiperOrigin-RevId: 731745062	2025-02-27 08:59:25 -08:00
Peter Hawkins	256e37af5f	Port many uses of contextlib.contextdecorator to explicit context manager classes. contextdecorator turns out to be slower than just writing a decorator class explicitly. Since we use many decorators per-equation, this causes a measurable speed difference in certain benchmarks. PiperOrigin-RevId: 730939406	2025-02-25 10:31:05 -08:00
Yash Katariya	9deb7e3d96	[sharding_in_types] `physical_aval` should set the correct sharding on `ShapedArray` so that lowering and compilation don't crash PiperOrigin-RevId: 730885084	2025-02-25 07:53:14 -08:00
Yash Katariya	6f8bab3c92	Add sharding mismatch to explain_tracing_cache_miss PiperOrigin-RevId: 730645598	2025-02-24 16:49:49 -08:00
George Necula	1be801bac8	[better_errors] Cleanup use of DebugInfo.arg_names and result_paths Previously, we represented a missing arg name with `None`, and a missing result path with the empty string. We now adopt the same convention for arg names and use empty strings. This simplifies the typing, and prevents the string "None" from appearing in error messages. I changed how we encode the result paths. Previously for a function that returns a single array the path was the empty string (the same as for an unknown path). And for a function that returns a pair of arrays it was `([0], [1])`. Now we add the "result" prefix: `("result",)` for a function returning a single array and `(result[0], result[1])` for a function returning a pair of arrays. Finally, in debug_info_test, I removed the `check_tracer_arg_name` so that all spied tracers are printed with the argument name they depend on.	2025-02-23 08:27:56 +02:00
Yash Katariya	262aab74f0	canonicalize closed over values if at least 1 mesh axis is `Manual` and all other mesh axes are `Manual` or `Auto`. This would make the canonicalization work properly with shmap partial-auto. If a mesh axis is Explicit, we don't canonicalize closed over values yet since that may require shape changes. The workaround is for users to pass those arrays as arguments instead of closing over them in a shard_map. PiperOrigin-RevId: 728956512	2025-02-19 22:18:56 -08:00
Yash Katariya	8305803b76	[sharding_in_types] Initial support for partial-auto/explicit shard_map + sharding-in-types. If the axes in `shmap(..., auto=...)` is an explicit axes in the outer mesh context, then that axis is treated as Explicit instead of Auto. PiperOrigin-RevId: 728920514	2025-02-19 20:04:54 -08:00
Yash Katariya	a3edfb43ef	Now that sharding_in_types config flag is True, remove the config and all the conditionals PiperOrigin-RevId: 728653433	2025-02-19 06:53:35 -08:00
Yash Katariya	b35083331c	Expose `get_ty` aka get_aval from jax namespace PiperOrigin-RevId: 728490205	2025-02-18 21:22:19 -08:00
George Necula	a0812cd57e	[better_errors] Make it explicit that debug_info is not None. Now all internal uses of lu.wrap_init and core.Jaxpr are with actual debug info. This enables us to clean up the type declarations and to remove the checks whether debug_info is present. For usage outside of the JAX internals, we change `jax.extend.linear_util.wrap_init` to be usable without debug_info, for temporary backwards compatibility. We emit a deprecation warning and fill-in some fake debugging info. See https://github.com/jax-ml/jax/issues/26480 for more details. PiperOrigin-RevId: 726770483	2025-02-13 22:07:04 -08:00
jax authors	60dcded2af	Merge pull request #26518 from superbobry:maint-2 PiperOrigin-RevId: 726663977	2025-02-13 15:44:19 -08:00
Sergei Lebedev	a73456d54d	Removed unused ``# type: ignore`` comments For future reference, this can be done via python -m mypy jax --warn-unused-ignores > /tmp/unused.txt while IFS=: read file line rest; do echo "$file:$line"; gsed -i "${line}s/ \# type: ignore$\[[^]]\]$*//" "$file" done < /tmp/unused.txt	2025-02-13 21:12:27 +00:00
Yash Katariya	229aa65a3e	Split NamedSharding into a separate file called named_sharding.py so that we can import it in core.py and break the cyclic dependency. PiperOrigin-RevId: 726566863	2025-02-13 11:22:54 -08:00
Yash Katariya	3ec7a67e51	[sharding_in_types] Make `sharding` arg to ShapedArray kwarg only PiperOrigin-RevId: 726272943	2025-02-12 18:22:50 -08:00
Yash Katariya	d58c3a4722	[sharding_in_types] Fix some properties that assumed axis_types always existed. PiperOrigin-RevId: 726187278	2025-02-12 13:57:19 -08:00
Yash Katariya	2d01df760b	[sharding_in_types] Make the typing checks and sharding rule checks a little bit less strict when the current or aval mesh is empty/unset. Also some more changes as listed below: * get_aval is not context dependent * canonicalization does not happen for avals on an empty mesh * jax.jit does not set abstract mesh context anymore before tracing * sharding checks have been relaxed for all modes (`Auto`, `Explicit` and `Manual`). This means that `f = lambda x, y: x * y; f(explicit_sharded_arr, np_array)` will be allowed without inserting any mesh_casts even in `Explicit` sharding mode * Even if use_mesh is not used in explicit sharding mode, computation follows data works! * Higher order primitives skip canonicalization (pjit_p, while_p, cond_p, for_loop_p, scan_p) * Check in partial_eval which compares jaxpr_known.outvars == jaxpr.out_avals has been relaxed to not check shardings if any one of the aval has an empty mesh. As mentioned in https://github.com/jax-ml/jax/issues/26474 we need to relax the typing and sharding rule checks because if we insert `mesh_cast`s, those lead to creation of unnecessary residuals (for literals, numpy arrays, basically anything that has an empty mesh) which is not good. PiperOrigin-RevId: 726097292	2025-02-12 10:03:01 -08:00
jax authors	1a8d537728	Merge pull request #26384 from gnecula:debug_info_jaxpr_4 PiperOrigin-RevId: 725210049	2025-02-10 07:42:57 -08:00
George Necula	1e813e1693	[better_errors] Continue adding debug info to Jaxprs (step 4) This follows after #26078, #26313, #26348, adding `debug_info` to more calls to `lu.wrap_init`. As part of this I have changed the primitive `custom_transpose` to take the `transpose` parameter as a `lu.WrappedFun`, which carries debug info. Previously, this was a `Callable`. These changes ensure that all the `lu.wrap_init` and `Jaxpr` are called with debug_info in the `api_test.py:CustomTransposeTest`.	2025-02-08 09:13:55 +02:00
Yash Katariya	21e1be3320	Don't call `get_cur_mesh_sharding` if sharding-in-types mode is not enabled PiperOrigin-RevId: 724461150	2025-02-07 13:55:38 -08:00
Matthew Johnson	719031c1fd	[mutable-arrays] persist shardings through xla computations	2025-02-07 18:33:24 +00:00
jax authors	5d647ccfa1	Merge pull request #26348 from gnecula:debug_info_jaxpr_3 PiperOrigin-RevId: 723920031	2025-02-06 06:59:18 -08:00
Michael Hudgins	2e808f2836	Merge pull request #26279 from MichaelHudgins:tsan-resultstore PiperOrigin-RevId: 723918760	2025-02-06 14:55:57 +00:00
George Necula	904b74860c	[better_errors] Continue adding debug info to Jaxprs (step 3) This follows after #26078, and #26313, adding `debug_info` to more calls to `lu.wrap_init`. As part of this I have changed the primitives `custom_vjp_call_jaxpr` and `custom_lin` to take the `bwd` parameter as a `lu.WrappedFun`, which carries debug info. Previously, this was a `Callable`, but in almost all cases it was really `lu.WrappedFun.call_wrapped`.	2025-02-06 16:26:49 +02:00
George Necula	abcaec7081	[better_errors] Add debug info to the Jaxprs formed for AD Following #26078 , we add debug info to more calls of lu.wrap_init.	2025-02-05 19:21:02 +02:00
jax authors	414449e142	Merge pull request #26078 from gnecula:debug_info_jaxpr PiperOrigin-RevId: 723151082	2025-02-04 10:54:26 -08:00
George Necula	d12aead696	[better_errors] Add debug info to more Jaxprs and WrappedFun (step 1) The plan is for all `core.Jaxpr` and `lu.WrappedFun` to carry non-None debug info. We change `lu.wrap_init` to construct the result paths thunk whenever it is passed a `debug_info`. The goal is to make sure that all `WrappedFun` have a debug info with result paths support. We change some calling conventions for internal functions to not pass along a separate debug_info if we have a `WrappedFun` or a `Jaxpr`. We obtain several improvements in presence of debug infos in debug_info_test.py	2025-02-04 10:02:35 +02:00
Yash Katariya	bc1a706688	[sharding_in_types] Add a canonicalize_value step before dispatching `bind` so that we can insert `mesh_cast`s under the following conditions: * When current_mesh is Manual and aval mesh is Auto * When current mesh is set and aval mesh is unset * Final style primitives skip this canonicalization and they are free to add it in their own `bind` method. * `mesh_cast` is skipped from this canonicalization to avoid recursion errors. This is required to make sure that after we hit abstract_eval rule and check_jaxpr, everything is properly typed in JAX's type system. `Auto` right now is a bit more permissive because we need to keep the current code at HEAD working but `Explicit` and `Manual` are very strict. PiperOrigin-RevId: 722868091	2025-02-03 18:00:19 -08:00
George Necula	c70de6deed	[better_errors] Merge the JaxprDebugInfo and TracingDebugInfo into core.DebugInfo Previously, we had two almost identical classes: `TracingDebugInfo` and `JaxprDebugInfo`. The only difference was that `TracingDebugInfo` had a thunk to return the result paths, while `JaxprDebugInfo` had the result paths resolved to a tuple. The separation of these types provided some clarity, but also led to code duplication and required conversions as the debugging info goes from `WrappedFun` to a `Jaxpr` and then to `WrappedFun` again.	2025-02-02 06:23:03 +02:00
Yash Katariya	9107ee4a22	Do automatic casting from auto -> manual when the context mesh is manual and avals are in auto mode. This happens when values are being closed over in a shard_map. The casting is happening at lax level but we can move this to a different place later on. PiperOrigin-RevId: 721495804	2025-01-30 13:14:04 -08:00
George Necula	32c98b9a76	[better_errors] Refactor more uses of pe.tracing_debug_info (part 3) We replace uses of `pe.tracing_debug_info` with `api_util.tracing_debug_info`, which uses the actual args and kwargs, instead of `in_tree` to manufacture fake args and kwargs. This ends up being more accurate, especially for `arg_names`; see changes in debug_info_tests.py. This means that we have to construct the debug info further upstream, before flattening args. This will later help populate debug info in `WrappedFun` and `Jaxpr`. This is part 3 of a series (following #26097, #26099) for jit, pmap, checkify, and the custom_partitioning (the last few uses). In order to land this, I had to remove a safety check that the number of `arg_names` and `result_paths` in a Jaxpr's debug info match the number of Jaxpr invars and outvars, respectively. Additionally, I added two accessors `safe_arg_names` and `safe_result_paths` to ensure that the arg names and result paths match the expected length. These accessors return no-op results when the lengths are not as expected. From my testing, this happens only in Jaxprs that are not used for lowering, hence there is no actual user-visible change here. Simply, more internal Jaxprs are getting debug_info and in some cases the `arg_names` and `result_paths` are not correct. Still, this change is worth it because the `func_src_info` is the most useful part of the debug info (used for leaked tracers), and that is accurate. We will fix the `arg_names` and `result_paths` in a future change. One can see in the changes in debug_info_test.py the improvements in the user-visible debug info, including for `pjit` and `pmap` cases when it was wrong.	2025-01-30 07:40:05 +02:00
Yash Katariya	d223dfc3f7	Allow multiple meshes for avals but in that case, just use empty_abstract_mesh instead of enabling computation follows data only for Auto mode. PiperOrigin-RevId: 721224349	2025-01-29 20:47:34 -08:00
Yash Katariya	dcb28f1218	[sharding_in_types] Add vmap + explicit sharding support. The main changes are: * Track `explicit_mesh_axis` on `AxisData`. * Modify `unmapped_aval` to take the above explicit mesh axis and insert it into the right place in the sharding so out_shardings are correct. * Make `matchaxis` also handle shardings correctly * All mapped dimensions should be sharded the same way * spmd_axis_name and explicit sharded arrays cannot be used together * `out_shardings` parameter on `dot_general`, `broadcast_in_dim`, `reshape`, `reshard` and `mesh_cast` is handled correctly in presence of vmap. This should eventually help us get rid of `spmd_axis_name` from `vmap`. PiperOrigin-RevId: 721007659	2025-01-29 09:34:27 -08:00
Yash Katariya	8f248fe626	[sharding_in_types] Upstream changes from defaulting sharding_in_types config to True experiment. There aren't a lot of failures in TGP but we can at least upstream these changes until we work on the failures. PiperOrigin-RevId: 720639755	2025-01-28 11:04:42 -08:00
Yash Katariya	ae705fef9c	[sharding_in_types] Add support for svd_p PiperOrigin-RevId: 720409750	2025-01-27 20:31:54 -08:00
Peter Hawkins	95cb0eb1c9	Optimize JaxprEqnContext context manager. * Implement the context manager as a context manager class, rather than using @contextlib.contextmanager. It turns out the contextlib contextmanagers are rather slow. * Fuse the four child context managers into a single context manager. This saves us a bunch of allocations. * While we are here, also simplify the xla_metadata context manager to avoid its dual representation of the current metadata. PiperOrigin-RevId: 719918121	2025-01-26 12:08:44 -08:00
Peter Hawkins	184aefa493	Optimize the set_xla_metadata context manager. Key idea: if the argument to the context manager is None, then we don't need to touch any context state. Also clean up the API by separating the "set a dict" from the "set kwargs" use cases. PiperOrigin-RevId: 719628089	2025-01-25 05:40:45 -08:00
Yash Katariya	d28c3fa409	Replace Hidden/Visible/Collective AxisTypes names with Auto/Explicit/Manual. PiperOrigin-RevId: 719561729	2025-01-24 23:21:13 -08:00
Yash Katariya	704b2e5fba	[sharding_in_types] Make `vmap` work with shard_map + pallas PiperOrigin-RevId: 718578207	2025-01-22 16:48:32 -08:00
Yash Katariya	23d360bded	Remove axis_name from unmapped_aval PiperOrigin-RevId: 718558713	2025-01-22 15:49:04 -08:00
Peter Hawkins	f4adcc650f	Set __slots__ on core.Trace subclasses. This is easy to do and makes field accesses on Trace classes slightly faster.	2025-01-22 16:17:54 -05:00
jax authors	e304e9ea16	Merge pull request #25992 from gnecula:debug_info_arg_names PiperOrigin-RevId: 718216003	2025-01-21 22:17:08 -08:00
George Necula	3f73f7b0eb	[better_errors] Ensure debug_info.arg_names is never None. Most places in the code assumed this already, but often that usage is error reporting code, which is not yet well tested. When we cannot get the `inspect.Signature` or when the args and kwargs do not match the signature, we generate the flattened argument names as: `args[0]`, `args[1]`, `kwargs['foo']`, ... Previously, in these cases we returned `arg_names` is None, and then the whole debug_info ended up being `None`, throwing away even available information. We also add support for `api_util.fun_sourceinfo` even for cases when the `fun.__code__` is not available. In those cases we used to say that `fun_sourceinfo` is `None`. Now, we use the string representation of `fun` to get the name of built-in functions, or we use "<unknown>".	2025-01-21 13:38:10 +01:00
Yash Katariya	d50d1e2c40	Don't allow users to query `tracer.sharding` even under sharding in types mode. Instead, users should do `tracer.aval.sharding` so that code behaves the same under jit and eager mode. PiperOrigin-RevId: 717638986	2025-01-20 15:12:47 -08:00

290 Commits