rocm_jax

mirror of https://github.com/ROCm/jax.git synced 2025-04-16 20:06:05 +00:00

Author	SHA1	Message	Date
Michael Hudgins	2e808f2836	Merge pull request #26279 from MichaelHudgins:tsan-resultstore PiperOrigin-RevId: 723918760	2025-02-06 14:55:57 +00:00
George Necula	abcaec7081	[better_errors] Add debug info to the Jaxprs formed for AD Following #26078 , we add debug info to more calls of lu.wrap_init.	2025-02-05 19:21:02 +02:00
jax authors	414449e142	Merge pull request #26078 from gnecula:debug_info_jaxpr PiperOrigin-RevId: 723151082	2025-02-04 10:54:26 -08:00
George Necula	d12aead696	[better_errors] Add debug info to more Jaxprs and WrappedFun (step 1) The plan is for all `core.Jaxpr` and `lu.WrappedFun` to carry non-None debug info. We change `lu.wrap_init` to construct the result paths thunk whenever it is passed a `debug_info`. The goal is to make sure that all `WrappedFun` have a debug info with result paths support. We change some calling conventions for internal functions to not pass along a separate debug_info if we have a `WrappedFun` or a `Jaxpr`. We obtain several improvements in presence of debug infos in debug_info_test.py	2025-02-04 10:02:35 +02:00
Yash Katariya	bc1a706688	[sharding_in_types] Add a canonicalize_value step before dispatching `bind` so that we can insert `mesh_cast`s under the following conditions: * When current_mesh is Manual and aval mesh is Auto * When current mesh is set and aval mesh is unset * Final style primitives skip this canonicalization and they are free to add it in their own `bind` method. * `mesh_cast` is skipped from this canonicalization to avoid recursion errors. This is required to make sure that after we hit abstract_eval rule and check_jaxpr, everything is properly typed in JAX's type system. `Auto` right now is a bit more permissive because we need to keep the current code at HEAD working but `Explicit` and `Manual` are very strict. PiperOrigin-RevId: 722868091	2025-02-03 18:00:19 -08:00
George Necula	c70de6deed	[better_errors] Merge the JaxprDebugInfo and TracingDebugInfo into core.DebugInfo Previously, we had two almost identical classes: `TracingDebugInfo` and `JaxprDebugInfo`. The only difference was that `TracingDebugInfo` had a thunk to return the result paths, while `JaxprDebugInfo` had the result paths resolved to a tuple. The separation of these types provided some clarity, but also led to code duplication and required conversions as the debugging info goes from `WrappedFun` to a `Jaxpr` and then to `WrappedFun` again.	2025-02-02 06:23:03 +02:00
Yash Katariya	9107ee4a22	Do automatic casting from auto -> manual when the context mesh is manual and avals are in auto mode. This happens when values are being closed over in a shard_map. The casting is happening at lax level but we can move this to a different place later on. PiperOrigin-RevId: 721495804	2025-01-30 13:14:04 -08:00
George Necula	32c98b9a76	[better_errors] Refactor more uses of pe.tracing_debug_info (part 3) We replace uses of `pe.tracing_debug_info` with `api_util.tracing_debug_info`, which uses the actual args and kwargs, instead of `in_tree` to manufacture fake args and kwargs. This ends up being more accurate, especially for `arg_names`; see changes in debug_info_test.py. This means that we have to construct the debug info further upstream, before flattening args. This will later help populate debug info in `WrappedFun` and `Jaxpr`. This is part 3 of a series (following #26097, #26099) for jit, pmap, checkify, and the custom_partitioning (the last few uses). In order to land this, I had to remove a safety check that the number of `arg_names` and `result_paths` in a Jaxpr's debug info match the number of Jaxpr invars and outvars, respectively. Additionally, I added two accessors `safe_arg_names` and `safe_result_paths` to ensure that the arg names and result paths match the expected length. These accessors return no-op results when the lengths are not as expected. From my testing, this happens only in Jaxprs that are not used for lowering, hence there is no actual user-visible change here. Simply, more internal Jaxprs are getting debug_info and in some cases the `arg_names` and `result_paths` are not correct. Still, this change is worth it because the `func_src_info` is the most useful part of the debug info (used for leaked tracers), and that is accurate. We will fix the `arg_names` and `result_paths` in a future change. One can see in the changes in debug_info_test.py the improvements in the user-visible debug info, including for `pjit` and `pmap` cases when it was wrong.	2025-01-30 07:40:05 +02:00
Yash Katariya	d223dfc3f7	Allow multiple meshes for avals but in that case, just use empty_abstract_mesh instead of enabling computation follows data only for Auto mode. PiperOrigin-RevId: 721224349	2025-01-29 20:47:34 -08:00
Yash Katariya	dcb28f1218	[sharding_in_types] Add vmap + explicit sharding support. The main changes are: * Track `explicit_mesh_axis` on `AxisData`. * Modify `unmapped_aval` to use the above explicit mesh axis and insert it into the right place in the sharding so out_shardings are correct. * Make `matchaxis` also handle shardings correctly * All mapped dimensions should be sharded the same way * spmd_axis_name and explicit sharded arrays cannot be used together * `out_shardings` parameter on `dot_general`, `broadcast_in_dim`, `reshape`, `reshard` and `mesh_cast` is handled correctly in presence of vmap. This should eventually help us get rid of `spmd_axis_name` from `vmap`. PiperOrigin-RevId: 721007659	2025-01-29 09:34:27 -08:00
Yash Katariya	8f248fe626	[sharding_in_types] Upstream changes from defaulting sharding_in_types config to True experiment. There aren't a lot of failures in TGP but we can at least upstream these changes until we work on the failures. PiperOrigin-RevId: 720639755	2025-01-28 11:04:42 -08:00
Yash Katariya	ae705fef9c	[sharding_in_types] Add support for svd_p PiperOrigin-RevId: 720409750	2025-01-27 20:31:54 -08:00
Peter Hawkins	95cb0eb1c9	Optimize JaxprEqnContext context manager. * Implement the context manager as a context manager class, rather than using @contextlib.contextmanager. It turns out the contextlib contextmanagers are rather slow. * Fuse the four child context managers into a single context manager. This saves us a bunch of allocations. * While we are here, also simplify the xla_metadata context manager to avoid its dual representation of the current metadata. PiperOrigin-RevId: 719918121	2025-01-26 12:08:44 -08:00
Peter Hawkins	184aefa493	Optimize the set_xla_metadata context manager. Key idea: if the argument to the context manager is None, then we don't need to touch any context state. Also clean up the API by separating the "set a dict" from the "set kwargs" use cases. PiperOrigin-RevId: 719628089	2025-01-25 05:40:45 -08:00
Yash Katariya	d28c3fa409	Replace Hidden/Visible/Collective AxisTypes names with Auto/Explicit/Manual. PiperOrigin-RevId: 719561729	2025-01-24 23:21:13 -08:00
Yash Katariya	704b2e5fba	[sharding_in_types] Make `vmap` work with shard_map + pallas PiperOrigin-RevId: 718578207	2025-01-22 16:48:32 -08:00
Yash Katariya	23d360bded	Remove axis_name from unmapped_aval PiperOrigin-RevId: 718558713	2025-01-22 15:49:04 -08:00
Peter Hawkins	f4adcc650f	Set __slots__ on core.Trace subclasses. This is easy to do and makes field accesses on Trace classes slightly faster.	2025-01-22 16:17:54 -05:00
jax authors	e304e9ea16	Merge pull request #25992 from gnecula:debug_info_arg_names PiperOrigin-RevId: 718216003	2025-01-21 22:17:08 -08:00
George Necula	3f73f7b0eb	[better_errors] Ensure debug_info.arg_names is never None. Most places in the code assumed this already, but often that usage is error reporting code, which is not yet well tested. When we cannot get the `inspect.Signature` or when the args and kwargs do not match the signature, we generate the flattened argument names as: `args[0]`, `args[1]`, `kwargs['foo']`, ... Previously, in these cases we returned `arg_names` as None, and then the whole debug_info ended up being `None`, throwing away even available information. We also add support for `api_util.fun_sourceinfo` even for cases when the `fun.__code__` is not available. In those cases we used to say that `fun_sourceinfo` is `None`. Now, we use the string representation of `fun` to get the name of built-in functions, or we use "<unknown>".	2025-01-21 13:38:10 +01:00
Yash Katariya	d50d1e2c40	Don't allow users to query `tracer.sharding` even under sharding in types mode. Instead, users should do `tracer.aval.sharding` so that code behaves the same under jit and eager mode. PiperOrigin-RevId: 717638986	2025-01-20 15:12:47 -08:00
George Necula	dcf72b01f4	[better_errors] Improvements in propagation of debugging info Added some documentation for `TracingDebugInfo` (docstring, comments about `arg_names`, since it was not obvious to me that this would flatten the non-static arguments). Laying the ground for the unification of the old `api_util.debug_info` and `partial_eval.tracing_debug_info`: we rename the former to `api_util.tracing_debug_info`, we push inside the calls to `fun_sourceinfo` and `fun_signature` (which were done by the callers until now), and we rewrite the latter in terms of the former. We leave for a future PR the actual replacing of the latter with the former throughout. In the process, cleaned up the one case when `partial_eval.tracing_debug_info` received None for the `in_tree` and `out_tracer_thunk`. The function contained catch-all exception clauses to handle those, but in doing so it masked other places where we fail to collect debug info due to programming mistakes. E.g., in one place we passed a `WrappedFun` instead of a `Callable`, resulting in missing debugging info. Added more type declarations. Added a `state_test` with a failure to track debugging information, manifested with a leaked tracer without function provenance. Fixing this in a subsequent PR.	2025-01-20 15:09:51 +01:00
Yash Katariya	49224d6cdb	Replace Auto/User/Collective AxisTypes names with Hidden/Visible/Collective. Replace `with set_mesh(mesh):` with `with use_mesh(mesh):` context manager Also expose `AxisTypes` and `use_mesh` into public API via `jax.sharding.AxisTypes` and `jax.sharding.use_mesh`. PiperOrigin-RevId: 716446406	2025-01-16 17:55:54 -08:00
Dougal	9fe553ca49	More linearization fixes	2025-01-15 10:27:21 -05:00
jax authors	ee724565bf	Merge pull request #25827 from gnecula:debug_info_2 PiperOrigin-RevId: 715407809	2025-01-14 09:12:37 -08:00
Yash Katariya	c72ed260fe	[sharding_in_types] Handle ShapeDtypeStruct inputs with sharding_in_types by registering the sharding on the aval properly created by SDS in its pytype_aval_mapping. Also, if we are running under full auto mode, don't error out if primitives don't have a sharding rule registered. PiperOrigin-RevId: 715383866	2025-01-14 08:03:50 -08:00
George Necula	b30df36d7d	[better_errors] Add debug_info to DynamicJaxprTrace and JaxprStackFrame This is part of a sequence of changes to ensure that the debugging information is propagated properly. Additional cleanup: * Rename `result_paths` to `result_paths_thunk` in `TracingDebugInfo` to clarify the difference from the similar field in `JaxprDebugInfo` * Added more type declarations	2025-01-14 13:49:18 +00:00
Yash Katariya	a817f532b4	[sharding_in_types] Introduce `auto_mode`, `user_mode`, `auto_mode_ctx` and `user_mode_ctx` as private APIs to make writing auto/user sharding in types code way easier and noise-free. These can be made public in the future under different names. PiperOrigin-RevId: 714169304	2025-01-10 14:14:25 -08:00
Yash Katariya	3848f0d2ac	[sharding_in_types] Functions like einsum, reshape, broadcast_in_dim, broadcasted_iota, convert_element_type and sharding_cast that take out_sharding as an argument in their signature should also allow `PartitionSpec` instead of just `NamedSharding` as an input. If PartitionSpec is passed, the mesh is read from the context. The primitives though take `NamedSharding` only. The conversion from `PartitionSpec` to `NamedSharding` happens above `.bind`. We also raise an error if `PartitionSpec` contains mesh axis names that are of type Auto or Collective for the above functions. PiperOrigin-RevId: 713352542	2025-01-08 11:11:16 -08:00
Yash Katariya	755d6cdad8	[sharding_in_types] Aval sharding under full auto mode should contain None and not UNCONSTRAINED because axis_types + pspec give the full picture. PiperOrigin-RevId: 713105375	2025-01-07 18:04:20 -08:00
George Necula	e87a2a5929	[shape_poly] Remove old non_negative support. This was deprecated in January 2024, replaced by `core_max_dim(..., 0)`. PiperOrigin-RevId: 712523579	2025-01-06 07:36:11 -08:00
Jake VanderPlas	ccc3a29537	Internal: use a single registry for abstractify APIs	2024-12-23 08:44:35 -08:00
Jake VanderPlas	c560f8e06c	Unify abstractify & shaped_abstractify rules	2024-12-20 04:28:19 -08:00
Jake VanderPlas	676070f4cd	Refactor: move shaped_abstractify to core	2024-12-18 19:14:46 -08:00
Jake VanderPlas	89a54a9e85	Re-land changes from https://github.com/jax-ml/jax/pull/25555 Reverts 25524abc67d82281e8a4093480637785c03a0150 PiperOrigin-RevId: 707679094	2024-12-18 15:02:54 -08:00
jax authors	25524abc67	Reverts b56dc63160eaccd7df05d03b1c38f804ff85f564 PiperOrigin-RevId: 707501925	2024-12-18 04:43:57 -08:00
Matthew Johnson	42ac4ca357	ref errors	2024-12-18 07:46:14 +00:00
Jake VanderPlas	3cecbf34f2	Remove core.concrete_aval and replace with abstractify	2024-12-17 18:18:25 -08:00
Jake VanderPlas	2518c6233e	Raise rather than return error	2024-12-17 11:20:55 -08:00
jax authors	0fa541972e	Merge pull request #25456 from jakevdp:xla-abstractify PiperOrigin-RevId: 707175097	2024-12-17 11:13:18 -08:00
Jake VanderPlas	2c722d9b13	Cleanup: toward merging core.concrete_aval & xla.abstractify	2024-12-17 09:27:00 -08:00
Yash Katariya	473e2bf527	Put abstract_mesh on every eqn so that we can preserve it during `eval_jaxpr` and `check_jaxpr` roundtrip. Also allow users to enter into `Auto`/`User` mode inside jit along all or some axes. Add checks to make sure that avals inside a context match the surrounding context. This check happens inside `abstract_eval` rules but maybe we need a more central place for it which we can create later on. PiperOrigin-RevId: 707128096	2024-12-17 09:17:21 -08:00
Yash Katariya	41f490aef4	[sharding_in_types] Default axis_types to `Auto` for all axis_names if user does not set any AxisType. Also resolve some TODOs now that we have a way for user to set the mesh. PiperOrigin-RevId: 704944255	2024-12-10 20:20:23 -08:00
Yash Katariya	b5e4fd161d	[sharding_in_types] Enforce AxisTypes to always exist if `set_mesh` is used. Also support `Auto` mode fully or mixed in with `User` mode. This works by overriding the sharding of `Auto` axes in the PartitionSpec with `Unconstrained` in `ShapedArray` constructor. The `ShapedArray` constructor is the central place where we can make such substitutions. During lowering of shardings with auto axes, we mark the auto dims as `unspecified_dims`. We don't mark all dims as unspecified because that would enable XLA to shard them even further which is not what we want if some of the dims are user sharded. PiperOrigin-RevId: 704911253	2024-12-10 18:03:21 -08:00
Dougal	fc2edbfac8	Add a `freeze` primitive to delimit ref lifetimes for AD. Also some basic AD through mutable_array/freeze. Co-authored-by: Matthew Johnson <mattjj@google.com>	2024-12-09 20:57:07 -05:00
Yash Katariya	a735bf83e5	Simplify abstract_mesh and device_context context managers and handle everything via their corresponding configs in config.py PiperOrigin-RevId: 702852769	2024-12-04 14:04:25 -08:00
Yash Katariya	0d2dfea4b1	Add a private `set_mesh` API to enter into sharding_in_types mode. This is how users will enable sharding in types mode (with correct axis types set too but that doesn't work yet). Also adding a device_context so `set_mesh` sets the devices the computation should run on correctly. The device_context however enters concrete devices into tracing and lowering cache but this should be fixed with the other jax context work going on. PiperOrigin-RevId: 700537898	2024-11-26 20:01:04 -08:00
Yash Katariya	355589f32b	[sharding_in_types] Add scan support to sharding_in_types. There are a couple of changes here: * Set abstract_mesh context manager during pjit_p.bind at the top level too since scan builds jaxpr during its lowering in `_scan_impl` (do the same for AOT path) * Set the abstract mesh only once if it's not set. Don't override an already set context. This means that only top level jit sets the context manager. * Add dynamic_slice and dynamic_update_slice sharding rules since scan calls into them. * scan only allows `xs` where the 0th dim is fully replicated i.e. None. PiperOrigin-RevId: 699014167	2024-11-21 20:13:23 -08:00
jax authors	e707edeafa	Merge pull request #25034 from gnecula:poly_state PiperOrigin-RevId: 698820458	2024-11-21 09:57:55 -08:00
George Necula	0831e2e340	[shape_poly] Adding shape polymorphism support for the state primitives.	2024-11-21 06:17:01 -08:00

261 Commits
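
Commit f4adcc650f in the list above sets `__slots__` on `core.Trace` subclasses. As a rough illustration of what that mechanism does in plain Python, the sketch below uses hypothetical classes `BaseTrace`, `SlottedTrace` and `LooseTrace`; they are not the JAX classes. A subclass only avoids a per-instance `__dict__` if it and every class above it declare `__slots__`.

```python
class BaseTrace:
    """Hypothetical base class that declares its fields via __slots__."""
    __slots__ = ("tag",)


class SlottedTrace(BaseTrace):
    """Subclass that also declares __slots__, so instances have no __dict__."""
    __slots__ = ("level",)

    def __init__(self, tag, level):
        self.tag = tag
        self.level = level


class LooseTrace(BaseTrace):
    """Subclass without __slots__: instances get a __dict__ again."""

    def __init__(self, tag, level):
        self.tag = tag      # stored in the inherited slot
        self.level = level  # stored in the instance __dict__


def has_instance_dict(obj):
    """Return True if the object carries a per-instance __dict__."""
    return hasattr(obj, "__dict__")
```

With these definitions, assigning an undeclared attribute such as `SlottedTrace("t", 0).extra = 1` raises `AttributeError`, while the same assignment on a `LooseTrace` instance succeeds.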