This has two problems:
* it's not thread-safe, which will become problematic if we run tests with thread-parallelism.
* it's not very maintainable.
Instead, add a new `util.test_event(...)` function that can be called at points of interest in the program. `test_utils` registers a callback that is invoked when an event fires, which avoids the need for thread-unsafe global monkey-patching.
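A minimal sketch of the pattern (the registration details here are illustrative, not the actual implementation):
```
# Illustrative sketch only; the real hook in test_utils may differ.
_test_event_callback = None  # set by test_utils in tests, None in production

def test_event(name, *args):
  # Called at points of interest; a no-op unless a test registered a callback.
  if _test_event_callback is not None:
    _test_event_callback(name, *args)

# Hypothetical registration helper in test_utils:
def set_test_event_callback(cb):
  global _test_event_callback
  _test_event_callback = cb
```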
Also add a `device_context` so that `set_mesh` correctly sets the devices the computation should run on. The `device_context` currently enters concrete devices into the tracing and lowering caches, but this should be fixed by the other JAX context work in progress.
PiperOrigin-RevId: 700537898
* Set the abstract_mesh context manager during pjit_p.bind at the top level too, since scan builds a jaxpr during its lowering in `_scan_impl` (do the same for the AOT path).
* Set the abstract mesh only once if it's not already set, and don't override an already-set context. This means that only the top-level jit sets the context manager.
* Add dynamic_slice and dynamic_update_slice sharding rules since scan calls into them.
* scan only allows `xs` whose 0th dim is fully replicated, i.e. `None` (see the sketch below).
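A minimal sketch of the intended usage (the setup here is illustrative and assumes at least two devices):
```
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

mesh = Mesh(jax.devices()[:2], 'x')
# The scanned-over (0th) dim must be replicated (None); other dims may be sharded.
xs = jax.device_put(jnp.arange(16.).reshape(4, 4),
                    NamedSharding(mesh, P(None, 'x')))

@jax.jit
def f(xs):
  return jax.lax.scan(lambda c, x: (c + x.sum(), x * 2), 0., xs)

carry, ys = f(xs)
```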
PiperOrigin-RevId: 699014167
Set the abstract mesh context manager at the jit tracing boundary by looking at the mesh on the avals. In the future, this context manager will be user-settable too.
The abstract mesh context manager is a new context manager backed by a new context variable and a new `trace_context` entry, which governs cache behavior. If the abstract mesh context manager is not set, the default is `None`.
PiperOrigin-RevId: 698493184
When we have a cache miss in `_cpp_pjit` we want to compile the function and
store the executable. Previously we had a roundabout way of getting hold of that
executable. We'd trace the function to a jaxpr but we wouldn't lower and compile
it ourselves. Instead, we'd call `pjit_p.bind`. The layers of the tracing onion
would be peeled off and eventually we'd hit the `pjit_p` impl rule,
`_pjit_call_impl`. This rule has its own cache. With luck we'd also miss *that*
cache, and then `_pjit_call_impl` would lower and compile the jaxpr and store
the executable in `most_recent_pjit_call_executable`. We'd eventually pop the
stack back up to the `_cpp_pjit` cache miss and then we'd get hold of the
compiled object by looking up `most_recent_pjit_call_executable`.
There's room for bugs here if we hit one cache but not the other. For example,
if we miss the `_cpp_pjit` cache but we hit the `_pjit_call_impl` cache then we
won't compile the executable. Normally that would just mean that the `_cpp_pjit`
cache won't be populated. But if we've previously hit a function with the same
jaxpr but slightly different compilation parameters (e.g. device IDs) then we'll
get a bogus hit in `most_recent_pjit_call_executable` and we'll add an incorrect
cache entry. The divergent cache behavior you need to trigger this started
happening with the "stackless" change because the tracing context became a
bigger part of the cache key and `_cpp_pjit` and `_pjit_call_impl` will in
general have different tracing contexts.
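To make the failure mode concrete, here's a toy illustration of the hazard (not JAX's actual code; all names are invented): two cache layers whose keys can diverge, communicating through a "most recent executable" global.
```
inner_cache = {}             # analogue of the _pjit_call_impl cache
outer_cache = {}             # analogue of the _cpp_pjit cache
most_recent_executable = None

def compile_fn(jaxpr, devices):
  return f'executable({jaxpr}, {devices})'

def inner_call(jaxpr, devices):
  global most_recent_executable
  if (jaxpr, devices) not in inner_cache:   # inner miss: compile and record
    inner_cache[(jaxpr, devices)] = compile_fn(jaxpr, devices)
    most_recent_executable = inner_cache[(jaxpr, devices)]
  return inner_cache[(jaxpr, devices)]

def outer_call(jaxpr, devices, trace_ctx):
  # The outer key includes the tracing context, so the two caches can
  # miss/hit independently (the "divergent cache behavior" above).
  key = (jaxpr, devices, trace_ctx)
  if key not in outer_cache:                # outer miss
    inner_call(jaxpr, devices)
    # BUG: if inner_call hit its cache, most_recent_executable still holds
    # whatever was compiled last -- possibly for different devices.
    outer_cache[key] = most_recent_executable
  return outer_cache[key]

outer_call('jaxpr1', 'devB', 'ctx1')  # compiles for devB
outer_call('jaxpr1', 'devA', 'ctx1')  # compiles for devA
outer_call('jaxpr1', 'devB', 'ctx2')  # outer miss, inner hit: stores the devA executable!
```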
With this change, we remove the whole `most_recent_pjit_call_executable` system.
Instead `_cpp_pjit` lowers, compiles and runs the jaxpr itself and obtains the
executable directly rather than calling into `pjit_p.bind`. We do call into
`pjit_p.bind` if we're not in an eval context, but in that case we don't expect
to be able to populate the `_cpp_pjit` cache anyway.
Performance-wise, we should be at parity, although this has not yet been tested.
Authoring-wise, the new kernel is significantly smaller and simpler to write.
A major known limitation of this approach is the invariant that `seq_len % grid_size == 0`; we plan to relax this limitation in follow-up CLs.
PiperOrigin-RevId: 689868468
This allows us to get more cache hits globally. For example:
Before:
```
jax.jit(f, out_shardings=s)(arr)
jax.jit(f, out_shardings=s)(arr)  # cpp cache miss
```
After:
```
jax.jit(f, out_shardings=s)(arr)
jax.jit(f, out_shardings=s)(arr)  # cpp cache hit
```
Reverts b615266175effe4aefeb903620a19f3719a604da
PiperOrigin-RevId: 675746175
This fixes a tracing cache miss that occurred when you called `eval_shape` with a weak-type input, got a strong-type output back, and then passed that output back in.
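A minimal sketch of the pattern (a hypothetical repro, not the exact case from the linked issue):
```
import jax

def f(x):
  return x + 1

x = 1.0                     # Python float: weak-type input
out = jax.eval_shape(f, x)  # previously returned a strong-type ShapeDtypeStruct
jax.eval_shape(f, out)      # passing it back in used to miss the tracing cache
```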
Fixes: https://github.com/google/jax/issues/23302
PiperOrigin-RevId: 668949430
This massively simplifies the number of checks we need and improves dispatch time too. It also fixes a donation bug hit in serving code, related to layouts and the non-standardization of the default layout in JAX.
PiperOrigin-RevId: 668527139
This allows us to get more cache hits globally. For example:
Before:
```
jax.jit(f, out_shardings=s)(arr)
jax.jit(f, out_shardings=s)(arr) # cpp cache miss
```
After:
```
jax.jit(f, out_shardings=s)(arr)
jax.jit(f, out_shardings=s)(arr) # cpp cache hit
```
Also, we can remove the hack (which I didn't like) in multihost_utils.py.
PiperOrigin-RevId: 665574475
Not doing the resharding leads to incorrect outputs on GPU and a crash on TPU.
Fixes: https://github.com/google/jax/issues/23100
PiperOrigin-RevId: 665000157
**Semantics**
Inside jit, we never need to talk about concrete devices, so the semantics stay the same as today, i.e. we can lower a `NamedSharding` with an abstract mesh (only mesh axis names and sizes) and a `PartitionSpec`. The only restriction is that the number of devices needs to be consistent throughout the program while we are tracing.
During compilation, the order of devices throughout the program needs to be consistent (same as before this change).
Outside jit, i.e. in eager mode, if a `shard_map` or `with_sharding_constraint` contains an `AbstractMesh`, then the inputs to those primitives should carry a concrete `Mesh` with the same shape and names as the abstract mesh.
**Why do this?**
There are cases where you want to change the devices in the mesh but keep the mesh shape the same (axis names and axis sizes). But this leads to a device-mismatch error if you have `with_sharding_constraint` or `shard_map` in your computation, because they embed concrete devices in their signatures.
So to fix the error, you need to change the mesh in `wsc` and `shmap`, which leads to a tracing cache miss (because the function id is now different) and consequently a miss in the lowering-to-StableHLO cache. Explaining via an example:
```
mesh1 = Mesh(jax.devices()[:2], 'x')
mesh2 = Mesh(jax.devices()[2:4], 'x')
arr_mesh1 = jax.device_put(np.arange(8), NamedSharding(mesh1, P()))
arr_mesh2 = jax.device_put(np.arange(8), NamedSharding(mesh2, P()))
@jax.jit
def f(x):
  y = with_sharding_constraint(x, NamedSharding(mesh1, P('x')))
  return y * 2
f(arr_mesh1)
f(arr_mesh2) # DEVICE MISMATCH ERROR!
```
The same problem exists for `shard_map` since it takes a mesh with concrete devices in its signature.
**Okay, so how do you fix this?**
As mentioned above, we need the above program to work and to get tracing and lowering cache hits (**cache hits are the most important part** here).
The approach in this change allows `with_sharding_constraint` to accept a `NamedSharding(abstract_mesh, pspec)` as input. This leads to no errors downstream, and we get tracing and lowering cache hits since we no longer encode concrete devices, just the axis names and axis sizes of the mesh.
**The important part is that the concrete device information should only come from the arguments. Inside `jax.jit`, you should never reference concrete devices.**
```
mesh1 = Mesh(jax.devices()[:2], 'x')
mesh2 = Mesh(jax.devices()[2:4], 'x')
arr_mesh1 = jax.device_put(np.arange(8), NamedSharding(mesh1, P()))
arr_mesh2 = jax.device_put(np.arange(8), NamedSharding(mesh2, P()))
# Create an abstract mesh from mesh1; since both meshes have the same shape
# (names and axis sizes), this is fine.
abstract_mesh = jax.sharding.AbstractMesh(arr_mesh1.shape_tuple)
@jax.jit
def f(x):
  y = with_sharding_constraint(x, NamedSharding(abstract_mesh, P('x')))
  return y * 2
f(arr_mesh1)
f(arr_mesh2) # tracing and lowering cache hit
```
**One caveat is that this only works with `jax.NamedSharding` but that's fine because `NamedSharding` is the most used `Sharding` in JAX.**
**What about `shard_map`?**
shard_map's signature will be: `shmap(f, mesh: Mesh | AbstractMesh, in_specs: Specs, out_specs: Specs)`.
```
mesh1 = Mesh(jax.devices()[:2], 'x')
mesh2 = Mesh(jax.devices()[2:4], 'x')
arr_mesh1 = jax.device_put(np.arange(8), NamedSharding(mesh1, P()))
arr_mesh2 = jax.device_put(np.arange(8), NamedSharding(mesh2, P()))
# Create an abstract mesh from mesh1; since both meshes have the same shape
# (names and axis sizes), this is fine.
abstract_mesh = jax.sharding.AbstractMesh(arr_mesh1.shape_tuple)
@jax.jit
def f(x):
  y = shard_map(lambda x: x, mesh=abstract_mesh, in_specs=P('x'), out_specs=P('x'))(x)
  return y * 2
f(arr_mesh1)
f(arr_mesh2) # tracing and lowering cache hit
```
This is a fully backwards-compatible change, so your current code will continue to work as is, but you can opt into the new behavior and get all the benefits!
PiperOrigin-RevId: 662670932
```
with mesh:
  out = pjit(lambda: 1)()
```
The sharding of `out` was a `GSPMDSharding`, which is not ideal. This change fixes that and returns a `NamedSharding` instead.
This is also required for `Shardy` integration.
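A quick check of the new behavior might look like this (a sketch; the assertion reflects the intended fix):
```
import jax
from jax.sharding import Mesh, NamedSharding
from jax.experimental.pjit import pjit

mesh = Mesh(jax.devices(), 'x')
with mesh:
  out = pjit(lambda: 1)()
# After this change, out.sharding is a NamedSharding rather than a GSPMDSharding.
assert isinstance(out.sharding, NamedSharding)
```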
PiperOrigin-RevId: 658842350