Performance-wise, we should be at parity, although this has not yet been tested.
Authoring-wise, the new kernel is significantly smaller and simpler to write.
A major known limitation of this approach, which we have a plan to fix, is the invariant that `seq_len % grid_size == 0`; we plan to relax this limitation in follow-up CLs.
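Until that lands, a minimal caller-side sketch of a workaround is to pad the sequence dimension up to the next multiple of the grid size (the helper below is hypothetical, not part of the kernel):

```python
def padded_len(seq_len: int, grid_size: int) -> int:
  # Round seq_len up to the next multiple of grid_size so that the
  # `seq_len % grid_size == 0` invariant holds.
  return seq_len + (-seq_len) % grid_size

assert padded_len(1000, 256) == 1024
assert padded_len(1024, 256) == 1024
```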
PiperOrigin-RevId: 689868468
* Handled transpose of `dot_general` correctly with shardings
* Handled transpose of `reduce_sum` correctly with shardings
* `ShapedArray.to_tangent_aval` now sets the sharding of the tangent (not handling unreduced yet).
* `ConcreteArray.aval` correctly sets the sharding which is extracted from the `val` attribute.
* (Paired with Dougal!) Added sharding rule for `reshape_p` only when singleton dims are added/removed.
* Added sharding rule for `select_n_p` because it gets called during `jax.grad` of minformer.
* Added `sharding` attribute to `broadcast_in_dim` because we need to provide the correct sharding to it during `full` and the transpose of `reduce_sum`.
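As a rough illustration, these rules are exercised by programs like the following minimal sketch (stable public APIs only; it assumes a machine with at least two devices):

```python
import jax
import jax.numpy as jnp
from jax.sharding import NamedSharding, PartitionSpec as P

mesh = jax.make_mesh((2,), ('x',))
w = jax.device_put(jnp.ones((8, 4)), NamedSharding(mesh, P('x', None)))

def loss(w):
  # dot_general + reduce_sum: their transposes (and broadcast_in_dim, via
  # `full`) run under jax.grad and must produce correctly sharded tangents.
  return jnp.sum(w @ jnp.ones((4, 2)))

grads = jax.grad(loss)(w)
```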
PiperOrigin-RevId: 689837320
This change updates the implementation of SVD in `lax` to use the FFI version which was added to jaxlib in https://github.com/jax-ml/jax/pull/23794. This comes with a few benefits:
1. When running on a CUDA platform, the 64-bit API will be used for the algorithm based on QR decomposition. (Note that it looks like the 64-bit API isn't available on ROCm.) This addresses part of the feature request in https://github.com/jax-ml/jax/issues/23413, although there's still work to do to port the rest of the GPU calls to the 64-bit API.
2. This implementation supports shape polymorphism in all dimensions, with some caveats. By default, we use some heuristics based on the matrix sizes to select the algorithm, and the three different algorithms (QR, Jacobi, and batched Jacobi) have sufficiently different behavior (QR returns V^H, whereas Jacobi returns V; batched Jacobi doesn't support `full_matrices=False`) that I couldn't work out a simple way to push this logic into the kernel. If the symbolic constraints are not sufficient to concretely determine the heuristics, we always use the QR algorithm. But I've also exposed the algorithm selection in the user API, so it's possible to bypass the heuristics and get consistent behavior alongside shape polymorphism if needed.
Alongside activating this new implementation, this change adds a new `algorithm` parameter to `jax.lax.svd`. Previously the choice of algorithm was made based on heuristics in the lowering rule, but it also makes sense to expose an option for users to specify the algorithm explicitly, because our heuristics are not very carefully optimized.
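For example, a minimal sketch of explicit algorithm selection (assuming the enum is exposed as `jax.lax.linalg.SvdAlgorithm`; check the docs for the exact spelling):

```python
import jax.numpy as jnp
from jax.lax import linalg

x = jnp.ones((16, 8), dtype=jnp.float32)

# Let the size-based heuristics pick the algorithm (the default behavior).
u, s, vh = linalg.svd(x, full_matrices=False)

# Explicitly request the QR-based algorithm, bypassing the heuristics; this
# gives consistent behavior under shape polymorphism.
u, s, vh = linalg.svd(x, full_matrices=False, algorithm=linalg.SvdAlgorithm.QR)
```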
Besides these core changes, I removed the forward compatibility checks from the CPU lowering, since we're well outside of the forward compatibility window now.
PiperOrigin-RevId: 687106965
The approach here is to add a new notion to JAX: ragged_prop. Ragged prop is useful for computing the dynamism/raggedness of an output given a set of inputs. In the limit, if we decide that raggedness is a useful property to have in JAX as a first-class citizen, we could fold it into the type system. At the moment, however, it is just a small set of rules implemented per op.
PiperOrigin-RevId: 685827096
Cases where we error:
* batch dimensions not having consistent shardings (ignoring `None`)
* contracting dimensions not having consistent shardings (ignoring `None`)
* `lhs.mesh != rhs.mesh`
* a batch dimension and a tensor dimension having matching shardings
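A hypothetical illustration of the contracting-dimension case (this sketch assumes the experimental sharding-in-types mode is active and at least four devices are available; in ordinary GSPMD mode the mismatch would instead be resolved automatically):

```python
import jax
import jax.numpy as jnp
from jax.sharding import NamedSharding, PartitionSpec as P

mesh = jax.make_mesh((2, 2), ('x', 'y'))
lhs = jax.device_put(jnp.ones((8, 4)), NamedSharding(mesh, P(None, 'x')))
rhs = jax.device_put(jnp.ones((4, 8)), NamedSharding(mesh, P('y', None)))

# The contracting dimension of lhs is sharded on 'x' while rhs's is sharded
# on 'y'; under the checks above, this inconsistency is an error.
out = jnp.dot(lhs, rhs)
```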
PiperOrigin-RevId: 684983567
Change in preparation for removing HLO ops from the XLA Python bindings.
In passing, also:
* improve how the documentation of `FftType` renders.
* remove some stale references to `xla_client`.
* remove the `standard_translate` rule, which is unused.
PiperOrigin-RevId: 684892102
We had never provided a public name for the enum of FFT types; instead, it was only known by a semi-private name (`jax.lib.xla_client.FftType`). Add a public name (`jax.lax.FftType`) and deprecate the private one.
We define a new `FftType` IntEnum rather than trying to expose the one in `xla_client`. The `xla_client` definition was useful when building classic HLO, but we no longer do that, so there's no reason to couple our type to XLA's.
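Usage under the new public name looks like this (a minimal sketch):

```python
import jax.numpy as jnp
from jax import lax

x = jnp.ones((8,), dtype=jnp.complex64)
y = lax.fft(x, fft_type=lax.FftType.FFT, fft_lengths=(8,))
```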
PiperOrigin-RevId: 684447186
In https://github.com/jax-ml/jax/pull/23574, we added a new `algorithm` parameter to `lax.dot_general` with the goal of giving users explicit control over the specific algorithm used for dot product accumulation. When using this feature in real use cases, we have found that the API is both too conservative (it required the user to pass inputs with the appropriate types) and too restrictive for common use cases. In this change, I simplify the API to bring it more in line with user expectations, and generalize it to support a broader range of use cases.
The core change is to update the `dot_general` lowering rule to add explicit type casts to the inputs, making sure that they always have the appropriate storage types going into the `DotGeneral` StableHLO op. Before this change, some backends would implicitly cast for some algorithms (e.g. f32 -> bf16) but error for others. It seems more user-friendly to include automatic casts in all cases where a specific algorithm is requested.
Another change in behavior is to (if needed) cast the result of the `DotGeneral` op (which is defined by the algorithm's `accumulation_type`) to match the input types. This means that, regardless of the algorithm choice, the output type will match what a user would expect from past use of `lax.dot_general`. The `preferred_element_type` parameter can now be used to control the output type, even when an algorithm is selected.
To summarize, the updated version of `dot_general` accepts _any_ input dtypes, and the output will always match the inputs (under the existing promotion rules if the LHS and RHS don't match) unless `preferred_element_type` is used to select a specific output type. The specified "algorithm" is now more of an implementation detail, rather than the defining feature of the API, and JAX will do whatever it can to satisfy the user's request. (If an algorithm is not supported on the current device, we will still get a compile time error.)
With the above changes in mind, it's no longer really necessary to have a `transpose_algorithm` parameter, because we can now use the same algorithm for the backwards pass. For users who need to customize the algorithm on the backwards pass, that is still possible using `custom_vjp`.
Given the above changes, @sbodenstein made the excellent point that we don't really need the `algorithm` parameter anymore: just accept `DotAlgorithm` inputs to `precision`. I think this is a really nice suggestion, so I have updated the interface to implement this.
One minor negative of this approach is that `preferred_element_type` isn't a great name for what that parameter does when it is used in conjunction with an algorithm. In the long run, I'd like to rename this parameter, but keeping it as-is for now seems like the best short-term approach.
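To summarize the resulting interface, here is a minimal sketch (preset names follow the `DotAlgorithmPreset` enum; treat the exact member spellings as assumptions):

```python
import jax
import jax.numpy as jnp

a = jnp.ones((128, 64), dtype=jnp.bfloat16)
b = jnp.ones((64, 32), dtype=jnp.bfloat16)

# Request an accumulation algorithm via `precision`: the output dtype still
# matches the inputs (bf16) even though accumulation happens in f32.
c = jax.lax.dot(a, b, precision=jax.lax.DotAlgorithmPreset.BF16_BF16_F32)

# Use preferred_element_type to keep the f32 accumulator type in the output.
c32 = jax.lax.dot(a, b,
                  precision=jax.lax.DotAlgorithmPreset.BF16_BF16_F32,
                  preferred_element_type=jnp.float32)
```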
PiperOrigin-RevId: 683302687
These kernels support shape polymorphism in all dimensions, and no GPU is required during lowering. The kernels have been included in jaxlib for more than 3 weeks, so we don't need to include any forward compatibility checks.
PiperOrigin-RevId: 682415506
As part of this change, I've added support and tests for shape polymorphism and export on CPU and GPU.
The FFI kernels have been available in jaxlib for over 3 weeks already, and they are included with the latest release of jaxlib on PyPI, so we don't need to worry about forward compatibility checks. With this in mind, I also removed the old lowering rules, but kept the backwards compatibility tests for now.
PiperOrigin-RevId: 682312752
Currently, the class's rendered documentation shows only "An enumeration." as the docstring, which is unhelpful for users. This PR adds class members, detailed descriptions, and cross-references to the docstring to make it beautiful and informative.
PiperOrigin-RevId: 681866947
Otherwise, we can simply pass it in as an argument and avoid updating it, since it will always remain constant. Both programs have equivalent semantics, but this one can be optimized better, since it makes it more apparent that the cond does not actually modify a ref.
PiperOrigin-RevId: 681482148
As things stand, you can partially discharge a jaxpr with `discharge_state(should_discharge=[...])`, but each equation discharges *all* of its arguments. This means that primitives like `scan_p` and `cond_p` discharge all the references they refer to (no pun intended), regardless of whether the user asked for it. We provide a special discharge rule, preferred to the normal one when present, that allows an op to discharge only some of its references.
This feature is especially useful for Pallas kernels: contrary to all other contexts, where jaxprs are expected to eventually be fully discharged, Pallas kernels lower references all the way to the runtime as pointers or MLIR memrefs.
Here we implement the partial discharge rule for `cond_p` and will implement it
for others in due course.
PiperOrigin-RevId: 681021324
The StableHLO spec has a new "algorithm" parameter that allows specifying the algorithm used to execute a matrix multiplication, and it can tune the trade-off between performance and computational cost. Historically, in JAX, the `precision` and `preferred_element_type` parameters have been used to expose some level of control, but their behavior is platform dependent and not sufficiently flexible for performance use cases. This change adds a new `algorithm` parameter to `dot_general` to support the new explicit API.
This parameter can be a member of the `SupportedDotAlgorithm` `Enum` to use an algorithm that is known to be supported on at least some hardware. Otherwise, it can be specified using the `DotAlgorithm` data structure which exposes the full generality of the StableHLO spec.
Transposition is supported using the `transpose_algorithm` argument.
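A sketch of the full-generality form (the `DotAlgorithm` field names below mirror the StableHLO spec; treat the exact spellings as assumptions):

```python
import jax.numpy as jnp
from jax import lax

algo = lax.DotAlgorithm(
    lhs_precision_type=jnp.bfloat16,
    rhs_precision_type=jnp.bfloat16,
    accumulation_type=jnp.float32,
    lhs_component_count=1,
    rhs_component_count=1,
    num_primitive_operations=1,
    allow_imprecise_accumulation=False,
)
# `algo` is then passed to dot_general via the parameter described above.
```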
PiperOrigin-RevId: 678672686
This change also uses the new batching dims for gather/scatter batching rules, to avoid concatenating the indices with iota.
See https://github.com/openxla/stablehlo/pull/2259
PiperOrigin-RevId: 678649138
Previously, the idea was that we would use the `convert_element_type` primitive to cast to/from extended dtypes. Extended dtype rules specified `convert_from(dtype1, dtype2) -> bool` and `convert_to(dtype1, dtype2) -> bool` functions. They were meant to indicate, roughly, whether a `convert_element_type` was legal. But I'm not sure they really made sense. The implementation was certainly buggy for non-scalar representation types (physical element types).
This PR simplifies and fixes things:
1. Instead of overloading the `convert_element_type_p` primitive with more cases
involving casts to/from extended dtypes, let's just have distinct `to_edtype_p`
and `from_edtype_p` primitives, which can be much simpler. We still reuse the
`jax.lax.convert_element_type` API function, so there's no API change to the
few existing users who know about this stuff.
2. Instead of extended dtype rules including `convert_from`/`convert_to`
functions with questionable semantics, let's only allow casts to/from the
representation type, which is already specified by the rules'
`physical_element_aval`. (Indeed that should be roughly _all_ we need, and this
PR is just one step towards realizing that goal.) We still have a boolean
`allow_conversion` on extended dtype rules just so we can handle the PRNGKey
case, where we don't want to allow any casts.
3. Fix the conversion logic to handle non-scalar representation types (physical
element types).
This is part of the ["stackless"](#23299) change. I'm splitting it out into a separate PR because we need it for some work on sharding types.
Changes:
1. Rename `at_least_vspace` to `to_tangent_type`, since that's what we always meant by it. `at_least_vspace` was always a bad name (sorry!), but it makes even less sense when you can have a special tangent type for a primal type that's already a vector space itself.
2. Replace `Zero.from_value` with `Zero.from_primal_value`, which does the required primal-type-to-tangent-type conversion.
3. Add `to_tangent_type` calls in various other places they're missing.
4. Remove non-support for float0 in custom derivatives?
5. [Optional, WIP] Reinstate some checks that had been skipped over, presumably because of these bugs. (We'll see how far I get with it. Might end up being a separate PR.)
PiperOrigin-RevId: 676115753