1747 Commits

Author SHA1 Message Date
Matthew Johnson
66a6eb299e add autodiff rules for jax.lax.ragged_all_to_all collective
Also update the ragged_all_to_all docstring. Pseudocode in the style of the shard_map tutorial would be better and cleaner, but it needs the context of the tutorial to explain; I'll add ragged_all_to_all to the shard_map tutorial in the future.

PiperOrigin-RevId: 735957604
2025-03-11 18:22:02 -07:00
Yash Katariya
f45cbf3342 Fix a bug where full and use_mesh outside jit did not work, because the shard passed to make_array_from_callback was sharded across all devices instead of just one device.
This is because `convert_element_type` returned an output on all devices of the mesh due to the surrounding `use_mesh` context.

PiperOrigin-RevId: 735909962
2025-03-11 15:25:46 -07:00
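A minimal sketch of the fixed behavior, assuming a recent JAX where `jax.sharding.use_mesh` is available (the one-device mesh and axis name are illustrative):

```
import jax
import jax.numpy as jnp

# Build a tiny mesh; the axis name 'x' is arbitrary.
mesh = jax.make_mesh((1,), ('x',))

# Eager (outside-jit) array creation under a mesh context: after the fix,
# the result is a single correctly placed array rather than one committed
# to every device of the mesh.
with jax.sharding.use_mesh(mesh):
    out = jnp.full((8,), 1.0)
print(out.sharding)
```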
Pearu Peterson
82b2591b21 Fix scipy.special.gammainc/gammaincc evaluation at boundary points 2025-03-11 21:18:47 +02:00
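For context, a quick check of the boundary values in question, via the public `jax.scipy.special` wrappers:

```
import jax.numpy as jnp
from jax.scipy.special import gammainc, gammaincc

# The regularized lower incomplete gamma function P(a, x) satisfies
# P(a, 0) == 0 and P(a, inf) == 1; gammaincc is the complement Q = 1 - P.
print(gammainc(1.0, 0.0))      # expected 0.0
print(gammaincc(1.0, 0.0))     # expected 1.0
print(gammainc(1.0, jnp.inf))  # expected 1.0
```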
jax authors
c2c68c018f Merge pull request #27059 from jakevdp:fix-while-loop
PiperOrigin-RevId: 735828960
2025-03-11 11:32:00 -07:00
Gunhyun Park
d191927b24 Fix syntax error and typos in the composite primitive docstring.
PiperOrigin-RevId: 735808000
2025-03-11 10:37:07 -07:00
Jake VanderPlas
4ae3211ea2 jax.disable_jit: ensure while_loop behaves similarly to non-disable_jit version 2025-03-11 09:53:34 -07:00
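A small sketch of the behavior being aligned (the loop itself is illustrative):

```
import jax
from jax import lax

def cond(carry):
    i, _ = carry
    return i < 5

def body(carry):
    i, total = carry
    return i + 1, total + i

# Under disable_jit, while_loop falls back to a Python loop; with this
# change it should match the traced version's semantics.
with jax.disable_jit():
    print(lax.while_loop(cond, body, (0, 0)))  # (5, 10)
print(lax.while_loop(cond, body, (0, 0)))      # same result when traced
```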
Praveen Narayanan
b6d4fe5387 Define lax.ragged_dot_general and express lax.ragged_dot in terms of it.
PiperOrigin-RevId: 735471245
2025-03-10 12:25:22 -07:00
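For reference, a minimal `lax.ragged_dot` call, whose semantics the new `lax.ragged_dot_general` generalizes (shapes chosen for illustration):

```
import jax.numpy as jnp
from jax import lax

m, k, n, g = 6, 3, 4, 2
lhs = jnp.ones((m, k))
rhs = jnp.ones((g, k, n))
# Rows of lhs are grouped contiguously; group_sizes must sum to m.
group_sizes = jnp.array([2, 4], dtype=jnp.int32)

out = lax.ragged_dot(lhs, rhs, group_sizes)
print(out.shape)  # (6, 4): each row is multiplied by its group's rhs matrix
```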
jax authors
14b215fe76 Merge pull request #27032 from dfm:lax-dtype
PiperOrigin-RevId: 735424674
2025-03-10 10:18:58 -07:00
Dan Foreman-Mackey
21884d4a14 Move (most) jaxlib linalg custom call registration into JAX.
My motivation here is to fix the plugin support for batch partitionable custom calls. Since plugin support for custom call partitioners is provided via register_plugin_callback in xla_bridge, instead of xla_client itself, it's much more straightforward to register the custom calls in JAX.

It would be possible to refactor things differently, but it actually seems like a reasonable choice to use the supported APIs from `jax.ffi` instead of `xla_client` so that we can take advantage of any new features we might add there in the future.

This is all still a little bit brittle and I'd eventually like to migrate to a version where the XLA FFI library provides a mechanism for exporting handlers, but this change is still compatible with any future changes like that.

PiperOrigin-RevId: 735381736
2025-03-10 08:17:44 -07:00
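A hypothetical sketch of the `jax.ffi` registration path referred to here; the handler name and capsule are placeholders, not the actual jaxlib kernels:

```
import jax

def register_example_kernel(handler_capsule):
    # Registering through jax.ffi (rather than xla_client directly) means
    # the registration is also forwarded to PJRT plugin backends.
    jax.ffi.register_ffi_target(
        "example_cpu_kernel",  # placeholder handler name
        handler_capsule,       # placeholder XLA FFI handler
        platform="cpu",
    )
```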
Dan Foreman-Mackey
4eada56027 Avoid using array operations within lax.py operations. 2025-03-10 11:04:32 -04:00
jax authors
6095af050f Merge pull request #26427 from mattjj:direct-linearize-fixes
PiperOrigin-RevId: 734687601
2025-03-07 14:22:16 -08:00
Matthew Johnson
f4f31f89ae [scan] when num_trips==0, don't generate weird size-zero reshapes 2025-03-07 21:35:40 +00:00
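A minimal sketch of the case being fixed: when `unroll` exceeds the trip count, the fully-unrolled main loop runs zero times (num_trips == 0):

```
import jax.numpy as jnp
from jax import lax

def step(carry, x):
    return carry + x, carry

xs = jnp.arange(3.0)
# unroll=4 > len(xs)=3, so the unrolled main loop has zero trips and the
# remainder loop handles everything.
carry, ys = lax.scan(step, 0.0, xs, unroll=4)
print(carry, ys)  # 3.0 [0. 0. 1.]
```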
Matthew Johnson
7c2f842353 shard_map and other fixes to direct-linearize
Co-authored-by: Dougal Maclaurin <dougalm@google.com>
2025-03-07 21:02:40 +00:00
Yash Katariya
9f37b5197f [sharding_in_types] Fix a bug where empty_array in scan was created with the wrong spec when unroll > 1.
PiperOrigin-RevId: 734591110
2025-03-07 09:47:32 -08:00
Yash Katariya
f8b98993b8 Add a divisibility check to ensure that the sharding evenly divides the shape (until this restriction is lifted), so that we don't create bad shardings.
Also improve the `dynamic_update_slice` sharding error by printing `aval.str_short()` instead of the full sharding: it's more concise and more informative than the current error (it adds the shape to the message, too).

Also make some formatting changes in scan lowering to make it easier to debug.

PiperOrigin-RevId: 734542862
2025-03-07 07:01:34 -08:00
Daniel Suo
e6db7a9d99 Dedup non-ref constants closed over in cond branch functions.
PiperOrigin-RevId: 734497907
2025-03-07 04:01:42 -08:00
Yash Katariya
766315f791 Make sure concat + vmap of a sharded input and a replicated input works properly.
In this case, the example boils down to:

```
inp1 = f32[16@x, 4]
inp2 = f32[4]

def f(x: f32[4], y: f32[4]):
  return jnp.concat([x, y], axis=-1)

vmap(f, in_axes=(0, None))(inp1, inp2)
```

This example was breaking in the concat batching rule because we didn't broadcast with the right sharding.

PiperOrigin-RevId: 733536944
2025-03-04 18:35:13 -08:00
Jake VanderPlas
84ca80d215 doc: in lax.cond, note that both branches will be traced 2025-03-03 13:05:24 -08:00
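A short illustration of the documented behavior: under tracing, both branch functions are traced, even though only one executes:

```
import jax
from jax import lax

def true_fn(x):
    print("tracing true_fn")   # runs at trace time
    return x + 1

def false_fn(x):
    print("tracing false_fn")  # also runs at trace time
    return x - 1

f = jax.jit(lambda pred, x: lax.cond(pred, true_fn, false_fn, x))
print(f(True, 1.0))  # both "tracing ..." lines print once; result is 2.0
```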
George Necula
a6c47d6f36 Use the same name for aliased Vars when pretty-printing Jaxprs.
Add a mechanism for using the same Var names for Vars that
are aliased. In this PR, we use this for `pjit`, such that the
following `print(jax.make_jaxpr(lambda a: jax.jit(lambda a: a + 1)(a))(0.))`
prints:

```
{ lambda ; a:f32[]. let
    b:f32[] = pjit[
          name=<lambda>
          jaxpr={ lambda ; a:f32[]. let b:f32[] = add a 1.0 in (b,) }
          ] a
    in (b,) }
```

instead of the previous:

```
{ lambda ; a:f32[]. let
    b:f32[] = pjit[
          name=<lambda>
          jaxpr={ lambda ; c:f32[]. let d:f32[] = add c 1.0 in (d,) }
          ] a
    in (b,) }
```

The same mechanism could be used for other higher-order primitives,
e.g., cond, and others.

Also add some typing declarations and rename APIs to use "shared jaxpr"
in lieu of "top-level jaxpr" for those Jaxprs that are used multiple
times and are printed first. I presume that the term "top-level jaxpr"
was picked because these are printed first at top-level. But this is
confusing, because they are really subjaxprs. In fact, there was already
a function `core.pp_toplevel_jaxpr` for printing the top-level Jaxpr,
and there was also `core.pp_top_level_jaxpr` (which is now named
`core.pp_shared_jaxpr`).
2025-03-03 11:38:51 +01:00
Yash Katariya
53494ade2d PRNGKeyArray.aval should have the correct logical sharding. This required refactoring code so that we don't hit recursion errors.
PiperOrigin-RevId: 732536521
2025-03-01 18:18:19 -08:00
Peter Hawkins
1e5d9a9158 Add an allow_negative_indices option to lax.dynamic_slice and lax.dynamic_update_slice.
The goal of this change is to avoid generating code to wrap negative indices back into range in cases where we know it doesn't matter. Change scan to pass allow_negative_indices=False to avoid emitting index wrapping code for each scan argument.

PiperOrigin-RevId: 731812827
2025-02-27 12:04:28 -08:00
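A small sketch of the new option, assuming it is exposed as a keyword argument on `lax.dynamic_slice`:

```
import jax.numpy as jnp
from jax import lax

x = jnp.arange(8.0)

# Default behavior: negative start indices are wrapped back into range.
print(lax.dynamic_slice(x, (2,), (3,)))  # [2. 3. 4.]

# If the caller guarantees in-range indices, the compiler can skip the
# index-wrapping code (this is what scan now does for its arguments).
print(lax.dynamic_slice(x, (2,), (3,), allow_negative_indices=False))
```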
Dan Foreman-Mackey
f93c2a1aa5 Add and test support for partitioning of batch dimensions in lax.linalg.
On CPU and GPU, almost all of the primitives in lax.linalg are backed by custom calls that support simple semantics when batch dimensions are sharded. Before this change, all linalg operations on CPU and GPU would insert an `all-gather` before being executed when called on sharded inputs, even when that shouldn't be necessary. This change adds support for this type of partitioning, to cover a wide range of use cases.

There are a few remaining GPU ops that don't support partitioning, either because they are backed by HLO ops that don't partition properly (Cholesky factorization and triangular solves), or because they're still using descriptors with problem dimensions in the kernel. I'm going to fix these in follow-up changes.

PiperOrigin-RevId: 731732301
2025-02-27 08:16:16 -08:00
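A hedged sketch of the pattern this enables; `lu_factor` stands in for any batch-partitionable linalg op, and more than one device is assumed for the sharding to be interesting:

```
import jax
import jax.numpy as jnp
from jax.sharding import NamedSharding, PartitionSpec as P

mesh = jax.make_mesh((jax.device_count(),), ('b',))
a = jnp.stack([jnp.eye(4)] * jax.device_count())
a = jax.device_put(a, NamedSharding(mesh, P('b')))  # shard the batch dim

# With batch-partitioned custom calls, each device factors its own slice
# of the batch, without an all-gather of the inputs.
lu, piv = jax.jit(jax.scipy.linalg.lu_factor)(a)
print(lu.sharding)
```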
Dan Foreman-Mackey
553b441fef Use LAPACK trsm kernel even for batched solves.
Depending on the platform and linked LAPACK library, this change seems to improve (or at least not degrade) performance across a wide range of problem and batch sizes. On colab, the performance is not dramatically improved for most input shapes, but on my Mac, this improves the performance of batched triangular solves by a factor of a few up to an order of magnitude across all the problems that I tried.

PiperOrigin-RevId: 730971127
2025-02-25 11:49:01 -08:00
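A small usage example of the code path this change affects: batched triangular solves through the public API, which on CPU now dispatch to the LAPACK trsm kernel:

```
import jax.numpy as jnp
from jax.scipy.linalg import solve_triangular

batch, n, k = 4, 8, 2
# Well-conditioned lower-triangular matrices, batched along the first axis.
a = jnp.tril(jnp.ones((batch, n, n))) + n * jnp.eye(n)
b = jnp.ones((batch, n, k))

x = solve_triangular(a, b, lower=True)
print(x.shape)  # (4, 8, 2)
```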
Dan Foreman-Mackey
2ce88c950a Deprecate alpha argument to trsm LAPACK kernel.
(Part of general cleanups of the lax.linalg submodule.)

This is always set to 1 and I don't see any benefit to keeping this argument around. This can be done in a forward and backward compatible way following these docs: https://docs.jax.dev/en/latest/export/export.html#ensuring-forward-and-backward-compatibility

We start by updating the FFI handler to remove the explicit alpha argument, but allow it to accept (but ignore) extra input arguments. Then we only pass alpha when lowering in forward compatibility mode, or when the jaxlib version is old (I'm using >0.5.1 as the cutoff assuming that this change doesn't make it into the upcoming release).

Then, the forward compatibility lowering can be removed after at least 21 days, and the kernel can be updated at least 180 days after 0.5.2 is released.

PiperOrigin-RevId: 730928808
2025-02-25 10:04:29 -08:00
Dan Foreman-Mackey
62530d5922 Update JVP rule for lax.linalg.lu to use vmap instead of broadcasted_iotas.
PiperOrigin-RevId: 730497540
2025-02-24 10:09:41 -08:00
Dan Foreman-Mackey
6bd99207d5 Fix rank promotion error in JVP of batched eigh.
PiperOrigin-RevId: 730475017
2025-02-24 09:08:55 -08:00
Dan Foreman-Mackey
ae656e1574 Update lax.linalg.svd primitive to use registration helper functions.
PiperOrigin-RevId: 730466560
2025-02-24 08:44:06 -08:00
jax authors
c74f497eaf Merge pull request #25053 from JanLuca:gesvd
PiperOrigin-RevId: 730445233
2025-02-24 07:38:15 -08:00
jax authors
c17ea805f3 Merge pull request #26569 from gnecula:debug_info_arg_names
PiperOrigin-RevId: 730432019
2025-02-24 06:48:41 -08:00
Yash Katariya
7d3c63eded [sharding_in_types] Add more reshape sharding support
* Allow merging and splitting only if the major-most dim is sharded, since that involves no data movement. This only happens if `dimensions` is None, i.e. if the input array is in **row-major order**.

  * Merging: if **only** the major-most dim of the merge block is sharded, then that sharding is propagated to the merge block's output.

  * Splitting: if the dimension being split is sharded, then the sharding is propagated to the major-most dimension post-split, but only if the spec divides the new shape exactly.

PiperOrigin-RevId: 730291595
2025-02-23 21:39:23 -08:00
George Necula
1be801bac8 [better_errors] Cleanup use of DebugInfo.arg_names and result_paths
Previously, we represented a missing arg name with `None`,
and a missing result path with the empty string. We now
adopt the same convention for arg names and use empty strings.
This simplifies the typing, and prevents the string "None" from
appearing in error messages.

I changed how we encode the result paths. Previously, for a
function that returns a single array, the path was the empty
string (the same as for an unknown path), and for a function
that returns a pair of arrays it was `([0], [1])`. Now we
add the "result" prefix: `("result",)` for a function returning a
single array and `(result[0], result[1])` for a function returning
a pair of arrays.

Finally, in debug_info_test, I removed the `check_tracer_arg_name`
so that all spied tracers are printed with the argument name they
depend on.
2025-02-23 08:27:56 +02:00
Yash Katariya
d695aa4c63 [sharding_in_types] Add sharding rules for the following primitives:
* `bitcast_convert_element_type`
  * `cumsum`
  * `cumlogsumexp`
  * `cumprod`
  * `cummax`
  * `cummin`
  * `reduce_window`
  * `reduce_window_sum`
  * `reduce_window_max`
  * `reduce_window_min`
  * `select_and_gather_add`

For `reduce_window_...` primitives, only trivial windowing is supported along non-replicated dimensions. We can relax the other NotImplemented case in the future.

PiperOrigin-RevId: 729910108
2025-02-22 10:45:58 -08:00
Jan Naumann
e03fe3a06d Implement SVD algorithm based on QR for CPU targets
In a recent JAX release, the SvdAlgorithm parameter was added
to the jax.lax.linalg.svd function. Currently, CPU targets
still only support the divide-and-conquer algorithm from
LAPACK (gesdd).

This commit adds the functionality to select the QR-based
algorithm on CPU as well. Mainly, it adds the wrapper code
to call the gesvd function of LAPACK using the FFI interface.

Signed-off-by: Jan Naumann <j.naumann@fu-berlin.de>
2025-02-22 15:24:57 +01:00
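A minimal sketch of selecting the QR-based backend, assuming the SvdAlgorithm enum mentioned above:

```
import jax.numpy as jnp
from jax.lax import linalg

x = jnp.arange(12.0).reshape(3, 4)
# On CPU this should dispatch to LAPACK's gesvd rather than gesdd.
u, s, vt = linalg.svd(x, full_matrices=False,
                      algorithm=linalg.SvdAlgorithm.QR)
print(s.shape)  # (3,)
```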
Dan Foreman-Mackey
c4418c1010 Update several remaining lax.linalg primitives to use registration helper functions.
In this change, we update schur, triangular_solve, tridiagonal, and tridiagonal_solve. I batched these ones since they're all pretty straightforward.

PiperOrigin-RevId: 729572705
2025-02-21 10:18:30 -08:00
Dan Foreman-Mackey
ed10003adc Update lax.linalg.qr primitives to use registration helper functions.
PiperOrigin-RevId: 729551997
2025-02-21 09:15:01 -08:00
Dan Foreman-Mackey
09325d925f Update internal unop primitive helper to pass kwargs to dtype rule.
To be consistent with other rule registration helpers, `unop_dtype_rule` should pass through its kwargs to the `result_dtype` callable.

PiperOrigin-RevId: 729483613
2025-02-21 04:52:51 -08:00
Dan Foreman-Mackey
126909b62a Update lax.linalg.lu primitive to use registration helper functions.
PiperOrigin-RevId: 729483456
2025-02-21 04:50:46 -08:00
Dan Foreman-Mackey
a981e1c4b9 Start adding primitive registration helper functions to lax.linalg.
As part of my efforts to simplify the primitive implementations in lax.linalg, I've found that all of the primitives share some common logic when it comes to impls, abstract_evals, and batching. This change adds some helper functions and starts the process of abstracting the primitive definitions to simplify and reduce duplication. I will continue with the rest of the primitives in lax.linalg, but I didn't want to overload the first diff.

PiperOrigin-RevId: 729471970
2025-02-21 04:05:34 -08:00
Robert David
08de0128b6 Fix head comment: was referring to nonexistent parameters.
PiperOrigin-RevId: 729231457
2025-02-20 13:29:40 -08:00
Yash Katariya
8305803b76 [sharding_in_types] Initial support for partial-auto/explicit shard_map + sharding-in-types. If an axis in shmap(..., auto=...) is an explicit axis in the outer mesh context, then that axis is treated as Explicit instead of Auto.
PiperOrigin-RevId: 728920514
2025-02-19 20:04:54 -08:00
jax authors
cb0d326e16 Merge pull request #26591 from jakevdp:lax-docs
PiperOrigin-RevId: 728908919
2025-02-19 19:22:48 -08:00
Roy Frostig
ae10f2da13 fix scan doc on the unroll argument.
Looks like a typo worth fixing.
2025-02-19 11:01:44 -08:00
Yash Katariya
66d04f85e6 Error out when going from Manual -> Auto/Explicit AxisTypes in the `auto_axes` and `explicit_axes` APIs, which do `mesh_cast` implicitly.
Also, improve the error raised by canonicalize_sharding to include the API name and current source location.

PiperOrigin-RevId: 728701237
2025-02-19 09:21:53 -08:00
Yash Katariya
a3edfb43ef Now that the sharding_in_types config flag is True, remove the config and all the conditionals.
PiperOrigin-RevId: 728653433
2025-02-19 06:53:35 -08:00
Jake VanderPlas
7f115fbb64 jax.lax: improve docs for comparison operators 2025-02-18 13:48:59 -08:00
jax authors
72f0a90ee6 Merge pull request #26401 from jakevdp:numpy-consts
PiperOrigin-RevId: 728292846
2025-02-18 11:32:25 -08:00
Jake VanderPlas
29771dd06c jax.lax: improve docs for bitwise operators. 2025-02-14 14:17:30 -08:00
Jake VanderPlas
33b989ac9e refactor: import numpy objects directly in jax.numpy 2025-02-14 12:47:58 -08:00
Jake VanderPlas
531443c434 jax.lax: improve docs for pow & related functions 2025-02-14 08:40:19 -08:00
George Necula
a0812cd57e [better_errors] Make it explicit that debug_info is not None.
Now all internal uses of lu.wrap_init and core.Jaxpr carry actual
debug info. This enables us to clean up the type declarations and
to remove the checks for whether debug_info is present.

For usage outside of the JAX internals, we change
`jax.extend.linear_util.wrap_init` to be usable without debug_info,
for temporary backwards compatibility. We emit a deprecation
warning and fill in some fake debugging info.

See https://github.com/jax-ml/jax/issues/26480 for more details.

PiperOrigin-RevId: 726770483
2025-02-13 22:07:04 -08:00
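A hedged sketch of the migration for external users; `api_util.debug_info` is what JAX's internals use, but treat the exact import path and signature as assumptions for your JAX version:

```
import jax.extend.linear_util as lu
from jax import api_util  # assumption: may live under jax._src.api_util

def f(x):
    return x + 1

# Passing explicit debug info avoids the deprecation warning and the
# fake filled-in debugging info.
dbg = api_util.debug_info("example_transform", f, (0.0,), {})
wrapped = lu.wrap_init(f, debug_info=dbg)
```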