26023 Commits

Author SHA1 Message Date
jax authors
ce3412e540 Remove redundant BUILD_TAG from JAX wheels build rule.
PiperOrigin-RevId: 733334423
2025-03-04 08:13:13 -08:00
Adam Paszke
cdae5fcfc7 [Mosaic GPU] Make sure to do the async proxy fence before wargroup sync
This is the ordering we want for a proper release of generic SMEM stores
into the async proxy. The old order was problematic: once the warpgroup
barrier completed, some warps could be deselected before reaching the
fence. As long as the first warp kept making progress, it could go
through the fence alone and start issuing TMA copies before the other
warps had synchronized with the async proxy.

I have not observed this problem in any of our kernels so far, but this
order seems safer to me.

PiperOrigin-RevId: 733333814
2025-03-04 08:11:15 -08:00
Sergei Lebedev
155839bb4d [pallas:triton] Emit a better error message for matmul with non-2D operands
Triton seems to support both 2D and 3D operands now, the latter case being a
batched matmul. We need more changes in the lowering to support 3D, so I will
leave it out of scope here.

Fixes #26013.

PiperOrigin-RevId: 733293299
2025-03-04 05:46:29 -08:00
jax authors
8906f281c4 Merge pull request #26883 from dfm:np-unique-sorted
PiperOrigin-RevId: 733287592
2025-03-04 05:19:55 -08:00
Dan Foreman-Mackey
6c5ef1a404 Update jnp.unique to support upstream interface changes. 2025-03-04 05:24:52 -05:00
jax authors
96dce0b605 Update XLA dependency to use revision
2274501a95.

PiperOrigin-RevId: 733242669
2025-03-04 02:17:52 -08:00
Ayaka
ea53c7616b Fix thread safety of JAX error checking
Fix thread safety of JAX error checking by making the global state thread-local

PiperOrigin-RevId: 733164878
2025-03-03 20:56:01 -08:00
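A minimal sketch of the idea behind this fix (hypothetical names, not JAX's actual internals): moving a module-level global into `threading.local` so each thread tracks its own error state.

```python
import threading

# Hypothetical illustration: a thread-local container replaces a
# module-level global, so each thread sees only its own error state.
_state = threading.local()

def set_error(err):
    _state.error = err

def get_error():
    # A thread that never called set_error sees no error.
    return getattr(_state, "error", None)
```

With this pattern, a worker thread starts with no error set and cannot clobber the main thread's state.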
Sharad Vikram
00d9f4529d [Pallas/Fuser] Add support for custom_call_jvp/pjit to push_block_spec
PiperOrigin-RevId: 733122108
2025-03-03 17:43:13 -08:00
Sharad Vikram
d32e282ff9 Add fuser to jax.experimental.pallas
Note that fuser is considered experimental within Pallas and APIs are subject to change

PiperOrigin-RevId: 733117882
2025-03-03 17:26:44 -08:00
Sharad Vikram
0b6c355083 [Pallas] Add experimental (private for now) API for manual fusion into Pallas kernels
PiperOrigin-RevId: 733112191
2025-03-03 17:05:51 -08:00
jax authors
2c7043f63d Merge pull request #26865 from jakevdp:fix-indexing-error
PiperOrigin-RevId: 733085471
2025-03-03 15:38:20 -08:00
jax authors
f9f47217df Merge pull request #26862 from jakevdp:logsumexp-docs
PiperOrigin-RevId: 733080943
2025-03-03 15:24:10 -08:00
jax authors
4944dcb977 Merge pull request #26897 from jakevdp:cond-doc
PiperOrigin-RevId: 733077065
2025-03-03 15:13:23 -08:00
jax authors
07d1cd0290 Merge pull request #26876 from carlosgmartin:fix_matrix_norm_empty_matrix
PiperOrigin-RevId: 733077011
2025-03-03 15:11:31 -08:00
Yash Katariya
07c4c03a05 Remove the skip for test_output_streaming_inside_scan
PiperOrigin-RevId: 733070842
2025-03-03 14:54:03 -08:00
jax authors
439c412cd4 Merge pull request #26895 from hawkinsp:pad2
PiperOrigin-RevId: 733034756
2025-03-03 13:15:07 -08:00
Jake VanderPlas
84ca80d215 doc: in lax.cond, note that both branches will be traced 2025-03-03 13:05:24 -08:00
Peter Hawkins
7f05b74bca Fix wrong results in multidimensional pad.
When there are multiple dimensions, NumPy's semantics are as if the padding is applied to each dimension in order.

We lacked test coverage for this case because constant values ((0, 2),) and (0, 2) were handled by different code paths.

Fixes https://github.com/jax-ml/jax/issues/26888
2025-03-03 15:25:08 -05:00
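A small NumPy illustration of the semantics described above: with per-dimension constant values, padding behaves as if applied one dimension at a time, in order.

```python
import numpy as np

a = np.ones((2, 2), dtype=int)
# Rows are padded first (10 above, 20 below), then columns (30 left,
# 40 right) are padded on the already row-padded array, so the corner
# elements take the column values.
padded = np.pad(a, 1, constant_values=((10, 20), (30, 40)))
```

The result is a 4x4 array whose four corners are all 30 or 40, never 10 or 20.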
carlosgmartin
897e1a1310 Fix linalg.norm to return zero for proper norms of empty matrices. 2025-03-03 15:02:34 -05:00
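The NumPy behavior this commit makes `jnp.linalg.norm` match: a proper norm of an empty matrix is zero, since the sum of squares over no elements is zero.

```python
import numpy as np

# Frobenius norm of an empty (0, 3) matrix: sqrt of an empty sum of
# squares, i.e. sqrt(0) == 0.0.
empty = np.empty((0, 3))
result = np.linalg.norm(empty)
```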
Tzu-Wei Sung
5179642eb5 [Mosaic] Rename dep name.
PiperOrigin-RevId: 732985217
2025-03-03 11:01:25 -08:00
jax authors
1a57fdf704 Fix convolution example (kernel should be OIHW, not IOHW).
PiperOrigin-RevId: 732952185
2025-03-03 09:32:22 -08:00
Adam Paszke
e9f95cc3a7 [Mosaic GPU] Make the small WGMMA tile independent of transpose flags
Now the small tiling is always `(8, swizzle // bytewidth(dtype))`, no matter whether the input
is transposed or not. This should simplify the follow-up refactoring of the code and make it easier
to enable small tiling for the LHS too.

PiperOrigin-RevId: 732933005
2025-03-03 08:30:57 -08:00
Bart Chrzaszcz
ed4a7bbab1 #sdy Add JAX backwards compatibility test.
This tests saving a module with one set of axis names, but loading it with another set of axis names.

This does also test the custom calls:

- `@Sharding`
- `@xla.sdy.GlobalToLocalShape`
- `@xla.sdy.LocalToGlobalShape`

But note that there are a bunch of other custom calls that will be tested in the Shardy and XLA codebases. The testing utilities used here don't allow setting `out_shardings`, for example. So JAX can rely on the existence of those tests as stability guarantees, just like for StableHLO.

PiperOrigin-RevId: 732893432
2025-03-03 06:01:34 -08:00
Bart Chrzaszcz
ac493655bf #sdy support JAX export tests when Shardy is enabled.
This CL only supports lowering a module with the exact same mesh, and loading it with either the exact same mesh or different meshes.

Note that we will be introducing some restrictions under Shardy for JAX export:

- You can only lower/save the module with meshes all of the same shape, but different axis names (this PR is right now only allowing the same axis names, but this will be relaxed in a follow-up)
- When loading the module, just like with GSPMD, you can use a different mesh with a different mesh shape and axis names. However, like with the restriction in the previous point, all shardings must use the same axis shapes, but can use different axis names (again this will be relaxed in a follow-up)

We may remove the restriction of having to use the exact same mesh shapes at export saving time and at export loading time in the future. But for now we will keep this restriction while no one is using Shardy with JAX export.

PiperOrigin-RevId: 732878916
2025-03-03 04:57:06 -08:00
Christos Perivolaropoulos
b9ebd9188f [mgpu] Foreach in tiled layout.
PiperOrigin-RevId: 732872906
2025-03-03 04:31:59 -08:00
Adam Paszke
11e6cfbc6a [Mosaic GPU][NFC] Move the calculation of group strides into _validate_mma
This allows us to unify this logic between Hopper and Blackwell.

PiperOrigin-RevId: 732862875
2025-03-03 03:51:20 -08:00
jax authors
bbadf99054 Merge pull request #26697 from gnecula:pp_aliased_var_names
PiperOrigin-RevId: 732860010
2025-03-03 03:36:50 -08:00
Adam Paszke
3038348f23 [Mosaic GPU][NFC] Clean up the computation of group strides
PiperOrigin-RevId: 732849235
2025-03-03 02:50:48 -08:00
George Necula
a6c47d6f36 Use the same name for aliased Vars when pretty-printing Jaxprs.
Add a mechanism for using the same Var names for Vars that
are aliased. In this PR, we use this for `pjit`, such that the
following `print(jax.make_jaxpr(lambda a: jax.jit(lambda a: a + 1)(a))(0.))`
prints:

```
{ lambda ; a:f32[]. let
    b:f32[] = pjit[
          name=<lambda>
          jaxpr={ lambda ; a:f32[]. let b:f32[] = add a 1.0 in (b,) }
          ] a
    in (b,) }
```

instead of the previous:

```
{ lambda ; a:f32[]. let
    b:f32[] = pjit[
          name=<lambda>
          jaxpr={ lambda ; c:f32[]. let d:f32[] = add c 1.0 in (d,) }
          ] a
    in (b,) }
```

The same mechanism could be used for other higher-order primitives,
e.g., cond, and others.

Also add some typing declarations and rename APIs to use "shared jaxpr"
in lieu of "top-level jaxpr" for those Jaxprs that are used multiple
times and are printed first. I presume that the term "top-level jaxpr"
was picked because these are printed first at top-level. But this is
confusing, because they are really subjaxprs. In fact, there was already
a function `core.pp_toplevel_jaxpr` for printing the top-level Jaxpr,
and there was also `core.pp_top_level_jaxpr` (which is now named
`core.pp_shared_jaxpr`).
2025-03-03 11:38:51 +01:00
jax authors
eee4d6019b Update XLA dependency to use revision
77cec94c55.

PiperOrigin-RevId: 732838325
2025-03-03 02:07:04 -08:00
Parker Schuh
b8b690e594 Add use_high_dynamic_range_gumbel flag which allows sampling gumbel such
that it more closely matches the CDF for low probability events (less than
2**-nmant).

Because -log(-log(x)) is more sensitive close to 1 than 0, we must use
-log(-log1p(-x)) instead to make better use of the extra range around 0.

PiperOrigin-RevId: 732757388
2025-03-02 19:42:40 -08:00
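A hedged NumPy sketch of the numerical motivation (this is an illustration, not JAX's implementation): for a tiny uniform sample x, the naive transform loses x entirely when 1 - x rounds to 1.0 in float32, while the log1p form keeps full precision because floats are densest near 0.

```python
import numpy as np

x = np.float32(2.0 ** -30)  # a low-probability tail sample, < 2**-nmant
with np.errstate(divide="ignore"):
    # 1 - x rounds to exactly 1.0 in float32, so the naive transform
    # degenerates: -log(-log(1.0)) -> -log(-0.0) -> inf.
    naive = -np.log(-np.log(np.float32(1.0) - x))
# log1p(-x) ~= -x retains full precision for small x, giving a finite
# Gumbel sample of about 30 * ln(2) ~= 20.8.
stable = -np.log(-np.log1p(-x))
```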
Dimitar (Mitko) Asenov
3b305c6617 [Mosaic GPU] Infer layouts (transforms) on memrefs that directly feed into the dialect wgmma op.
This change detects a situation where a gmem_memref is read via `async_load` and directly used in a wgmma. In such cases, we insert a cast before the load to add tile, transpose, and swizzle transformations.

PiperOrigin-RevId: 732618760
2025-03-02 03:17:13 -08:00
Dimitar (Mitko) Asenov
c60ef5a2a1 [Mosaic GPU] Wire up the slice_lengths and indices operands in lowering of the MLIR dialect.
This enables slicing via TMA and is needed for pipelining.

PiperOrigin-RevId: 732613803
2025-03-02 02:43:47 -08:00
jax authors
9d6bcd63b8 Update XLA dependency to use revision
0622372b58.

PiperOrigin-RevId: 732612501
2025-03-02 02:34:44 -08:00
Yash Katariya
53494ade2d PRNGKeyArray.aval should have the correct logical sharding. This required refactoring code so that we don't hit recursion errors.
PiperOrigin-RevId: 732536521
2025-03-01 18:18:19 -08:00
jax authors
e25caba9a4 Update XLA dependency to use revision
92b05a50f7.

PiperOrigin-RevId: 732405310
2025-03-01 02:43:33 -08:00
jax authors
2a1eeb0ce8 Changes for kernel export
PiperOrigin-RevId: 732383028
2025-03-01 00:32:39 -08:00
Anton Osokin
1f3176636d Reverts 10f6edeb496a2eec2a09c2c5cecbe4f8f02452ab
PiperOrigin-RevId: 732315349
2025-02-28 18:04:27 -08:00
Jake VanderPlas
b2c45b8eb9 Improved errors when indexing with floats 2025-02-28 15:04:07 -08:00
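For illustration, the analogous NumPy behavior (the commit improves the corresponding JAX error message): indexing with a float is rejected rather than silently truncated.

```python
import numpy as np

arr = np.arange(3)
try:
    arr[1.5]  # NumPy raises IndexError: only integers, slices, ... are valid
    raised = False
except IndexError:
    raised = True
```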
jax authors
48a55a6d71 Add a profiler test for gpu run
PiperOrigin-RevId: 732247572
2025-02-28 13:45:46 -08:00
Jake VanderPlas
c56e794a66 doc: fix description of logsumexp axis 2025-02-28 12:53:33 -08:00
Dan Foreman-Mackey
70024d2201 Temporarily skip some more linalg sharding checks.
PiperOrigin-RevId: 732222043
2025-02-28 12:26:18 -08:00
Kanglan Tang
da7c90c4c4 Bump oldest supported libtpu to match the compatibility window (12 weeks)
PiperOrigin-RevId: 732218061
2025-02-28 12:13:42 -08:00
Nitin Srinivasan
0ed42dcdd0 Print the list of installed packages before running pytests
Also, do not upgrade packages and disable editable mode when installing JAX at head

PiperOrigin-RevId: 732208266
2025-02-28 11:46:12 -08:00
Yash Katariya
da1cc0a50e [sharding_in_types] out_sharding argument on einsum should only apply to the last einsum and not intermediate einsums.
For example: Consider this einsum: `jnp.einsum('bthD, bthi, bthj->ijD', dy, i, j, out_sharding=P('data', None, None))`

This will decompose into 2 einsums where the intermediate einsum output will be of rank `5`:
  * `'bthj,bthD->bthjD'`
  * `'bthjD,bthi->ijD'`

The out_sharding specified (`P('data', None, None)`) is not compatible with the intermediate einsum: `'bthj,bthD->bthjD'` since the `length of spec (3) != out_aval.ndim (5)`.

This change makes it so that out_sharding is only applied to the contraction that leads to the final output. **If there are conflicts in intermediate einsums, then the user has to reshard the input or split into multiple einsums (and maybe provide out_sharding) so that conflicts don't exist.**

Note: We won't drop into auto mode for intermediate einsums. The user will have to split the einsum if any conflict is detected.
PiperOrigin-RevId: 732205849
2025-02-28 11:39:14 -08:00
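Reproducing the decomposition from the message in plain NumPy shows why a rank-3 out_sharding spec cannot apply to the intermediate: the first contraction produces a rank-5 array.

```python
import numpy as np

b, t, h, D, i, j = 2, 3, 4, 5, 6, 7
rng = np.random.default_rng(0)
dy = rng.normal(size=(b, t, h, D))
I = rng.normal(size=(b, t, h, i))
J = rng.normal(size=(b, t, h, j))

# Step 1 of the decomposition: rank-5 intermediate, to which a
# 3-element sharding spec cannot apply.
tmp = np.einsum('bthj,bthD->bthjD', J, dy)
# Step 2: the final contraction produces the rank-3 'ijD' output,
# the only place out_sharding is applied.
out = np.einsum('bthjD,bthi->ijD', tmp, I)

# The two-step result matches the single three-operand einsum.
ref = np.einsum('bthD,bthi,bthj->ijD', dy, I, J)
```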
jax authors
10f6edeb49 Merge pull request #26814 from dfm:remat-opt-custom-dce
PiperOrigin-RevId: 732190536
2025-02-28 10:51:54 -08:00
jax authors
6ee261e908 Merge pull request #26835 from skye:libtpu_version
PiperOrigin-RevId: 732165617
2025-02-28 09:35:53 -08:00
jax authors
8f57b8167b Add build targets for jax-rocm-plugin and jax-rocm-pjrt wheels.
PiperOrigin-RevId: 732149495
2025-02-28 08:36:46 -08:00
Dan Foreman-Mackey
bb9aed5eec Reimplement custom_vjp.optimize_remat using custom_dce. 2025-02-28 10:00:28 -05:00
Adam Paszke
bb96226dd8 [Mosaic GPU] Add support for small RHS tile sizes in WGMMA
This is useful for more fine-grained autotuning and can help avoid
wave quantization effects.

PiperOrigin-RevId: 732105219
2025-02-28 05:41:30 -08:00