16367 Commits

Author SHA1 Message Date
Gunhyun Park
d191927b24 Fix syntax error and typos for composite primitive docstring.
PiperOrigin-RevId: 735808000
2025-03-11 10:37:07 -07:00
Jake VanderPlas
4ae3211ea2 jax.disable_jit: ensure while_loop behaves similarly to non-disable_jit version
2025-03-11 09:53:34 -07:00
Adam Paszke
30a9e1b3bf [Mosaic GPU] Add support for .cta_group::2 MMA with n=512 on Blackwell
This one is particularly annoying, because we have to break up the MMA
into two collective N=256 MMAs. However, TensorCore only updates a contiguous
chunk of columns in TMEM and so after executing two of those we end up with
a TMEM layout that looks like this:

```
Contributing CTA |    0    |    1    |    0    |    1    |
N local          |   0:128 |   0:128 | 128:256 | 128:256 |
N                |   0:128 | 256:384 | 128:256 | 384:512 |
```

You can see that the TMEM columns no longer monotonically go over all
columns until N=512, but they include a number of jumps!

We could fix this on the load side, by ensuring that each CTA in the group
does a strided load along the tiled dimension, but that just seems more
trouble than it's worth (and is not that well supported by TMA unless we
increase the number of striding levels).

Instead, we encode this weirdness in the TMEM layout we use and make sure
to rearrange the data properly while loading the tiles into registers.
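
The column order in the table above can be written down as a small index mapping. This is only an illustrative sketch in plain Python, not the Mosaic GPU implementation: the helper names (`tmem_col_to_n`, `chunk_starts`) are hypothetical, and the constants are read directly off the table (128-column chunks, 256 columns of N per CTA, two CTAs).

```python
# Hypothetical helpers; the constants are taken from the table above.
CHUNK = 128       # contiguous TMEM columns per table entry
N_PER_CTA = 256   # columns of N contributed by each CTA
NUM_CTAS = 2      # .cta_group::2


def tmem_col_to_n(col: int) -> int:
    """Map a TMEM column index (0..511) to the logical N index stored there."""
    if not 0 <= col < NUM_CTAS * N_PER_CTA:
        raise ValueError(f"TMEM column {col} is out of range")
    chunk, offset = divmod(col, CHUNK)
    cta = chunk % NUM_CTAS          # contributing CTA alternates per chunk
    mma = chunk // NUM_CTAS         # which of the two N=256 MMAs wrote it
    n_local = mma * CHUNK + offset  # N index local to the contributing CTA
    return cta * N_PER_CTA + n_local


def chunk_starts() -> list[int]:
    """Logical N index at the start of each 128-column TMEM chunk."""
    return [tmem_col_to_n(c) for c in range(0, NUM_CTAS * N_PER_CTA, CHUNK)]
```

Under these assumptions the chunk starts come out as 0, 256, 128, 384, which are the jumps described above; a load that wants N in order has to apply the inverse of this permutation.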

PiperOrigin-RevId: 735791426
2025-03-11 09:53:20 -07:00
jax authors
1aca76fc13 Update :build_jaxlib flag to control whether we should add py_import dependencies to the test targets.
This change enables testing the wheels produced by the build rules in the presubmit using one `bazel test` command only.

There are three options for running the tests:

1) `build_jaxlib=true`: the tests depend on JAX targets.
2) `build_jaxlib=false`: the tests depend on the wheel files located in the `dist` folder.
3) `build_jaxlib=wheel`: the tests depend on the py_import targets.

PiperOrigin-RevId: 735765819
2025-03-11 08:31:43 -07:00
Yash Katariya
76dec38286 Under pjit, the `with mesh:` context will use `use_mesh(mesh): jit` instead of tracking the mesh separately using `resource_env`.
This would also make it easier to deprecate the `with mesh: pjit` path in the future from user code since the new path would be completely tested.
This will also allow us to remove `resource_env` from JAX and the internal API access of `resource_env.physical_mesh` spread throughout codebases internally and externally.

PiperOrigin-RevId: 735602187
2025-03-10 20:21:02 -07:00
jax authors
02505fa757 [Pallas TPU] Remove next_slot SMEM tensor from pipeline emitter
PiperOrigin-RevId: 735564365
2025-03-10 17:19:39 -07:00
Ayaka
988a1208a9 Better error message when raise_if_error() is called within a traced context
PiperOrigin-RevId: 735557928
2025-03-10 16:55:06 -07:00
jax authors
aceae84fab [Pallas] Enable skipping of floating-point operations when interpreting Pallas TPU kernels on CPU.
PiperOrigin-RevId: 735527650
2025-03-10 15:14:00 -07:00
Sharad Vikram
81dde225b0 [Pallas/Fuser] Add select_n push rule
PiperOrigin-RevId: 735510713
2025-03-10 14:23:01 -07:00
jax authors
261e6e5fdc Merge pull request #27038 from jakevdp:vmap-sentinel
PiperOrigin-RevId: 735510065
2025-03-10 14:21:11 -07:00
jax authors
c942b0fef0 Merge pull request #26977 from jakevdp:fix-expn
PiperOrigin-RevId: 735506133
2025-03-10 14:09:32 -07:00
Sharad Vikram
87272fbe93 [Pallas/Fuser] Add debug option to fuser.fuse that prints out jaxpr
PiperOrigin-RevId: 735505460
2025-03-10 14:07:26 -07:00
carlosgmartin
8b6ca56417 Fix the ValueError message for random.binomial (forgot to use string formatting).
2025-03-10 16:38:03 -04:00
jax authors
affe2e734e Rename dot_with_no_batch_dims_saveable to dots_with_no_batch_dims_saveable for internal consistency
PiperOrigin-RevId: 735484326
2025-03-10 13:04:49 -07:00
Praveen Narayanan
b6d4fe5387 Define lax.ragged_dot_general and express lax.ragged_dot in terms of it.
PiperOrigin-RevId: 735471245
2025-03-10 12:25:22 -07:00
jax authors
18f2f19c1a Merge pull request #26525 from wenscarl:e2m1fn
PiperOrigin-RevId: 735457804
2025-03-10 11:46:18 -07:00
Jacob Burnim
73d20cd62a [Pallas] Small fix to TPU interpret mode (input_output_aliases + scalar args).
PiperOrigin-RevId: 735455671
2025-03-10 11:40:10 -07:00
Jake VanderPlas
8ecadfdf9d Internal: make it easier to detect the vmap sentinel
2025-03-10 11:37:50 -07:00
Michael Whittaker
5cb29949d4 Warn the user if transparent huge pages aren't enabled.
PiperOrigin-RevId: 735431881
2025-03-10 10:37:58 -07:00
jax authors
14b215fe76 Merge pull request #27032 from dfm:lax-dtype
PiperOrigin-RevId: 735424674
2025-03-10 10:18:58 -07:00
jax authors
ab0ce8a448 Merge pull request #26811 from dfm:direct-lin
PiperOrigin-RevId: 735388827
2025-03-10 08:39:49 -07:00
Dimitar (Mitko) Asenov
d2bf034c47 [Mosaic GPU] Test the wgmma_op lowering when a is in registers.
I had to add support for wgmma layout in vector_load. Not sure if this is useful outside the test.

PiperOrigin-RevId: 735384104
2025-03-10 08:25:43 -07:00
Dan Foreman-Mackey
21884d4a14 Move (most) jaxlib linalg custom call registration into JAX.
My motivation here is to fix the plugin support for batch partitionable custom calls. Since plugin support for custom call partitioners is provided via register_plugin_callback in xla_bridge, instead of xla_client itself, it's much more straightforward to register the custom calls in JAX.

It would be possible to refactor things differently, but it actually seems like a reasonable choice to use the supported APIs from `jax.ffi` instead of `xla_client` so that we can take advantage of any new features we might add there in the future.

This is all still a little bit brittle and I'd eventually like to migrate to a version where the XLA FFI library provides a mechanism for exporting handlers, but this change is still compatible with any future changes like that.

PiperOrigin-RevId: 735381736
2025-03-10 08:17:44 -07:00
Dan Foreman-Mackey
4eada56027 Avoid using array operations within lax.py operations.
2025-03-10 11:04:32 -04:00
Sergei Lebedev
91340ea0a7 [pallas:mosaic_gpu] Added support for math functions to the WG lowering
PiperOrigin-RevId: 735333893
2025-03-10 05:08:19 -07:00
Benjamin Chetioui
75d8702023 [Pallas/Mosaic GPU] Add lowerings/layout inference for all the necessary conversion ops when using Warpgroup semantics.
Enable some of the pre-existing Pallas `ops_test`s for testing.

PiperOrigin-RevId: 735293084
2025-03-10 02:14:39 -07:00
Dan Foreman-Mackey
36d515ed2c A few more fixes for debug_info tests with direct_linearize.
2025-03-08 07:47:24 -05:00
Jevin Jiang
0f0636afab [Mosaic TPU][Pallas] Add pl.reciprocal
PiperOrigin-RevId: 734749577
2025-03-07 18:29:30 -08:00
jax authors
4988adccf1 Merge pull request #27010 from mattjj:direct-linearize-fixes-3
PiperOrigin-RevId: 734747001
2025-03-07 18:15:02 -08:00
Matthew Johnson
fe26c19b92 [direct-linearize] fix name_stack bugs
Surprisingly, the bug was tracked down to #26111 aka cl/730939406, specifically
the new implementation of reset_name_stack in source_info_util.py.

To repro, use the before-this-commit implementation of reset_name_stack (left
commented-out in the file), and run

```
  JAX_USE_DIRECT_LINEARIZE=1 python tests/name_stack_test.py NameStackTransformationTest.test_nested_jit_stack
```
2025-03-08 01:51:19 +00:00
Matthew Johnson
251b93ebd7 fixups that we meant to include in #26427
Co-authored-by: Dougal Maclaurin <dougalm@google.com>
2025-03-08 00:03:26 +00:00
Jevin Jiang
041f575747 Support MHA in ragged paged attention for packed type
PiperOrigin-RevId: 734695213
2025-03-07 14:47:04 -08:00
jax authors
6095af050f Merge pull request #26427 from mattjj:direct-linearize-fixes
PiperOrigin-RevId: 734687601
2025-03-07 14:22:16 -08:00
jax authors
d849779689 Merge pull request #27001 from mattjj:yash-scan
PiperOrigin-RevId: 734685031
2025-03-07 14:14:30 -08:00
jax authors
1870176eb3 Merge pull request #26979 from mattjj:26936
PiperOrigin-RevId: 734674945
2025-03-07 13:43:55 -08:00
Matthew Johnson
f4f31f89ae [scan] when num_trips==0, don't generate weird size-zero reshapes
2025-03-07 21:35:40 +00:00
Matthew Johnson
7c2f842353 shard_map and other fixes to direct-linearize
Co-authored-by: Dougal Maclaurin <dougalm@google.com>
2025-03-07 21:02:40 +00:00
Matthew Johnson
0e30a3ace9 [mutable-arrays] read values should have the same explicit sharding as ref
fixes #26936
2025-03-07 20:53:29 +00:00
jax authors
ccf7278292 Add the len(arg) to the error message for static_argnums
Helps reduce the confusion on what is considered an argnum.
Ideally there should be static_argkwg
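
A minimal sketch of the kind of check this message describes, in plain Python. `check_static_argnums` is a hypothetical helper, not JAX's actual code; the exact wording of the real error and the handling of negative indices are assumptions here.

```python
def check_static_argnums(static_argnums, args):
    """Hypothetical validator: reject indices that fall outside the positional args.

    Returns the indices normalized to non-negative values.
    """
    n = len(args)
    for i in static_argnums:
        if not -n <= i < n:
            # Reporting len(args) makes it clear that only positional
            # arguments count as argnums (keyword arguments do not).
            raise ValueError(
                f"static_argnums contains index {i}, but only {n} positional "
                f"argument(s) were passed (len(args) == {n})."
            )
    return tuple(i % n for i in static_argnums)
```

For example, `check_static_argnums((2,), (1.0, 2.0))` raises, and the message states that only 2 positional arguments were passed.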

PiperOrigin-RevId: 734591856
2025-03-07 09:49:49 -08:00
Yash Katariya
9f37b5197f [sharding_in_types] Fix a bug where empty_array in scan was created with the wrong spec when unroll > 1.
PiperOrigin-RevId: 734591110
2025-03-07 09:47:32 -08:00
Christos Perivolaropoulos
eeccc67c0b [mgpu] Debug print arrays.
PiperOrigin-RevId: 734576543
2025-03-07 08:58:25 -08:00
Adam Paszke
1bef8b61af [Mosaic GPU] Add a better explanation for the transposed layout
Thanks to @bchetioui for the discussion!

PiperOrigin-RevId: 734564672
2025-03-07 08:19:32 -08:00
Sergei Lebedev
928caf83ee [pallas:mosaic_gpu] copy_smem_to_gmem now allows skipping cp.async.commit_group
This feature is necessary to fix the SMEM->GMEM waiting behavior in
`emit_pipeline`, which used a pessimistic condition prior to this change,
since every copy was its own commit group.

PiperOrigin-RevId: 734553668
2025-03-07 07:43:54 -08:00
Adam Paszke
65462fe684 [Mosaic GPU] Add a new layout to help with transposing WGMMA results
PiperOrigin-RevId: 734553651
2025-03-07 07:42:01 -08:00
Yash Katariya
f8b98993b8 Add a divisibility check to make sure that the sharding evenly divides the shape (until this restriction is lifted), so that we don't create bad shardings.
Also improve the dynamic_update_slice sharding error by printing `aval.str_short()` instead of the full sharding, because it's concise and gives more info than the current error (i.e. it also adds the shape to the error message).

Also make some formatting changes in scan lowering to make it easier to debug.
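
The divisibility check mentioned at the top of this message can be sketched as follows. This is an illustration in plain Python under stated assumptions, not the JAX implementation: `check_divisible` is a hypothetical name, `spec` is modeled as one entry per dimension (`None`, a mesh axis name, or a tuple of axis names), and `mesh_axis_sizes` is a plain dict.

```python
import math


def check_divisible(shape, spec, mesh_axis_sizes):
    """Hypothetical check: every sharded dimension must split evenly.

    shape: tuple of ints.
    spec: per-dimension entry, one of None, an axis name, or a tuple of names.
    mesh_axis_sizes: dict mapping mesh axis name to its size.
    """
    for dim, (size, axes) in enumerate(zip(shape, spec)):
        if axes is None:
            continue  # unsharded dimension, nothing to check
        if isinstance(axes, str):
            axes = (axes,)
        shards = math.prod(mesh_axis_sizes[a] for a in axes)
        if size % shards != 0:
            raise ValueError(
                f"Dimension {dim} of shape {tuple(shape)} has size {size}, "
                f"which is not divisible by {shards} shards (axes {axes})."
            )
```

With a mesh of `{"x": 2, "y": 4}`, shape `(8, 6)` passes for spec `("x", None)` but fails for spec `("x", "y")`, since 6 is not divisible by 4.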

PiperOrigin-RevId: 734542862
2025-03-07 07:01:34 -08:00
Dan Foreman-Mackey
b7ecfdfd95 Update ad.backward_pass to support non-linear functions of constants.
2025-03-07 09:54:06 -05:00
Adam Paszke
85c6b6a128 [Mosaic GPU] Add support for tiling stores to refs using small tiling
The difficulty here is that our register tiling is based on the (64, 8)
shape, while the memory tiling is now (8, swizzle // bytewidth). Before,
we would assume that each register tile fits neatly within a single
memory tile, but now it is obviously not the case. Luckily, it wasn't
too hard to add.
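
To make the mismatch concrete, here is a rough arithmetic sketch in plain Python. The names are hypothetical and this is not Mosaic GPU code; it assumes `swizzle` is given in bytes, `bytewidth` is the element size in bytes, and both tilings are (rows, columns).

```python
REGISTER_TILE = (64, 8)  # register tiling shape from the message above


def memory_tile_shape(swizzle: int, bytewidth: int) -> tuple[int, int]:
    """Small memory tiling: (8, swizzle // bytewidth)."""
    return (8, swizzle // bytewidth)


def memory_tiles_touched(swizzle: int, bytewidth: int, origin=(0, 0)) -> int:
    """Count the memory tiles overlapped by one register tile at `origin`."""
    tile_rows, tile_cols = memory_tile_shape(swizzle, bytewidth)
    r0, c0 = origin
    r1 = r0 + REGISTER_TILE[0] - 1  # last row covered by the register tile
    c1 = c0 + REGISTER_TILE[1] - 1  # last column covered
    row_tiles = r1 // tile_rows - r0 // tile_rows + 1
    col_tiles = c1 // tile_cols - c0 // tile_cols + 1
    return row_tiles * col_tiles
```

For a 128-byte swizzle and 2-byte elements the memory tile is (8, 64), so a single (64, 8) register tile spans 8 memory tiles rather than fitting inside one.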

PiperOrigin-RevId: 734517000
2025-03-07 05:19:11 -08:00
jax authors
de78d2cc71 Merge pull request #26950 from lockwo:Owen/add-pmap-typehint
PiperOrigin-RevId: 734500798
2025-03-07 04:10:35 -08:00
Daniel Suo
e6db7a9d99 Dedup non-ref constants closed in cond branch functions.
PiperOrigin-RevId: 734497907
2025-03-07 04:01:42 -08:00
shuw
ccbe9f7cd6 Fix lint
2025-03-07 04:52:58 +00:00