23681 Commits

Author SHA1 Message Date
Justin Fu
0b46a236c1 Update Pallas distributed tutorials with jax.make_mesh 2024-10-21 12:49:56 -07:00
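For context, a minimal sketch of the jax.make_mesh pattern the updated tutorials use; the axis size and name below are illustrative rather than taken from the tutorials:

```python
import jax
from jax.sharding import NamedSharding, PartitionSpec as P

# jax.make_mesh builds a device Mesh directly from axis sizes and names,
# replacing the older mesh_utils.create_device_mesh + Mesh two-step.
mesh = jax.make_mesh((jax.device_count(),), ("x",))
sharding = NamedSharding(mesh, P("x"))
```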
jax authors
16fca386a3 Update XLA dependency to use revision
76da730179.

PiperOrigin-RevId: 688222632
2024-10-21 12:03:12 -07:00
jax authors
4a5ca2fd00 Merge pull request #24400 from jakevdp:subtract-ufunc
PiperOrigin-RevId: 688190106
2024-10-21 10:38:52 -07:00
jax authors
65307abd81 Merge pull request #24370 from dfm:ffi-call-to-callable
PiperOrigin-RevId: 688188390
2024-10-21 10:34:56 -07:00
Jake VanderPlas
6467d03925 Make jnp.subtract a ufunc 2024-10-21 10:11:51 -07:00
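A short illustration of what the ufunc interface adds on top of the plain elementwise call; the availability of these ufunc methods on jnp.subtract is assumed from the change description rather than verified here:

```python
import jax.numpy as jnp

a = jnp.arange(5.0)
print(jnp.subtract(a, 1.0))       # elementwise call, unchanged behavior
print(jnp.subtract.outer(a, a))   # ufunc method: matrix of pairwise differences
print(jnp.subtract.reduce(a))     # ufunc method: ((((0 - 1) - 2) - 3) - 4)
```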
Ezekiel Calubaquib
ad53addb74 Move MNIST py/JAX TensorFlow Lite tests out to the TensorFlow Lite repo
PiperOrigin-RevId: 688178268
2024-10-21 10:08:21 -07:00
jax authors
e29b93ff3e Merge pull request #24421 from jakevdp:cross-doc
PiperOrigin-RevId: 688175417
2024-10-21 10:01:45 -07:00
Dan Foreman-Mackey
0b651f0f45 Make ffi_call return a callable 2024-10-21 12:16:57 -04:00
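A hedged sketch of the two-step calling convention this change introduces; the target name "rms_norm", its "eps" attribute, and the jax.extend.ffi entry point are illustrative assumptions, and a matching FFI target would need to be registered beforehand:

```python
import numpy as np
import jax
import jax.numpy as jnp
from jax.extend import ffi

x = jnp.ones((8, 64), dtype=jnp.float32)

# ffi_call now returns a callable bound to the target name and result types...
call = ffi.ffi_call("rms_norm", jax.ShapeDtypeStruct(x.shape, x.dtype))

# ...which is then invoked with the runtime operands and attributes.
y = call(x, eps=np.float32(1e-5))
```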
jax authors
fe83d888b9 Merge pull request #24417 from rajasekharporeddy:testbranch1
PiperOrigin-RevId: 688159150
2024-10-21 09:10:45 -07:00
rajasekharporeddy
02f65bb11a Update warning message for jit of pmap 2024-10-21 21:17:59 +05:30
Yash Katariya
783285a71c Fix jax2tf breakage of iota
PiperOrigin-RevId: 688146581
2024-10-21 08:30:49 -07:00
Adam Paszke
f833891c87 [Pallas:MGPU] Add support for passing in WGMMA lhs from registers
PiperOrigin-RevId: 688117316
2024-10-21 06:42:18 -07:00
Adam Paszke
f08801b8d6 [Pallas:MGPU] Allow indexing to appear anywhere in the list of transforms
We only need to exchange the transforms preceding the indexer, while
the rest can remain unmodified.

PiperOrigin-RevId: 688112088
2024-10-21 06:22:16 -07:00
Jake VanderPlas
a1140e9246 Better docs for jnp.cross 2024-10-21 05:59:22 -07:00
Nitin Srinivasan
a2bc8c2e07 Remove temporary aliases from .bazelrc
These aliases were added to not break existing presubmit builds. Now that the presubmit builds have been updated, these aliases can be removed.

Also, corrects some comments.

PiperOrigin-RevId: 688096364
2024-10-21 05:20:13 -07:00
jax authors
f4b84e1c97 Merge pull request #24342 from gnecula:export_custom_types
PiperOrigin-RevId: 688093192
2024-10-21 05:08:04 -07:00
George Necula
2feea414ac [export] Add support for serialization for some custom PyTree nodes
See the added documentation for `jax._src.export.register_pytree_node_serialization`
and `jax._src.export.register_namedtuple_serialization`.

Serialization of PyTree nodes is needed to serialize the `in_tree` and
`out_tree` fields of `Exported` functions (not to serialize actual instances
of the custom types).

When writing this I have looked at how TensorFlow handles namedtuple. It does
so transparently, without requiring the user to register a serialization
handler for the namedtuple type. But this has the disadvantage that on
deserialization a fresh, distinct namedtuple type is created for
each input and output type of the serialized function. This means that
calling the deserialized function will return outputs of different types
than the function that was serialized. This can be confusing.

The Python pickle module does a bit better: it attempts to look up the
namedtuple type as a module attribute in the deserializing code,
importing automatically the module whose name was saved during serialization.
This is too much magic for my taste, as it can result in strange import errors.

Hence I added an explicit step for the user to say how they want
the namedtuple to be serialized and deserialized.

Since I wanted to also add support for `collections.OrderedDict`, which
users are asking for, I added more general support for PyTree custom nodes.
Note that this registration mechanism works in conjunction with the
PyTree custom node registration mechanism. The burden is on the
user to decide how to serialize and deserialize the custom auxdata that
the PyTree custom registration mechanism uses. Not all custom types
will be serializable, but many commonly used ones, e.g., dataclasses,
can now be inputs and outputs of the serialized functions.
2024-10-21 11:38:13 +02:00
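A minimal sketch of how the registration described above might look from user code; it is shown via jax.export for readability (the commit references jax._src.export), and the keyword names serialized_name, serialize_auxdata, and deserialize_auxdata are assumptions rather than verified signatures:

```python
import collections

import jax
import jax.numpy as jnp
from jax import export

Point = collections.namedtuple("Point", ["x", "y"])

# Give the namedtuple an explicit serialized name so that deserializing an
# Exported function reconstructs the same type instead of a fresh one.
export.register_namedtuple_serialization(Point, serialized_name="my_pkg.Point")

# For a general custom PyTree node, the user would also say how to
# (de)serialize the auxdata used by the PyTree registration (hypothetical):
# export.register_pytree_node_serialization(
#     MyNode,
#     serialized_name="my_pkg.MyNode",
#     serialize_auxdata=lambda aux: aux.encode("utf-8"),      # auxdata -> bytes
#     deserialize_auxdata=lambda data: data.decode("utf-8"),  # bytes -> auxdata
# )

def f(p):
  return Point(x=p.x + 1, y=p.y * 2)

exp = export.export(jax.jit(f))(Point(x=jnp.float32(1.0), y=jnp.float32(2.0)))
restored = export.deserialize(exp.serialize())
print(restored.call(Point(x=jnp.float32(1.0), y=jnp.float32(2.0))))
```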
jax authors
0d7ef9c9ca Merge pull request #24403 from jakevdp:load-doc
PiperOrigin-RevId: 688048891
2024-10-21 02:19:32 -07:00
jax authors
33a73852eb Update XLA dependency to use revision
8a7920d699.

PiperOrigin-RevId: 687898456
2024-10-20 13:03:47 -07:00
Yash Katariya
ca2d1584f8 Remove mesh_utils.create_device_mesh from docs
PiperOrigin-RevId: 687695419
2024-10-19 15:48:42 -07:00
jax authors
77fb1eee11 Update XLA dependency to use revision
d0d716fb63.

PiperOrigin-RevId: 687675747
2024-10-19 13:38:07 -07:00
jax authors
48bddc6f6c Adds arith.select to the op patterns in order to canonicalize non-32-bit selects.
PiperOrigin-RevId: 687635492
2024-10-19 09:09:06 -07:00
Jake VanderPlas
0a85ba5f82 Better documentation for jnp.load 2024-10-19 06:20:20 -07:00
Ayaka
884f1dc3a1 [Pallas TPU] Use new MLIR op names
PiperOrigin-RevId: 687454709
2024-10-18 16:14:27 -07:00
jax authors
22426519b7 Update XLA dependency to use revision
7e3b0097bd.

PiperOrigin-RevId: 687427622
2024-10-18 14:35:02 -07:00
Adam Paszke
bbcc3eef3c [Pallas:MGPU] Fix the implementation of WGMMA with transposed RHS
It's not enough that we have the physical transpose between the order
of the tiled dimensions; we also need the user to explicitly transpose the
logical dimensions. This fixes a shape error that was previously hidden
because the RHS was square.

PiperOrigin-RevId: 687350270
2024-10-18 10:31:42 -07:00
Yash Katariya
2153de4ce0 [sharding_in_types] If out_aval.sharding is not None and the user-specified out_sharding is None, concretize it with the device assignment available and add it to the final out_shardings that's used for lowering and compilation.
This will allow us to return the exact sharding spec that sharding propagation rules figured out.

PiperOrigin-RevId: 687349015
2024-10-18 10:27:58 -07:00
Christos Perivolaropoulos
f8a3c0366b [pallas] run_scoped now supports partial discharge.
PiperOrigin-RevId: 687347284
2024-10-18 10:22:31 -07:00
Benjamin Chetioui
ade480ff05 Add a dialect for Mosaic GPU.
PiperOrigin-RevId: 687325692
2024-10-18 09:11:31 -07:00
jax authors
eba5748094 Disable breaking test-case
PiperOrigin-RevId: 687320199
2024-10-18 08:54:36 -07:00
Adam Paszke
e138e8e49d [Pallas:MGPU] Fix docstring for commit_shared
PiperOrigin-RevId: 687308732
2024-10-18 08:16:55 -07:00
Tom Hennigan
86155561fb nit: Use frozen dataclasses rather than unsafe_hash.
PiperOrigin-RevId: 687267707
2024-10-18 05:35:54 -07:00
Adam Paszke
4094564815 [Pallas:MGPU] Force alignment of SMEM allocations to 1024 bytes
This is to avoid issues when small buffers throw off the alignment for large TMA and WGMMA
operands. We should make this more refined in the future, but this should be enough for now.

PiperOrigin-RevId: 687264994
2024-10-18 05:21:53 -07:00
Adam Paszke
0ee9531ef2 [Pallas:MGPU] Add support for indexed refs to WGMMA
PiperOrigin-RevId: 687258992
2024-10-18 04:55:34 -07:00
Adam Paszke
f2edc83af3 [Pallas:MGPU] Properly commute indexing with other transforms
Doing so requires us to modify the other transforms when we attempt to
move indexing before them.

PiperOrigin-RevId: 687240515
2024-10-18 03:39:51 -07:00
Yash Katariya
4db212d2c6 Add _sharding argument to broadcasted_iota as a private parameter which only works under sharding_in_types mode.
This is required because `jax.nn.one_hot` calls into `broadcasted_iota`.

PiperOrigin-RevId: 687152343
2024-10-17 21:16:51 -07:00
jax authors
dd5426301a Allow simple host calls that use host tensors as parameters/results in
linear layout. This CL only handles very simple host call patterns.
A more thorough implementation of propagation of T(1)S(5) will be done
later.

This CL doesn't handle host calls that pass/return tensors that
live on device with a linear layout either; that will also be implemented
separately.

PiperOrigin-RevId: 687113203
2024-10-17 18:22:46 -07:00
Dan Foreman-Mackey
8361eb58e1 Activate the FFI implementation of SVD on GPU.
Alongside activating this new implementation, this change adds a new `algorithm` parameter to `jax.lax.svd`. Previously the choice of algorithm was made based on heuristics in the lowering rule, but it probably also makes sense to expose an option for users to specify the algorithm explicitly because our heuristics are not very carefully optimized.

This change updates the implementation of SVD in `lax` to use the FFI version which was added to jaxlib in https://github.com/jax-ml/jax/pull/23794. This comes with a few benefits:

1. When running on a CUDA platform, the 64-bit API will be used for the algorithm based on QR decomposition. (Note that it looks like the 64-bit API isn't available on ROCm.) This addresses part of the feature request in https://github.com/jax-ml/jax/issues/23413, although there's still work to do to port the rest of the GPU calls to the 64-bit API.

2. This implementation supports shape polymorphism in all dimensions with some caveats. By default, we use some heuristics based on the matrix sizes to select the algorithm that is used, and the three different algorithms (QR, Jacobi, and batched Jacobi) have sufficiently different behavior (QR returns V^H, whereas Jacobi returns V; batched Jacobi doesn't support `full_matrices=False`) that I couldn't work out a simple way to push this logic into the kernel. If the symbolic constraints are not sufficient to concretely determine the heuristics, we always use the QR algorithm. But I've also exposed the algorithm selection in the user API, so it's possible to bypass the heuristics and get consistent behavior alongside shape polymorphism if needed.

Besides these core changes, I removed the forward compatibility checks from the CPU lowering, since we're well outside of the forward compatibility window now.

PiperOrigin-RevId: 687106965
2024-10-17 17:57:06 -07:00
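A hedged sketch of selecting the algorithm explicitly instead of relying on the size-based heuristics; the SvdAlgorithm enum name and its members are assumptions based on the description above, and the wrapper is shown here via jax.lax.linalg:

```python
import jax.numpy as jnp
from jax.lax import linalg as lax_linalg

x = jnp.asarray([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

# Default: heuristics choose between the QR and Jacobi kernels from the shape.
u, s, vh = lax_linalg.svd(x, full_matrices=False)

# Explicit: force the QR-based algorithm, e.g. to get consistent behavior
# under shape polymorphism where the heuristics cannot be resolved.
u_qr, s_qr, vh_qr = lax_linalg.svd(
    x, full_matrices=False, algorithm=lax_linalg.SvdAlgorithm.QR)
```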
Yash Katariya
3e634d9530 [sharding_in_types] Add lax.transpose sharding propagation rule
PiperOrigin-RevId: 687094297
2024-10-17 17:08:04 -07:00
Yash Katariya
57a95a77ff [sharding_in_types] Support jnp.array with sharding_in_types. When the input array has a sharding, propagate it through without dropping the sharding.
PiperOrigin-RevId: 687089357
2024-10-17 16:51:41 -07:00
Yash Katariya
5df4878ad0 [sharding_in_types] Add reduce max, integer_pow and standard_unop sharding rules
PiperOrigin-RevId: 687073144
2024-10-17 15:55:29 -07:00
Yash Katariya
e92e1191b3 [sharding_in_types] Add broadcast_in_dim rule.
PiperOrigin-RevId: 687054181
2024-10-17 14:55:10 -07:00
jax authors
93389ab5f4 Update XLA dependency to use revision
70df652679.

PiperOrigin-RevId: 687045334
2024-10-17 14:29:44 -07:00
jax authors
919f7c8684 Merge pull request #24345 from phu0ngng:cuda_custom_call
PiperOrigin-RevId: 687034466
2024-10-17 13:57:15 -07:00
Adam Paszke
2d78b17226 [Pallas:MGPU] Add support for transforms in user-specified async copies
PiperOrigin-RevId: 687019020
2024-10-17 13:10:45 -07:00
jax authors
6c2649fdf2 Rewrite mosaic concat to support operand shapes that do not align with native shapes. Expand tests to cover multi-operand and batch-dim concat, etc.
PiperOrigin-RevId: 687003778
2024-10-17 12:24:51 -07:00
Ionel Gog
ec279f9c54 Add config option to log or fatal when jax.Arrays are GCed.
Introduces `jax.config.array_garbage_collection_guard`, which is a tristate config for setting up a `jax.Array` garbage collection guard. The possible configs are:
* allow: `jax.Array`s are allowed to be garbage collected. This is the default value.
* log: whenever a `jax.Array` is GCed a log entry is generated with the array's traceback.
* fatal: fatal crash when a `jax.Array` is GCed. This is meant to be used for mature code bases that do tight memory management and are reference-cycle free.

PiperOrigin-RevId: 687003464
2024-10-17 12:23:16 -07:00
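A minimal sketch of turning the guard on; the exact config key string is an assumption derived from the flag name above:

```python
import jax

# Log a traceback whenever a jax.Array is garbage collected (rather than
# being freed through ordinary reference counting).
jax.config.update("jax_array_garbage_collection_guard", "log")
```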
jax authors
1b5cf5a494 Fix breaking test-case
PiperOrigin-RevId: 686932281
2024-10-17 08:57:15 -07:00
Sergei Lebedev
de7beb91a7 [pallas:mosaic_gpu] Added layout_cast
PiperOrigin-RevId: 686917796
2024-10-17 08:08:05 -07:00
Adam Paszke
0519db15ab [Pallas:MGPU] Add lowerings for more ops
PiperOrigin-RevId: 686910947
2024-10-17 07:42:56 -07:00