7487 Commits

Author SHA1 Message Date
jax authors
1c6b0a9193 Merge pull request #24465 from jakevdp:fix-mypy
PiperOrigin-RevId: 688632024
2024-10-22 11:45:27 -07:00
jax authors
9a2dd19a92 Merge pull request #21524 from andportnoy:aportnoy/unknown-platform-lowering-warning
PiperOrigin-RevId: 688630259
2024-10-22 11:40:39 -07:00
jax authors
1e41d5ef6f Merge pull request #24452 from jakevdp:insert-doc
PiperOrigin-RevId: 688624762
2024-10-22 11:26:38 -07:00
Jake VanderPlas
849850216d fix mypy error 2024-10-22 11:10:10 -07:00
Ayaka
c60bafcc33 [Pallas TPU] Fix lowering for jnp.remainder
Fixes https://github.com/jax-ml/jax/issues/24027

PiperOrigin-RevId: 688614799
2024-10-22 11:01:58 -07:00
Andrey Portnoy
2aaa108f06 Raise an error when registering a lowering for an unknown platform 2024-10-22 13:29:48 -04:00
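A minimal sketch of what this guards against, assuming the usual `mlir.register_lowering` entry point; the primitive and rule below are placeholders:

    from jax import core
    from jax.interpreters import mlir

    my_prim = core.Primitive("my_prim")  # hypothetical primitive, for illustration only

    def _my_lowering(ctx, x):
        return [x]

    mlir.register_lowering(my_prim, _my_lowering, platform="cpu")   # known platform: accepted
    mlir.register_lowering(my_prim, _my_lowering, platform="cpuu")  # typo: now raises instead of being silently registered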
Jake VanderPlas
48dd153e18 Better docs for jnp.insert 2024-10-22 09:20:48 -07:00
Jake VanderPlas
7e38cbd604 Better docs for jnp.fromfunction 2024-10-22 08:42:22 -07:00
jax authors
587832f295 Merge pull request #24442 from jakevdp:lexsort-doc
PiperOrigin-RevId: 688563766
2024-10-22 08:40:21 -07:00
jax authors
a2e4aff897 Merge pull request #24425 from dfm:rename-vmap-methods
PiperOrigin-RevId: 688547393
2024-10-22 07:51:29 -07:00
Christos Perivolaropoulos
4f9356361a [pallas] Support for setting explicit backends to pallas_call.
PiperOrigin-RevId: 688511303
2024-10-22 05:37:15 -07:00
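A rough sketch of selecting an explicit backend; the `backend=` keyword comes from the commit title, while the backend string and its availability are assumptions:

    import jax
    import jax.numpy as jnp
    from jax.experimental import pallas as pl

    def copy_kernel(x_ref, o_ref):
        o_ref[...] = x_ref[...]

    x = jnp.arange(256.0).reshape(16, 16)
    y = pl.pallas_call(
        copy_kernel,
        out_shape=jax.ShapeDtypeStruct(x.shape, x.dtype),
        backend="mosaic_gpu",  # assumed backend name; forces a specific lowering path
    )(x)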
Adam Paszke
2db03ba54b [Pallas:MGPU] Add support for grid dims in GPUMesh
Of course no communication can happen across grid dimensions (unlike over the WG dim),
but we need to be able to launch multiple blocks somehow.

PiperOrigin-RevId: 688488660
2024-10-22 04:10:46 -07:00
jax authors
0b3f0e11fb Reverts ebb75db8a523150c48376d15391f84380a2bb110
PiperOrigin-RevId: 688477769
2024-10-22 03:29:32 -07:00
Adam Paszke
84a303f32f [Pallas:MGPU] Allow allocating transformed refs in run_scoped
PiperOrigin-RevId: 688448592
2024-10-22 01:38:46 -07:00
Yash Katariya
ebb75db8a5 [sharding_in_types] Add out_type argument to einsum and dot_general to allow specifying the output type. Right now, it only accepts a NamedSharding, but in the future we can allow a polymorphic type: jax.ShapeDtypeStruct | Sharding | Layout.
PiperOrigin-RevId: 688399552
2024-10-21 22:23:53 -07:00
Jake VanderPlas
8800fe2870 Better documentation for jnp.lexsort 2024-10-21 16:33:14 -07:00
jax authors
441aeebb29 Merge pull request #24420 from superbobry:maint-2
PiperOrigin-RevId: 688271404
2024-10-21 14:22:43 -07:00
Sergei Lebedev
3ad1985e1a Bumped mypy and ruff versions used by pre-commit 2024-10-21 21:58:41 +01:00
Jake VanderPlas
66971a2869 Fix jnp.diff for boolean inputs 2024-10-21 13:35:13 -07:00
Dan Foreman-Mackey
61701af4a2 Rename vmap methods for callbacks. 2024-10-21 15:03:04 -04:00
jax authors
4a5ca2fd00 Merge pull request #24400 from jakevdp:subtract-ufunc
PiperOrigin-RevId: 688190106
2024-10-21 10:38:52 -07:00
jax authors
65307abd81 Merge pull request #24370 from dfm:ffi-call-to-callable
PiperOrigin-RevId: 688188390
2024-10-21 10:34:56 -07:00
Jake VanderPlas
6467d03925 Make jnp.subtract a ufunc 2024-10-21 10:11:51 -07:00
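Now that `jnp.subtract` is a `jnp.ufunc`, the NumPy-style ufunc methods become available on it; a small sketch, assuming the methods mirror NumPy's semantics:

    import jax.numpy as jnp

    a = jnp.array([1, 2, 3])
    b = jnp.array([10, 20])

    jnp.subtract.outer(a, b)   # shape (3, 2): pairwise differences a[i] - b[j]
    jnp.subtract.reduce(a)     # (1 - 2) - 3 = -4, a left-to-right fold as in NumPy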
jax authors
e29b93ff3e Merge pull request #24421 from jakevdp:cross-doc
PiperOrigin-RevId: 688175417
2024-10-21 10:01:45 -07:00
Dan Foreman-Mackey
0b651f0f45 Make ffi_call return a callable 2024-10-21 12:16:57 -04:00
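Under the new interface, `ffi_call` builds and returns a callable rather than invoking the target directly; a hedged sketch via `jax.extend.ffi`, where the target name and the `scale` attribute are placeholders:

    import jax
    import jax.numpy as jnp
    from jax.extend import ffi

    x = jnp.ones((4, 8), dtype=jnp.float32)

    # Build the callable once...
    my_call = ffi.ffi_call(
        "my_registered_target",                  # placeholder FFI target name
        jax.ShapeDtypeStruct(x.shape, x.dtype),  # output shape/dtype
    )
    # ...then invoke it with array arguments and static attributes.
    y = my_call(x, scale=2.0)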
rajasekharporeddy
02f65bb11a Update warning message for jit of pmap 2024-10-21 21:17:59 +05:30
Adam Paszke
f833891c87 [Pallas:MGPU] Add support for passing in WGMMA lhs from registers
PiperOrigin-RevId: 688117316
2024-10-21 06:42:18 -07:00
Adam Paszke
f08801b8d6 [Pallas:MGPU] Allow indexing to appear anywhere in the list of transforms
We only need to exchange the transforms preceding the indexer, while
the rest can remain unmodified.

PiperOrigin-RevId: 688112088
2024-10-21 06:22:16 -07:00
Jake VanderPlas
a1140e9246 Better docs for jnp.cross 2024-10-21 05:59:22 -07:00
jax authors
f4b84e1c97 Merge pull request #24342 from gnecula:export_custom_types
PiperOrigin-RevId: 688093192
2024-10-21 05:08:04 -07:00
George Necula
2feea414ac [export] Add support for serialization for some custom PyTree nodes
See the added documentation for `jax._src.export.register_pytree_node_serialization`
and `jax._src.export.register_namedtuple_serialization`.

Serialization of PyTree nodes is needed to serialize the `in_tree` and
`out_tree` fields of `Exported` functions (not to serialize actual instances
of the custom types).

When writing this I have looked at how TensorFlow handles namedtuple. It does
so transparently, without requiring the user to register a serialization
handler for the namedtuple type. But this has the disadvantage that on
deserialization a fresh, distinct namedtuple type is created for
each input and output type of the serialized function. This means that
calling the deserialized function will return outputs of different types
than the function that was serialized. This can be confusing.

The Python pickle module does a bit better: it attempts to look up the
namedtuple type as a module attribute in the deserializing code,
importing automatically the module whose name was saved during serialization.
This is too much magic for my taste, as it can result in strange import errors.

Hence I added an explicit step for the user to say how they want
the namedtuple to be serialized and deserialized.

Since I wanted to also add support for `collections.OrderedDict`, which
users are asking for, I added more general support for PyTree custom nodes.
Note that this registration mechanism works in conjunction with the
PyTree custom node registration mechanism. The burden is on the
user to decide how to serialize and deserialize the custom auxdata that
the PyTree custom registration mechanism uses. Not all custom types
will be serializable, but many commonly used ones, e.g., dataclasses,
can now be inputs and outputs of the serialized functions.
2024-10-21 11:38:13 +02:00
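A hedged sketch of how registration might look for a namedtuple and a general custom node, based on the names in the message (the public `jax.export` aliases and exact keyword arguments are assumptions):

    import collections
    from jax import export

    Point = collections.namedtuple("Point", ["x", "y"])
    export.register_namedtuple_serialization(Point, serialized_name="mypkg.Point")

    # For a general custom PyTree node, the user supplies auxdata (de)serializers, e.g.:
    # export.register_pytree_node_serialization(
    #     MyNode,
    #     serialized_name="mypkg.MyNode",
    #     serialize_auxdata=lambda aux: json.dumps(aux).encode(),
    #     deserialize_auxdata=lambda buf: json.loads(buf.decode()),
    # )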
jax authors
0d7ef9c9ca Merge pull request #24403 from jakevdp:load-doc
PiperOrigin-RevId: 688048891
2024-10-21 02:19:32 -07:00
Yash Katariya
ca2d1584f8 Remove mesh_utils.create_device_mesh from docs
PiperOrigin-RevId: 687695419
2024-10-19 15:48:42 -07:00
Jake VanderPlas
0a85ba5f82 Better documentation for jnp.load 2024-10-19 06:20:20 -07:00
Ayaka
884f1dc3a1 [Pallas TPU] Use new MLIR op names
PiperOrigin-RevId: 687454709
2024-10-18 16:14:27 -07:00
Adam Paszke
bbcc3eef3c [Pallas:MGPU] Fix the implementation of WGMMA with transposed RHS
It's not enough to have the physical transpose between the order
of tiled dimensions; the user also needs to explicitly transpose the
logical dimensions. This fixes a shape error that was previously hidden
because the RHS was square.

PiperOrigin-RevId: 687350270
2024-10-18 10:31:42 -07:00
Yash Katariya
2153de4ce0 [sharding_in_types] If out_aval.sharding is not None and the user-specified out_sharding is None, concretize it with the available device assignment and add it to the final out_shardings used for lowering and compilation.
This will allow us to return the exact sharding spec that sharding propagation rules figured out.

PiperOrigin-RevId: 687349015
2024-10-18 10:27:58 -07:00
Christos Perivolaropoulos
f8a3c0366b [pallas] run_scoped now supports partial discharge.
PiperOrigin-RevId: 687347284
2024-10-18 10:22:31 -07:00
Adam Paszke
e138e8e49d [Pallas:MGPU] Fix docstring for commit_shared
PiperOrigin-RevId: 687308732
2024-10-18 08:16:55 -07:00
Adam Paszke
4094564815 [Pallas:MGPU] Force alignment of SMEM allocations to 1024 bytes
This is to avoid issues when small buffers throw off the alignment for large TMA and WGMMA
operands. We should make this more refined in the future, but this should be enough for now.

PiperOrigin-RevId: 687264994
2024-10-18 05:21:53 -07:00
Adam Paszke
0ee9531ef2 [Pallas:MGPU] Add support for indexed refs to WGMMA
PiperOrigin-RevId: 687258992
2024-10-18 04:55:34 -07:00
Adam Paszke
f2edc83af3 [Pallas:MGPU] Properly commute indexing with other transforms
Doing so requires us to modify the other transforms when we attempt to
move indexing before them.

PiperOrigin-RevId: 687240515
2024-10-18 03:39:51 -07:00
Yash Katariya
4db212d2c6 Add _sharding argument to broadcasted_iota as a private parameter which only works under sharding_in_types mode.
This is required because `jax.nn.one_hot` calls into `broadcasted_iota`.

PiperOrigin-RevId: 687152343
2024-10-17 21:16:51 -07:00
Dan Foreman-Mackey
8361eb58e1 Activate the FFI implementation of SVD on GPU.
Alongside activating this new implementation, this change adds a new `algorithm` parameter to `jax.lax.linalg.svd`. Previously the choice of algorithm was made based on heuristics in the lowering rule, but it probably also makes sense to expose an option for users to specify the algorithm explicitly, because our heuristics are not very carefully optimized.

This change updates the implementation of SVD in `lax` to use the FFI version which was added to jaxlib in https://github.com/jax-ml/jax/pull/23794. This comes with a few benefits:

1. When running on a CUDA platform, the 64-bit API will be used for the algorithm based on QR decomposition. (Note that it looks like the 64-bit API isn't available on ROCm.) This addresses part of the feature request in https://github.com/jax-ml/jax/issues/23413, although there's still work to do to port the rest of the GPU calls to the 64-bit API.

2. This implementation supports shape polymorphism in all dimensions, with some caveats. By default, we use heuristics based on the matrix sizes to select the algorithm, and the three algorithms (QR, Jacobi, and batched Jacobi) have sufficiently different behavior (QR returns V^H, whereas Jacobi returns V; batched Jacobi doesn't support `full_matrices=False`) that I couldn't work out a simple way to push this logic into the kernel. If the symbolic constraints are not sufficient to concretely determine the heuristics, we always use the QR algorithm. But I've also exposed the algorithm selection in the user API, so it's possible to bypass the heuristics and get consistent behavior alongside shape polymorphism if needed.

Besides these core changes, I removed the forward compatibility checks from the CPU lowering, since we're well outside of the forward compatibility window now.

PiperOrigin-RevId: 687106965
2024-10-17 17:57:06 -07:00
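A sketch of the explicit algorithm selection, assuming it is exposed on `jax.lax.linalg.svd` via an `SvdAlgorithm` enum (names and return order are best-effort assumptions):

    import jax.numpy as jnp
    from jax.lax import linalg as lax_linalg

    a = jnp.ones((32, 16), dtype=jnp.float32)

    # Let the heuristics pick by default, or force a specific algorithm:
    u, s, vh = lax_linalg.svd(a, full_matrices=False,
                              algorithm=lax_linalg.SvdAlgorithm.QR)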
Yash Katariya
3e634d9530 [sharding_in_types] Add lax.transpose sharding propagation rule
PiperOrigin-RevId: 687094297
2024-10-17 17:08:04 -07:00
Yash Katariya
57a95a77ff [sharding_in_types] Support jnp.array with sharding_in_types. When the input array has a sharding, propagate it through without dropping the sharding.
PiperOrigin-RevId: 687089357
2024-10-17 16:51:41 -07:00
Yash Katariya
5df4878ad0 [sharding_in_types] Add reduce max, integer_pow and standard_unop sharding rules
PiperOrigin-RevId: 687073144
2024-10-17 15:55:29 -07:00
Yash Katariya
e92e1191b3 [sharding_in_types] Add broadcast_in_dim rule.
PiperOrigin-RevId: 687054181
2024-10-17 14:55:10 -07:00
Adam Paszke
2d78b17226 [Pallas:MGPU] Add support for transforms in user-specified async copies
PiperOrigin-RevId: 687019020
2024-10-17 13:10:45 -07:00
Ionel Gog
ec279f9c54 Add config option to log or fatal when jax.Arrays are GCed.
Introduces `jax.config.array_garbage_collection_guard`, which is a tristate config for setting up a `jax.Array` garbage collection guard. The possible configs are:
* allow: `jax.Array`s are allowed to be garbage collected. This is the default value.
* log: whenever a `jax.Array` is GCed, a log entry is generated with the array's traceback.
* fatal: fatal crash when a `jax.Array` is GCed. This is meant for mature code bases that do tight memory management and are free of reference cycles.

PiperOrigin-RevId: 687003464
2024-10-17 12:23:16 -07:00
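A hedged sketch of enabling the guard, assuming the option follows JAX's usual `jax_*` config-flag naming:

    import jax

    # Log a traceback whenever a live jax.Array is garbage collected
    # (e.g. because it was kept alive only by a reference cycle).
    jax.config.update("jax_array_garbage_collection_guard", "log")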