23681 Commits

Author SHA1 Message Date
Justin Fu
0b46a236c1 Update Pallas distributed tutorials with jax.make_mesh 2024-10-21 12:49:56 -07:00
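For context, a minimal sketch of the jax.make_mesh pattern the updated tutorials use; the axis size and name below are illustrative rather than taken from the tutorials:

```python
import jax
from jax.sharding import NamedSharding, PartitionSpec as P

# jax.make_mesh builds a device Mesh directly from axis sizes and names,
# replacing the older mesh_utils.create_device_mesh + Mesh two-step.
mesh = jax.make_mesh((jax.device_count(),), ("x",))
sharding = NamedSharding(mesh, P("x"))
```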
jax authors
16fca386a3 Update XLA dependency to use revision
76da730179.

PiperOrigin-RevId: 688222632
2024-10-21 12:03:12 -07:00
jax authors
4a5ca2fd00 Merge pull request #24400 from jakevdp:subtract-ufunc
PiperOrigin-RevId: 688190106
2024-10-21 10:38:52 -07:00
jax authors
65307abd81 Merge pull request #24370 from dfm:ffi-call-to-callable
PiperOrigin-RevId: 688188390
2024-10-21 10:34:56 -07:00
Jake VanderPlas
6467d03925 Make jnp.subtract a ufunc 2024-10-21 10:11:51 -07:00
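A short illustration of what the ufunc interface adds on top of the plain elementwise call; the availability of these ufunc methods on jnp.subtract is assumed from the change description rather than verified here:

```python
import jax.numpy as jnp

a = jnp.arange(5.0)
print(jnp.subtract(a, 1.0))       # elementwise call, unchanged behavior
print(jnp.subtract.outer(a, a))   # ufunc method: matrix of pairwise differences
print(jnp.subtract.reduce(a))     # ufunc method: ((((0 - 1) - 2) - 3) - 4)
```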
Ezekiel Calubaquib
ad53addb74 Move MNIST py/JAX TensorFlow Lite tests out to the TensorFlow Lite repo
PiperOrigin-RevId: 688178268
2024-10-21 10:08:21 -07:00
jax authors
e29b93ff3e Merge pull request #24421 from jakevdp:cross-doc
PiperOrigin-RevId: 688175417
2024-10-21 10:01:45 -07:00
Dan Foreman-Mackey
0b651f0f45 Make ffi_call return a callable 2024-10-21 12:16:57 -04:00
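A hedged sketch of the two-step calling convention this change introduces; the target name "rms_norm", its "eps" attribute, and the jax.extend.ffi entry point are illustrative assumptions, and a matching FFI target would need to be registered beforehand:

```python
import numpy as np
import jax
import jax.numpy as jnp
from jax.extend import ffi

x = jnp.ones((8, 64), dtype=jnp.float32)

# ffi_call now returns a callable bound to the target name and result types...
call = ffi.ffi_call("rms_norm", jax.ShapeDtypeStruct(x.shape, x.dtype))

# ...which is then invoked with the runtime operands and attributes.
y = call(x, eps=np.float32(1e-5))
```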
jax authors
fe83d888b9 Merge pull request #24417 from rajasekharporeddy:testbranch1
PiperOrigin-RevId: 688159150
2024-10-21 09:10:45 -07:00
rajasekharporeddy
02f65bb11a Update warning message for jit of pmap 2024-10-21 21:17:59 +05:30
Yash Katariya
783285a71c Fix jax2tf breakage of iota
PiperOrigin-RevId: 688146581
2024-10-21 08:30:49 -07:00
Adam Paszke
f833891c87 [Pallas:MGPU] Add support for passing in WGMMA lhs from registers
PiperOrigin-RevId: 688117316
2024-10-21 06:42:18 -07:00
Adam Paszke
f08801b8d6 [Pallas:MGPU] Allow indexing to appear anywhere in the list of transforms
We only need to exchange the transforms preceding the indexer, while
the rest can remain unmodified.

PiperOrigin-RevId: 688112088
2024-10-21 06:22:16 -07:00
Jake VanderPlas
a1140e9246 Better docs for jnp.cross 2024-10-21 05:59:22 -07:00
Nitin Srinivasan
a2bc8c2e07 Remove temporary aliases from .bazelrc
These aliases were added to not break existing presubmit builds. Now that the presubmit builds have been updated, these aliases can be removed.

Also, corrects some comments.

PiperOrigin-RevId: 688096364
2024-10-21 05:20:13 -07:00
jax authors
f4b84e1c97 Merge pull request #24342 from gnecula:export_custom_types
PiperOrigin-RevId: 688093192
2024-10-21 05:08:04 -07:00
George Necula
2feea414ac [export] Add support for serialization for some custom PyTree nodes
See the added documentation for `jax._src.export.register_pytree_node_serialization`
and `jax._src.export.register_namedtuple_serialization`.

Serialization of PyTree nodes is needed to serialize the `in_tree` and
`out_tree` fields of `Exported` functions (not to serialize actual instances
of the custom types).

When writing this I have looked at how TensorFlow handles namedtuple. It does
so transparently, without requiring the user to register a serialization
handler for the namedtuple type. But this has the disadvantage that on
deserialization a fresh, distinct namedtuple type is created for
each input and output type of the serialized function. This means that
calling the deserialized function will return outputs of different types
than the function that was serialized. This can be confusing.

The Python pickle module does a bit better: it attempts to look up the
namedtuple type as a module attribute in the deserializing code,
importing automatically the module whose name was saved during serialization.
This is too much magic for my taste, as it can result in strange import errors.

Hence I added an explicit step for the user to say how they want
the namedtuple to be serialized and deserialized.

Since I wanted to also add support for `collections.OrderedDict`, which
users are asking for, I added more general support for PyTree custom nodes.
Note that this registration mechanism works in conjunction with the
PyTree custom node registration mechanism. The burden is on the
user to decide how to serialize and deserialize the custom auxdata that
the PyTree custom registration mechanism uses. Not all custom types
will be serializable, but many commonly used ones, e.g., dataclasses,
can now be inputs and outputs of the serialized functions.
2024-10-21 11:38:13 +02:00
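A minimal sketch of how the registration described above might look from user code; it is shown via jax.export for readability (the commit references jax._src.export), and the keyword names serialized_name, serialize_auxdata, and deserialize_auxdata are assumptions rather than verified signatures:

```python
import collections

import jax
import jax.numpy as jnp
from jax import export

Point = collections.namedtuple("Point", ["x", "y"])

# Give the namedtuple an explicit serialized name so that deserializing an
# Exported function reconstructs the same type instead of a fresh one.
export.register_namedtuple_serialization(Point, serialized_name="my_pkg.Point")

# For a general custom PyTree node, the user would also say how to
# (de)serialize the auxdata used by the PyTree registration (hypothetical):
# export.register_pytree_node_serialization(
#     MyNode,
#     serialized_name="my_pkg.MyNode",
#     serialize_auxdata=lambda aux: aux.encode("utf-8"),      # auxdata -> bytes
#     deserialize_auxdata=lambda data: data.decode("utf-8"),  # bytes -> auxdata
# )

def f(p):
  return Point(x=p.x + 1, y=p.y * 2)

exp = export.export(jax.jit(f))(Point(x=jnp.float32(1.0), y=jnp.float32(2.0)))
restored = export.deserialize(exp.serialize())
print(restored.call(Point(x=jnp.float32(1.0), y=jnp.float32(2.0))))
```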
jax authors
0d7ef9c9ca Merge pull request #24403 from jakevdp:load-doc
PiperOrigin-RevId: 688048891
2024-10-21 02:19:32 -07:00
jax authors
33a73852eb Update XLA dependency to use revision
8a7920d699.

PiperOrigin-RevId: 687898456
2024-10-20 13:03:47 -07:00
Yash Katariya
ca2d1584f8 Remove mesh_utils.create_device_mesh from docs
PiperOrigin-RevId: 687695419
2024-10-19 15:48:42 -07:00
jax authors
77fb1eee11 Update XLA dependency to use revision
d0d716fb63.

PiperOrigin-RevId: 687675747
2024-10-19 13:38:07 -07:00
jax authors
48bddc6f6c Adds arith.select to the op patterns in order to canonicalize non-32-bit selects.
PiperOrigin-RevId: 687635492
2024-10-19 09:09:06 -07:00
Jake VanderPlas
0a85ba5f82 Better documentation for jnp.load 2024-10-19 06:20:20 -07:00
Ayaka
884f1dc3a1 [Pallas TPU] Use new MLIR op names
PiperOrigin-RevId: 687454709
2024-10-18 16:14:27 -07:00
jax authors
22426519b7 Update XLA dependency to use revision
7e3b0097bd.

PiperOrigin-RevId: 687427622
2024-10-18 14:35:02 -07:00
Adam Paszke
bbcc3eef3c [Pallas:MGPU] Fix the implementation of WGMMA with transposed RHS
It's not enough that we have the physical transpose between the order
of the tiled dimensions; we also need the user to explicitly transpose the
logical dimensions. This fixes a shape error that was previously hidden
because the RHS was square.

PiperOrigin-RevId: 687350270
2024-10-18 10:31:42 -07:00
Yash Katariya
2153de4ce0 [sharding_in_types] If out_aval.sharding is not None and the user-specified out_sharding is None, concretize it with the device assignment available and add it to the final out_shardings that's used for lowering and compilation.
This will allow us to return the exact sharding spec that sharding propagation rules figured out.

PiperOrigin-RevId: 687349015
2024-10-18 10:27:58 -07:00
Christos Perivolaropoulos
f8a3c0366b [pallas] run_scoped now supports partial discharge.
PiperOrigin-RevId: 687347284
2024-10-18 10:22:31 -07:00
Benjamin Chetioui
ade480ff05 Add a dialect for Mosaic GPU.
PiperOrigin-RevId: 687325692
2024-10-18 09:11:31 -07:00
jax authors
eba5748094 Disable breaking test-case
PiperOrigin-RevId: 687320199
2024-10-18 08:54:36 -07:00
Adam Paszke
e138e8e49d [Pallas:MGPU] Fix docstring for commit_shared
PiperOrigin-RevId: 687308732
2024-10-18 08:16:55 -07:00
Tom Hennigan
86155561fb nit: Use frozen dataclasses rather than unsafe_hash.
PiperOrigin-RevId: 687267707
2024-10-18 05:35:54 -07:00
Adam Paszke
4094564815 [Pallas:MGPU] Force alignment of SMEM allocations to 1024 bytes
This is to avoid issues when small buffers throw off the alignment for large TMA and WGMMA
operands. We should make this more refined in the future, but this should be enough for now.

PiperOrigin-RevId: 687264994
2024-10-18 05:21:53 -07:00
Adam Paszke
0ee9531ef2 [Pallas:MGPU] Add support for indexed refs to WGMMA
PiperOrigin-RevId: 687258992
2024-10-18 04:55:34 -07:00
Adam Paszke
f2edc83af3 [Pallas:MGPU] Properly commute indexing with other transforms
Doing so requires us to modify the other transforms when we attempt to
move indexing before them.

PiperOrigin-RevId: 687240515
2024-10-18 03:39:51 -07:00
Yash Katariya
4db212d2c6 Add _sharding argument to broadcasted_iota as a private parameter which only works under sharding_in_types mode.
This is required because `jax.nn.one_hot` calls into `broadcasted_iota`.

PiperOrigin-RevId: 687152343
2024-10-17 21:16:51 -07:00
jax authors
dd5426301a Allow simple host calls that use host tensors as parameters/results in
linear layout. This CL only handles very simple host call patterns.
A more thorough implementation of propagation of T(1)S(5) will be done
later.

This CL doesn't handle host calls that pass/return tensors that
live on device with a linear layout either; that will also be implemented
separately.

PiperOrigin-RevId: 687113203
2024-10-17 18:22:46 -07:00
Dan Foreman-Mackey
8361eb58e1 Activate the FFI implementation of SVD on GPU.
Alongside activating this new implementation, this change adds a new `algorithm` parameter to `jax.lax.svd`. Previously the choice of algorithm was made based on heuristics in the lowering rule, but it probably also makes sense to expose an option for users to specify the algorithm explicitly because our heuristics are not very carefully optimized.

This change updates the implementation of SVD in `lax` to use the FFI version which was added to jaxlib in https://github.com/jax-ml/jax/pull/23794. This comes with a few benefits:

1. When running on a CUDA platform, the 64-bit API will be used for the algorithm based on QR decomposition. (Note that it looks like the 64-bit API isn't available on ROCm.) This addresses part of the feature request in https://github.com/jax-ml/jax/issues/23413, although there's still work to do to port the rest of the GPU calls to the 64-bit API.

2. This implementation supports shape polymorphism in all dimensions with some caveats. By default, we use some heuristics based on the matrix sizes to select the algorithm that is used, and the three different algorithms (QR, Jacobi, and batched Jacobi) have sufficiently different behavior (QR returns V^H, whereas Jacobi returns V; batched Jacobi doesn't support `full_matrices=False`) that I couldn't work out a simple way to push this logic into the kernel. If the symbolic constraints are not sufficient to concretely determine the heuristics, we always use the QR algorithm. But I've also exposed the algorithm selection in the user API, so it's possible to bypass the heuristics and get consistent behavior alongside shape polymorphism if needed.

Besides these core changes, I removed the forward compatibility checks from the CPU lowering, since we're well outside of the forward compatibility window now.

PiperOrigin-RevId: 687106965
2024-10-17 17:57:06 -07:00
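A hedged sketch of selecting the algorithm explicitly instead of relying on the size-based heuristics; the SvdAlgorithm enum name and its members are assumptions based on the description above, and the wrapper is shown here via jax.lax.linalg:

```python
import jax.numpy as jnp
from jax.lax import linalg as lax_linalg

x = jnp.asarray([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

# Default: heuristics choose between the QR and Jacobi kernels from the shape.
u, s, vh = lax_linalg.svd(x, full_matrices=False)

# Explicit: force the QR-based algorithm, e.g. to get consistent behavior
# under shape polymorphism where the heuristics cannot be resolved.
u_qr, s_qr, vh_qr = lax_linalg.svd(
    x, full_matrices=False, algorithm=lax_linalg.SvdAlgorithm.QR)
```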
Yash Katariya
3e634d9530 [sharding_in_types] Add lax.transpose sharding propagation rule
PiperOrigin-RevId: 687094297
2024-10-17 17:08:04 -07:00
Yash Katariya
57a95a77ff [sharding_in_types] Support jnp.array with sharding_in_types. When the input array has a sharding, propagate it through without dropping the sharding.
PiperOrigin-RevId: 687089357
2024-10-17 16:51:41 -07:00
Yash Katariya
5df4878ad0 [sharding_in_types] Add reduce max, integer_pow and standard_unop sharding rules
PiperOrigin-RevId: 687073144
2024-10-17 15:55:29 -07:00
Yash Katariya
e92e1191b3 [sharding_in_types] Add broadcast_in_dim rule.
PiperOrigin-RevId: 687054181
2024-10-17 14:55:10 -07:00
jax authors
93389ab5f4 Update XLA dependency to use revision
70df652679.

PiperOrigin-RevId: 687045334
2024-10-17 14:29:44 -07:00
jax authors
919f7c8684 Merge pull request #24345 from phu0ngng:cuda_custom_call
PiperOrigin-RevId: 687034466
2024-10-17 13:57:15 -07:00
Adam Paszke
2d78b17226 [Pallas:MGPU] Add support for transforms in user-specified async copies
PiperOrigin-RevId: 687019020
2024-10-17 13:10:45 -07:00
jax authors
6c2649fdf2 Rewrite mosaic concat to support operand shapes that do not align with native shapes. Expand tests to cover multi-operand and batch-dim concat, etc.
PiperOrigin-RevId: 687003778
2024-10-17 12:24:51 -07:00
Ionel Gog
ec279f9c54 Add config option to log or fatal when jax.Arrays are GCed.
Introduces `jax.config.array_garbage_collection_guard`, which is a tristate config for setting up a `jax.Array` garbage collection guard. The possible configs are:
* allow: `jax.Array`s are allowed to be garbage collected. This is the default value.
* log: whenever a `jax.Array` is GCed a log entry is generated with the array's traceback.
* fatal: fatal crash when a `jax.Array` is GCed. This is meant to be used for mature code bases that do tight memory management and are reference-cycle free.

PiperOrigin-RevId: 687003464
2024-10-17 12:23:16 -07:00
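A minimal sketch of turning the guard on; the exact config key string is an assumption derived from the flag name above:

```python
import jax

# Log a traceback whenever a jax.Array is garbage collected (rather than
# being freed through ordinary reference counting).
jax.config.update("jax_array_garbage_collection_guard", "log")
```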
jax authors
1b5cf5a494 Fix breaking test-case
PiperOrigin-RevId: 686932281
2024-10-17 08:57:15 -07:00
Sergei Lebedev
de7beb91a7 [pallas:mosaic_gpu] Added layout_cast
PiperOrigin-RevId: 686917796
2024-10-17 08:08:05 -07:00
Adam Paszke
0519db15ab [Pallas:MGPU] Add lowerings for more ops
PiperOrigin-RevId: 686910947
2024-10-17 07:42:56 -07:00