rocm_jax

mirror of https://github.com/ROCm/jax.git synced 2025-04-22 23:46:04 +00:00

Author	SHA1	Message	Date
jax authors	48bddc6f6c	Adds arith.select to the op patterns in order to canonicalize non-32-bit selects. PiperOrigin-RevId: 687635492	2024-10-19 09:09:06 -07:00
Benjamin Chetioui	ade480ff05	Add a dialect for Mosaic GPU. PiperOrigin-RevId: 687325692	2024-10-18 09:11:31 -07:00
Dan Foreman-Mackey	8361eb58e1	Activate the FFI implementation of SVD on GPU. Alongside activating this new implementation, this change adds a new `algorithm` parameter to `jax.lax.svd`. Previously the choice of algorithm was made based on heuristics in the lowering rule, but it probably also makes sense to expose an option for users to specify the algorithm explicitly because our heuristics are not very carefully optimized. This change updates the implementation of SVD in `lax` to use the FFI version which was added to jaxlib in https://github.com/jax-ml/jax/pull/23794. This comes with a few benefits: 1. When running on a CUDA platform, the 64-bit API will be used for the algorithm based on QR decomposition. (Note that it looks like the 64-bit API isn't available on ROCm.) This addresses part of the feature request in https://github.com/jax-ml/jax/issues/23413, although there's still work to do to port the rest of the GPU calls to the 64-bit API. 2. This implementation supports shape polymorphism in all dimensions with some caveats. By default, we do use some heuristics based on the matrix sizes to select the algorithm that is used, and the three different algorithms (QR, Jacobi, and batched Jacobi) have sufficiently different behavior (QR returns V^H, whereas Jacobi returns V; batched Jacobi doesn't support `full_matrices=False`) that I couldn't work out a simple way to push this logic into the kernel. If the symbolic constraints are not sufficient to concretely determine the heuristics, we always use the QR algorithm. But, I've also exposed the algorithm selection in the user API, so it's possible to bypass the heuristics and get consistent behavior alongside shape polymorphism if needed. Besides these core changes, I removed the forward compatibility checks from the CPU lowering, since we're well outside of the forward compatibility window now. PiperOrigin-RevId: 687106965	2024-10-17 17:57:06 -07:00
jax authors	6c2649fdf2	Rewrite mosaic concat to support operand shapes that do not align with native shapes, Expand tests to cover multi operand, batch dim concat, etc. PiperOrigin-RevId: 687003778	2024-10-17 12:24:51 -07:00
Ionel Gog	ec279f9c54	Add config option to log or fatal when jax.Arrays are GCed. Introduces `jax.config.array_garbage_collection_guard`, which is a tristate config for setting up a `jax.Array` garbage collection guard. The possible configs are: * allow: `jax.Array`s are allowed to be garbage collected. This is the default value. * log: whenever a `jax.Array` is GCed a log entry is generated with the array's traceback. * fatal: fatal crash when a `jax.Array` is GCed. This is meant to be used for mature code bases that do tight memory management, and are reference cycle free. PiperOrigin-RevId: 687003464	2024-10-17 12:23:16 -07:00
jax authors	9027fb38fe	Fix segfault PiperOrigin-RevId: 686821923	2024-10-17 01:52:44 -07:00
Jevin Jiang	a47b755619	[Mosaic TPU] Support native int4 @ int4 PiperOrigin-RevId: 686179715	2024-10-15 11:35:23 -07:00
Yash Katariya	824ccd7183	[Shardy] Inline meshes when using shardy and get rid of global meshes from the MLIR body. Also do a couple of cleanups. PiperOrigin-RevId: 685746298	2024-10-14 10:08:04 -07:00
Bart Chrzaszcz	75e22f2ccd	#sdy Run inlined mesh lifter pass at the end of JAX lowering. PiperOrigin-RevId: 685728692	2024-10-14 09:13:12 -07:00
jax authors	57ef7a4a59	Merge pull request #24274 from ROCm:ci_linalg_fix PiperOrigin-RevId: 685717437	2024-10-14 08:33:33 -07:00
Paweł Paruzel	23fdb91252	Port Schur Decomposition to XLA's FFI This CL only contains the C++ changes. Python lowering code will be added after the forward compatibility window of 3 weeks. PiperOrigin-RevId: 685689593	2024-10-14 06:46:42 -07:00
Paweł Paruzel	ec68d420fe	Port Tridiagonal Reduction to XLA's FFI This CL only contains the C++ changes. Python lowering code will be added after the forward compatibility window of 3 weeks. PiperOrigin-RevId: 685679646	2024-10-14 06:02:59 -07:00
Ruturaj4	ee223d4004	[ROCm] jaxlib linalg fix	2024-10-13 20:25:18 -05:00
jax authors	e4629f6a4c	Merge pull request #24232 from ROCm:ci_rv_clang_clean PiperOrigin-RevId: 684891301	2024-10-11 11:00:55 -07:00
Ruturaj4	89cd375c85	[JAX] bazel build rocm changes	2024-10-10 18:00:15 -05:00
Ruturaj4	33bcd0cb7a	[ROCm] Bring up clang support for JAX+XLA * Add clang path * bazelrc env fixes * Fix wheelhouse installation and preserve wheels * dockerfile changes * Add target.lst * Change target architectures * Install bzip2 and sqlite packages	2024-10-10 16:31:26 -05:00
Dan Foreman-Mackey	6625a2b3ed	Update Eigh kernel on GPU to use 64-bit interface when it is available. Part of https://github.com/jax-ml/jax/issues/23413 PiperOrigin-RevId: 684546802	2024-10-10 12:59:37 -07:00
Peter Hawkins	cf5f15773a	Remove dead ducc_fft code. I guess this was omitted when we switched over to using stablehlo.fft since XLA now calls DUCC itself. PiperOrigin-RevId: 684437739	2024-10-10 07:33:54 -07:00
jax authors	81a95f78b9	[Mosaic] Parameterize the number of lanes and sublanes in TPU dialects. PiperOrigin-RevId: 684392184	2024-10-10 04:28:36 -07:00
Jevin Jiang	f52b016de1	[Mosaic TPU] Change getLayout to force offset to 0 when inferring input has offset out of the first tile. PiperOrigin-RevId: 684145987	2024-10-09 13:11:49 -07:00
Jevin Jiang	f96c5661ac	[Mosaic TPU][NFC] Refactor tpu matmul rule. * Separate MXU size to MXU contracting size and MXU non-contracting size. * Rename tile to group for MXU shaped tiling since tile is overused in Mosaic. PiperOrigin-RevId: 684116306	2024-10-09 11:45:25 -07:00
jax authors	9748e2ab1a	[JAX] Fix error message for matmul operand shape check. PiperOrigin-RevId: 683778484	2024-10-08 15:07:20 -07:00
Eric Salo	713e909ba0	cleanup: remove api_version from BUILD files PiperOrigin-RevId: 683658237	2024-10-08 09:44:15 -07:00
Peter Hawkins	145304a0e0	Remove reference to outfeed_receiver.pyi, which was deleted. PiperOrigin-RevId: 683195999	2024-10-07 08:37:14 -07:00
Dan Foreman-Mackey	67f24df740	Activate FFI implementation of symmetric Eigendecomposition. These kernels support shape polymorphism in all dimensions and no GPU is required during lowering. The kernels have been included in jaxlib for more than 3 weeks so we don't need to include any forward compatibility checks. PiperOrigin-RevId: 682415506	2024-10-04 12:38:26 -07:00
Dan Foreman-Mackey	c0240764bc	Activate FFI implementation of the QR decomposition. As part of this change, I've added support and tests for shape polymorphism and export on CPU and GPU. The FFI kernels have been available in jaxlib for over 3 weeks already and they are included with the latest release of jaxlib on PyPI so we don't need to worry about the forward compatibility checks. With this in mind, I also removed the old lowering rules, but kept the backwards compatibility tests for now. PiperOrigin-RevId: 682312752	2024-10-04 07:27:11 -07:00
Paweł Paruzel	6e9a53690c	Activate Hessenberg Decomposition to XLA's FFI Additionally, created a missing backward compatibility test for the old LAPACK kernels of Hessenberg Decomposition. PiperOrigin-RevId: 681047625	2024-10-01 09:20:06 -07:00
Adam Paszke	f62941d126	[Mosaic TPU] The previous change does not actually force the input offsets read by the rules, but simply disables all the checks. Reverting so that we at least regain the checks until we have a proper fix. Reverts 4a596aee1e8920f5b51d5bd573df976390bbd437 PiperOrigin-RevId: 680925509	2024-10-01 02:23:52 -07:00
Jevin Jiang	4a596aee1e	[Mosaic TPU] Force offset to 0 when inferring input has offset out of the first tile. We still have this temporary check in apply vector layout, but in infer vector layout, instead of throwing an error, we should just reset the offset to zero, because some ops that have relaxed this restriction might be passed as input to un-relaxed ops and cause failures. PiperOrigin-RevId: 680706301	2024-09-30 13:52:48 -07:00
Jevin Jiang	7e2f487ada	[Mosaic TPU] Canonicalize arith.select's condition to vector if other types are vector. This fixes the failure in the elementwise rule of the apply vector layout pass. If the condition scalar is static, it will be simplified to the corresponding vector from the true value and false value by MLIR. If the condition scalar is dynamic, we want to use vselect over scf.if anyway, because the latter creates an inner region. PiperOrigin-RevId: 680674560	2024-09-30 12:26:44 -07:00
Dan Foreman-Mackey	1a1e16abcc	Remove forward compatibility checks from lowering of LU decomposition. The forward compatibility window for these checks has passed so it is now safe to remove them. PiperOrigin-RevId: 680565099	2024-09-30 07:23:56 -07:00
Peter Hawkins	5a1d0a6c26	Include the sdy MLIR dialect in jaxlib. We're seeing test failures from tests assuming that this dialect exists. But given we plan to enable it at some point, we may as well just include it in the build. The size impact is small (around 400K uncompressed). PiperOrigin-RevId: 679608092	2024-09-27 08:53:31 -07:00
Peter Hawkins	26632fd344	Replace disable_backends with enable_backends on jax_multiplatform_test. Most users of disable_backends were actually using it to enable only a single backend. So things are simpler if we negate the sense of the option to say that. Change disable_configs to enable_configs, with a default `None` value meaning "everything is enabled". We change the relationship between enable_backends, disable_configs, enable_configs to be the following: * `enable_backends` selects a set of initial test configurations to enable, based off backend only. * `disable_configs` then prunes that set of test configurations, removing elements from the set. * `enable_configs` then adds additional configurations to the set. Fix code in jax/experimental/mosaic/gpu/examples not to depend on a Google-internal GPU support target. PiperOrigin-RevId: 679563155	2024-09-27 06:15:31 -07:00
Justin Fu	9f4e8d0039	[XLA:Mosaic][Pallas] Enable vector.ExtractOp for non-zero indices. PiperOrigin-RevId: 679283281	2024-09-26 13:57:45 -07:00
Jevin Jiang	e4ca4f5a57	Roll back cl/678765762 [Mosaic TPU] Support bitcast without forcing retiling. Reverts 37641dd4fade625563321b7e1e87165df23cf4a8 PiperOrigin-RevId: 678881199	2024-09-25 16:02:58 -07:00
Jevin Jiang	37641dd4fa	[Mosaic TPU] Support bitcast without forcing retiling. PiperOrigin-RevId: 678765762	2024-09-25 10:57:09 -07:00
Peter Hawkins	70f91db853	Set PYTHONWARNINGS=error in bazel tests. The goal of this change is to catch PRs that introduce new warnings sooner. To help pass the environment variable more easily, rename the jax_test Bazel test macro to jax_multiplatform_test, and introduce a new jax_py_test macro that wraps py_test. Add code to both to set the environment variable. Add code to suppress some new warnings uncovered in CI. PiperOrigin-RevId: 678352286	2024-09-24 12:30:11 -07:00
Jevin Jiang	407dc774f7	[Mosaic TPU] Support all cases for extui. PiperOrigin-RevId: 678331795	2024-09-24 11:35:03 -07:00
jax authors	2c85465ebe	Merge pull request #23806 from gspschmid:gschmid/ffi-ext-bundle PiperOrigin-RevId: 678273475	2024-09-24 09:05:20 -07:00
Ruturaj4	29a1cb766e	[ROCM] add missing typename keyword to work with gcc	2024-09-23 14:42:01 -05:00
Jevin Jiang	6b93b35842	[Mosaic:TPU] Efficient relayout with internal scratch We should support all different retilings (xpacking1, 128) <-> (ypacking2, 128) with any dtype in this cl at this moment. The efficient relayout with scratch brings significant improvements on current retiling in <= TPUv4 and retiling with (packing, 128) in TPUv5. All missing retiling supports are added in this cl, including increase sublane retiling and packed type retiling. PiperOrigin-RevId: 676982957	2024-09-20 15:00:58 -07:00
Adam Paszke	99195ead83	[Mosaic TPU] Try reducing sublane tiling to support more vector.shape_casts In particular, 32-bit values should now support all reshapes that do not modify the last dimension. PiperOrigin-RevId: 676855401	2024-09-20 08:36:22 -07:00
Dan Foreman-Mackey	bc80ecbbe4	Remove forward compatibility checks from cholesky_update lowering. The forward compatibility window has ended and it should be safe to remove these checks. PiperOrigin-RevId: 676853740	2024-09-20 08:32:25 -07:00
Michael Hudgins	d4d1518c3d	Update references to the GitHub url in JAX codebase to reflect move from google/jax to jax-ml/jax PiperOrigin-RevId: 676843138	2024-09-20 07:52:33 -07:00
Dan Foreman-Mackey	afaa3bf43c	Port GPU kernels for SVD to the FFI. Unlike the other GPU linear algebra kernels that I've ported so far, this one isn't straightforward to implement as a single kernel, and while it does support lowering without access to a GPU (no more descriptor!), it only supports dynamic shapes in the batch dimensions. There are two main technical challenges: 1. The main `gesvd` kernels in cuSolver/hipSolver only support matrices with shape `(m, n)` with `m >= n`. This means that we need to transpose the inputs and outputs as part of the lowering rule when `m < n`. (Note: we actually just use C layouts instead of Fortran layouts to implement this case.) While this could be handled in the kernel, this seemed like a lot of work for somewhat limited benefit, and it would probably have performance implications. 2. The `gesvd` and `gesvdj` kernels return `V^H` and `V` respectively, and the batched version of `gesvdj` doesn't support `full_matrices=False`. This means that we need logic in the lowering rule to handle transposition and slicing. This makes it hard to have the algorithm selection be a parameter to the kernel. Another note: cuSolver has a 64-bit implementation of the SVD, and we always use that implementation on the CUDA backend. The 32-bit interface is included for ROCM support, and I have tested it manually. This was a feature request from https://github.com/jax-ml/jax/issues/23413. PiperOrigin-RevId: 676839182	2024-09-20 07:34:50 -07:00
Jevin Jiang	47b177bd03	[Mosaic TPU][NFC] Remove FailureOr in getNativeVregOrVmaskTypeImpl PiperOrigin-RevId: 676566796	2024-09-19 14:35:41 -07:00
Georg Stefan Schmid	d0338f5d13	[ffi] Support handler bundles in GPU plugin extension	2024-09-19 14:51:02 +00:00
Peter Hawkins	922e652c05	Replace plat-name with plat_name. The former seems to elicit a deprecation warning from setuptools recently.	2024-09-18 15:17:49 +00:00
jax authors	4e6f690724	Merge pull request #23653 from apaszke:torchsaic PiperOrigin-RevId: 675967844	2024-09-18 06:35:15 -07:00
Adam Paszke	611ad63060	Add basic PyTorch integration for Mosaic GPU We have already had most of the relevant pieces and we only needed to connect them together. The most sensitive change is perhaps that I needed to expose one more symbol from the XLA GPU plugin, but I don't think it should be a problem.	2024-09-18 12:55:23 +00:00


1120 Commits