This simplifies the lowering logic and means that we don't incur a performance penalty when exporting with shape polymorphism.
PiperOrigin-RevId: 662945116
In anticipation of refactoring the jaxlib GPU custom calls into FFI calls, this change moves the implementations of `BlasHandlePool`, `SolverHandlePool`, and `SpSolverHandlePool` into a new target.
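For context, the handle-pool pattern behind these classes looks roughly like the following minimal sketch, assuming a cuBLAS-style API; the actual jaxlib implementation differs in the details:

```c++
// Minimal sketch of a handle pool, assuming a cuBLAS-style API; the
// actual jaxlib classes differ in the details.
#include <mutex>
#include <vector>

#include <cublas_v2.h>

class BlasHandlePool {
 public:
  // Borrow a handle bound to `stream`, creating one if the pool is empty.
  cublasHandle_t Borrow(cudaStream_t stream) {
    std::lock_guard<std::mutex> lock(mu_);
    cublasHandle_t handle;
    if (handles_.empty()) {
      cublasCreate(&handle);  // error handling elided for brevity
    } else {
      handle = handles_.back();
      handles_.pop_back();
    }
    cublasSetStream(handle, stream);
    return handle;
  }

  // Return a handle to the pool so later calls can reuse it instead of
  // paying the cost of cublasCreate again.
  void Return(cublasHandle_t handle) {
    std::lock_guard<std::mutex> lock(mu_);
    handles_.push_back(handle);
  }

 private:
  std::mutex mu_;
  std::vector<cublasHandle_t> handles_;
};
```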
PiperOrigin-RevId: 658497960
This change does a few things (arguably too many):
1. The key change here is that it fixes the handler registration in `jaxlib/gpu/gpu_kernels.cc` for the two handlers that use the XLA FFI API. A previous attempt at this change caused downstream issues because of duplicate registrations, but we were able to fix that directly in XLA.
2. A second, related change is to declare and define the XLA FFI handlers consistently using the `XLA_FFI_DECLARE_HANDLER_SYMBOL` and `XLA_FFI_DEFINE_HANDLER_SYMBOL` macros. We need these macros instead of the `XLA_FFI_DEFINE_HANDLER` version, which produces a lambda, so that the handler address that XLA checks during registration is consistent (see the sketch after this list). Without this change, the downstream tests would continue to fail.
3. The final change is to consolidate the `cholesky_update_kernel` and `lu_pivot_kernels` implementations into a common `linalg_kernels` target. This makes the implementation of the `_linalg` nanobind module consistent with the other targets within `jaxlib/gpu` and (I think!) makes the details easier to follow. This last change is less urgent, but it's what I originally set out to do, so I'm proposing them all together; I can split this in two if that would be preferred.
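For reference, the symbol-based pattern from change 2 looks roughly like this (a sketch; the exact binding and registration spellings are assumed from the XLA FFI headers of that era, and `LuPivotsToPermutation` stands in for the real handlers):

```c++
// In the header: declare a handler symbol with a stable address.
XLA_FFI_DECLARE_HANDLER_SYMBOL(LuPivotsToPermutation);

// In the .cc file: define the symbol from an implementation function.
XLA_FFI_DEFINE_HANDLER_SYMBOL(
    LuPivotsToPermutation, LuPivotsToPermutationImpl,
    ffi::Ffi::Bind()
        .Ctx<ffi::PlatformStream<gpuStream_t>>()
        .Arg<ffi::Buffer<ffi::DataType::S32>>()    // pivots
        .Ret<ffi::Buffer<ffi::DataType::S32>>());  // permutation

// In jaxlib/gpu/gpu_kernels.cc: registering the same symbol gives XLA a
// consistent handler address, unlike the fresh lambda that
// XLA_FFI_DEFINE_HANDLER produces at each expansion.
XLA_FFI_REGISTER_HANDLER(XLA_FFI_GetApi(), "cu_lu_pivots_to_permutation",
                         "CUDA", LuPivotsToPermutation);
```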
PiperOrigin-RevId: 651107659
The XLA FFI interface provides metadata about buffer dimensions, so quantities
like batch dimensions can be evaluated on the backend instead of being passed
as attributes. This change has the added benefit of allowing this FFI call to
support "vectorized" vmap and dynamic shapes.
PiperOrigin-RevId: 647343656
The typed FFI
* allows passing custom call attributes directly to `backend_config=`
  instead of serializing them into a C++ struct, and
* handles validation and deserialization of custom call operands.
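As a hedged illustration of the attribute mechanism (the names `MyKernel`, `MyKernelImpl`, and `alpha` are hypothetical, and the binding API spelling is assumed from the XLA FFI headers):

```c++
// Hypothetical handler: "alpha" arrives via backend_config and is
// validated and deserialized by XLA before the handler runs, so the
// implementation receives it as a plain typed argument.
static ffi::Error MyKernelImpl(
    gpuStream_t stream, ffi::Buffer<ffi::DataType::F32> x,
    ffi::Result<ffi::Buffer<ffi::DataType::F32>> y, float alpha) {
  // ... launch a kernel that scales x by alpha into y ...
  return ffi::Error::Success();
}

// The binding declares the operands and the attribute that XLA should
// expect on the custom call.
XLA_FFI_DEFINE_HANDLER_SYMBOL(
    MyKernel, MyKernelImpl,
    ffi::Ffi::Bind()
        .Ctx<ffi::PlatformStream<gpuStream_t>>()
        .Arg<ffi::Buffer<ffi::DataType::F32>>()   // x
        .Ret<ffi::Buffer<ffi::DataType::F32>>()   // y
        .Attr<float>("alpha"));
```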
PiperOrigin-RevId: 630067005
nanobind has a number of advantages (https://nanobind.readthedocs.io/en/latest/why.html), notably speed of compilation and dispatch, but the main reason to do this for these bindings is that nanobind can target the Python Stable ABI starting with Python 3.12. This means that we will not need to ship per-Python-version CUDA plugins starting with Python 3.12.
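For a sense of what these bindings look like, here is a minimal, hypothetical nanobind module (not the actual jaxlib bindings); built with nanobind's stable-ABI option, one binary of such a module can serve CPython 3.12+:

```c++
#include <nanobind/nanobind.h>

namespace nb = nanobind;

NB_MODULE(_example, m) {
  // Expose a trivial function to Python.
  m.def("add", [](int a, int b) { return a + b; }, "Add two integers.");
}
```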
PiperOrigin-RevId: 559898790
The code for both CUDA and ROCm is almost identical, so with a small shim library to handle the differences we can share almost everything.
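The shim idea, roughly (illustrative macro names, not the actual jaxlib shim):

```c++
// A small header maps a common gpu* vocabulary onto CUDA or ROCm, and
// the shared kernels are written against gpu*.
#ifdef JAX_GPU_CUDA
#include <cuda_runtime_api.h>
typedef cudaStream_t gpuStream_t;
typedef cudaError_t gpuError_t;
#define gpuSuccess cudaSuccess
#define gpuStreamSynchronize cudaStreamSynchronize
#else  // ROCm
#include <hip/hip_runtime_api.h>
typedef hipStream_t gpuStream_t;
typedef hipError_t gpuError_t;
#define gpuSuccess hipSuccess
#define gpuStreamSynchronize hipStreamSynchronize
#endif
```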
PiperOrigin-RevId: 483666051
XLA has supported batched triangular solve on GPU since February 2022, which predates the minimum supported jaxlib version. We can therefore delete our implementation and just use XLA's.
PiperOrigin-RevId: 482031830
Previously we had no way to tell XLA that inputs and outputs of GPU custom calls must alias. This now works in XLA:GPU so we can just ask XLA to enforce the aliasing we need.
This seems to be causing some test failures downstream, so I'm reverting it for the moment until I can debug them.
PiperOrigin-RevId: 479670565
Previously we had no way to tell XLA that inputs and outputs of GPU custom calls must alias. This now works in XLA:GPU so we can just ask XLA to enforce the aliasing we need.
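As a hedged sketch of the mechanism (the target name, operands, and the exact `xla::CustomCall` signature are assumptions based on the XLA client API of that era):

```c++
// Ask XLA to alias operand 0 with the custom call's output, so the GPU
// custom call can run in place instead of copying.
xla::XlaOp out = xla::CustomCall(
    builder, /*call_target_name=*/"cu_cholesky_update",
    /*operands=*/{matrix, vector}, /*shape=*/matrix_shape,
    /*opaque=*/"", /*has_side_effect=*/false,
    /*output_operand_aliasing=*/{
        {xla::ShapeIndex{}, {0, xla::ShapeIndex{}}}});
```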
PiperOrigin-RevId: 479642543