rocm_jax

mirror of https://github.com/ROCm/jax.git synced 2025-04-16 03:46:06 +00:00

Author	SHA1	Message	Date
Dan Foreman-Mackey	b6306e3953	Remove synchronization from GPU LU decomposition kernel by adding an async batch pointers builder. In the batched LU decomposition in cuBLAS, the output buffer is required to be a pointer of pointers to the appropriate batch matrices. Previously this reshaping was done on the host and then copied to the device, requiring a synchronization, but it seems straightforward to instead implement a tiny CUDA kernel to do this work. This definitely isn't a bottleneck or a high priority change, but this seemed like a reasonable time to fix a longstanding TODO. PiperOrigin-RevId: 663686539	2024-08-16 04:37:09 -07:00
Peter Hawkins	07fa9dc3db	Fix cupti-related build failure under CUDA 11. cuptiGetErrorMessage was added in CUDA 12.2. PiperOrigin-RevId: 568962562	2023-09-27 14:33:30 -07:00
Peter Hawkins	9404518201	[CUDA] Add code to jax initialization that verifies that the CUDA libraries that are found are at least as new as the versions against which JAX was built. This is intended to flag cases where the wrong CUDA libraries are used, either because: * the user self-installed CUDA and that installation is too old, or * the user used the pip package installation, but due to LD_LIBRARY_PATH overrides or similar we didn't end up using the pip-installed version. PiperOrigin-RevId: 568910422	2023-09-27 11:28:40 -07:00
Chris Jones	31b862dd56	[jax_triton] Split C++ only parts of Triton custom callback from Python parts. Register callback with default call target name from C++, enabling Triton calls with the default name to work in C++ only contexts (e.g. serving). PiperOrigin-RevId: 545211452	2023-07-03 06:52:32 -07:00
Chris Jones	6b13d4eb86	Add branch prediction to JAX status macros. PiperOrigin-RevId: 535233546	2023-05-25 06:23:23 -07:00
jax authors	d1fbdbc1cf	Rollback of "Add CUDNN custom call for LSTM. Exposed as jax.experimental.rnn module." PiperOrigin-RevId: 490499003	2022-11-23 07:48:05 -08:00
Qiao Zhang	78963b6020	Add CUDNN custom call for LSTM. Exposed as jax.experimental.rnn module. PiperOrigin-RevId: 490387796	2022-11-22 18:53:29 -08:00
Peter Hawkins	a852710a09	Merge CUDA and ROCM kernel code in jaxlib. The code for both CUDA and ROCM is almost identical, so with a small shim library to handle the differences we can share almost everything. PiperOrigin-RevId: 483666051	2022-10-25 07:23:34 -07:00

8 Commits