These were temporary forwarding targets that are no longer needed; use //jaxlib/cpu:cpu_kernels and //jaxlib/cuda:cuda_gpu_kernels instead.
PiperOrigin-RevId: 738085234
This is an exact port of the current Python implementation to C++ for speed.
I have been careful not to alter the topological order we return in any way here, although we may do so in a future change.
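For reference, the Python implementation being ported is essentially a depth-first postorder traversal. A simplified sketch (assuming each node exposes its dependencies via a hypothetical `parents` attribute) looks like:
```py
def toposort(end_nodes):
  # Simplified sketch: emit each node after all of its parents
  # (dependencies), without recursion.
  order = []
  seen = set()
  stack = [(node, False) for node in end_nodes]
  while stack:
    node, ready = stack.pop()
    if ready:
      order.append(node)          # all parents already emitted
    elif id(node) not in seen:
      seen.add(id(node))
      stack.append((node, True))  # revisit after parents are done
      stack.extend((p, False) for p in node.parents)
  return order
```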
PiperOrigin-RevId: 737014989
For the CUDA and ROCM plugins, we only support exact matches between the plugin and jaxlib versions, and bad things can happen if we try to load mismatched versions. This change issues a warning and skips importing a plugin when there is a version mismatch.
There are a handful of other places where plugins are imported throughout the JAX codebase (e.g. in lax_numpy, mosaic_gpu, and in the plugins themselves). In a follow-up it would be good to add version checking there too, but let's start with just these ones.
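A rough sketch of the check (the helper name and the plugin's `__version__` attribute here are illustrative, not the actual jaxlib API):
```py
import importlib
import logging

from jaxlib import version as jaxlib_version

def maybe_import_plugin(module_name):
  # Hypothetical helper: import a GPU plugin only if its version exactly
  # matches the installed jaxlib version.
  try:
    plugin = importlib.import_module(module_name)
  except ImportError:
    return None
  plugin_version = getattr(plugin, '__version__', None)
  if plugin_version != jaxlib_version.__version__:
    logging.warning(
        'Skipping %s: its version %s does not match jaxlib version %s.',
        module_name, plugin_version, jaxlib_version.__version__)
    return None
  return plugin
```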
PiperOrigin-RevId: 731808733
This change improves the stability and backward compatibility of Pallas Triton
calls, because unlike PTX, the Triton dialect has no stability guarantees
and does change in practice.
See #25196.
A few notes:
* Pallas Triton no longer delegates PTX compilation to XLA:GPU. Instead,
compilation is done via a new PjRt extension, which uses its own compilation
pipeline modeled after the one in the Triton Python bindings.
* The implementation of the old custom call used by Pallas Triton is
deprecated and will be removed after 6 months, as per the
[compatibility guarantees][*].
[*]: https://jax.readthedocs.io/en/latest/export/export.html#compatibility-guarantees
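For context, here is a minimal sketch of the kind of Pallas call this affects (the `add_kernel` example itself is illustrative); on GPU it lowers through the Triton dialect rather than through PTX produced by XLA:GPU:
```py
import jax
import jax.numpy as jnp
from jax.experimental import pallas as pl

def add_kernel(x_ref, y_ref, o_ref):
  # Read the inputs from their refs and write the result to the output ref.
  o_ref[...] = x_ref[...] + y_ref[...]

@jax.jit
def add(x, y):
  return pl.pallas_call(
      add_kernel,
      out_shape=jax.ShapeDtypeStruct(x.shape, x.dtype),
  )(x, y)

x = jnp.arange(8, dtype=jnp.float32)
print(add(x, x))
```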
PiperOrigin-RevId: 722773884
This change does not yet do the work necessary to make any tests pass with threading enabled, which will come in future changes.
This approach is broadly inspired by a6d205dd4c/testtools/testsuite.py (L113) and by unittest-ft.
We add a custom TestResult class that batches up any test result actions and applies them under a lock. We also add a custom TestSuite class that runs individual test cases in parallel using a thread-pool.
We need a reader-writer lock to implement a `@jtu.thread_hostile_test` decorator, which we do by adding bindings around absl::Mutex to jaxlib.
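A simplified sketch of the two pieces (the real TestResult batches up result actions rather than locking each call, and the class names here are illustrative):
```py
import threading
import unittest
from concurrent.futures import ThreadPoolExecutor

class ThreadSafeTestResult(unittest.TestResult):
  """Illustrative: applies each result action under a lock."""

  def __init__(self, *args, **kwargs):
    super().__init__(*args, **kwargs)
    self._lock = threading.Lock()

  def addSuccess(self, test):
    with self._lock:
      super().addSuccess(test)

  def addFailure(self, test, err):
    with self._lock:
      super().addFailure(test, err)

  def addError(self, test, err):
    with self._lock:
      super().addError(test, err)

class ParallelTestSuite(unittest.TestSuite):
  """Illustrative: runs individual test cases on a thread pool."""

  def run(self, result, debug=False):
    with ThreadPoolExecutor(max_workers=8) as pool:
      futures = [pool.submit(case.run, result) for case in self]
      for future in futures:
        future.result()  # propagate any unexpected exceptions
    return result
```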
PiperOrigin-RevId: 713312937
The `jaxlib/cuda_plugin_extension.cc` and `jaxlib/rocm_plugin_extension.cc` files were nearly identical so this change consolidates the shared implementation into a single target.
PiperOrigin-RevId: 704785926
There were two helper functions for implementing FFI calls that were included directly alongside jaxlib's CPU kernels, but they will be useful for the GPU kernels as well. This change moves those functions into ffi_helpers so that they are accessible from there too.
PiperOrigin-RevId: 658002501
The OpenXLA project is working on an open-source, MLIR-based, named-axis propagation (and, in the future, SPMD partitioning) system that will be dialect agnostic (it would work for any dialect: MHLO, StableHLO, YourDialect). We plan on having frontends like JAX and PyTorch target this when using XLA and wanting SPMD propagation/partitioning. See www.github.com/openxla/shardy for more info.
Currently Shardy is implemented inside the XLA compiler, requiring us to round-trip between StableHLO and HLO with `mhlo.sharding`s. But we will eventually make Shardy the first pass in the XLA pipeline while it's still working on StableHLO. Partitioning (the system that adds the collectives like all-gathers/all-reduces) will still be the GSPMD Partitioner, but next year the Shardy partitioner will be developed, allowing for propagation and partitioning to be completely in MLIR and the first pass in the pipeline. So then we'd have:
1. Traced jaxpr
2. Jaxpr -> StableHLO
3. StableHLO with Shardy propagation
4. StableHLO with Shardy partitioning
5. StableHLO -> HLO
6. XLA optimizations
The following test:
```py
from functools import partial

import numpy as np
import jax
from jax.sharding import PartitionSpec as P
from jax._src import test_util as jtu

def test_sdy_lowering(self):
  mesh = jtu.create_global_mesh((4, 2), ('x', 'y'))
  np_inp = np.arange(16).reshape(8, 2)
  s = jax.sharding.NamedSharding(mesh, P('x', 'y'))
  arr = jax.device_put(np_inp, s)

  @partial(jax.jit, out_shardings=s)
  def f(x):
    return x * 2

  print(f.lower(arr).as_text())
```
outputs:
```
module @jit_f attributes {mhlo.num_partitions = 8 : i32, mhlo.num_replicas = 1 : i32} {
  sdy.mesh @mesh = <"x"=4, "y"=2>
  func.func public @main(%arg0: tensor<8x2xi64> {mhlo.layout_mode = "{1,0}", sdy.sharding = #sdy.sharding<@mesh, [{"x"}, {"y"}]>}) -> (tensor<8x2xi64> {jax.result_info = "", mhlo.layout_mode = "default", sdy.sharding = #sdy.sharding<@mesh, [{"x"}, {"y"}]>}) {
    %c = stablehlo.constant dense<2> : tensor<i64>
    %0 = stablehlo.broadcast_in_dim %c, dims = [] : (tensor<i64>) -> tensor<8x2xi64>
    %1 = stablehlo.multiply %arg0, %0 : tensor<8x2xi64>
    return %1 : tensor<8x2xi64>
  }
}
```
Shardy will be hidden behind the `jax_use_shardy_partitioner` flag initially before becoming enabled by default in the future.
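To try it out before the default changes, the flag can be enabled explicitly:
```py
import jax

# Opt in to Shardy lowering while the flag still defaults to off.
jax.config.update('jax_use_shardy_partitioner', True)
```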
PiperOrigin-RevId: 655127611
Some of the macros that were used in jaxlib's FFI calls to LAPACK turned out to
be useful for other FFI calls. This change consolidates these macros in the
ffi_helper header.
PiperOrigin-RevId: 651166306
This change does a few things (arguably too many):
1. The key change here is that it fixes the handler registration in `jaxlib/gpu/gpu_kernels.cc` for the two handlers that use the XLA FFI API. A previous attempt at this change caused downstream issues because of duplicate registrations, but we were able to fix that directly in XLA.
2. A second related change is to declare and define the XLA FFI handlers consistently using the `XLA_FFI_DECLARE_HANDLER_SYMBOL` and `XLA_FFI_DEFINE_HANDLER_SYMBOL` macros. We need to use these macros instead of the `XLA_FFI_DEFINE_HANDLER` version, which produces a lambda, so that the handler's address is consistent when XLA checks it during registration. Without this change, the downstream tests would continue to fail.
3. The final change is to consolidate the `cholesky_update_kernel` and `lu_pivot_kernels` implementations into a common `linalg_kernels` target. This makes the implementation of the `_linalg` nanobind module consistent with the other targets within `jaxlib/gpu` and (I think!) makes the details easier to follow. This last change is less urgent, but it is what I originally set out to do, so I'm proposing the changes together; I'm happy to split this into two if that would be preferred.
PiperOrigin-RevId: 651107659
This re-enables the tests removed in https://github.com/google/jax/pull/21563
and adds support for exposing the XLA FFI headers in the
`jax.extend.ffi.include_dir` directory during a Bazel build. While these
headers are unlikely to be useful for most Bazel users, it is good to provide
a consistent interface with the wheel build and to be able to test this feature.
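For example, from Python:
```py
from jax.extend import ffi

# Directory containing the XLA FFI headers, e.g. for use as an include path
# when compiling a custom call target against the FFI API.
print(ffi.include_dir)
```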
PiperOrigin-RevId: 640194961
This lets us avoid bundling yet another copy of LLVM with JAX packages,
which means we can finally start building Mosaic GPU by default.
PiperOrigin-RevId: 638569750
JAX has stopped generating code that directly uses the DUCC FFT custom calls,
and the 6-month backwards-compatibility window has also expired.
PiperOrigin-RevId: 638132572
The runfiles of the original targets were lost when the symlinked files were used.
This change is needed for the future Hermetic CUDA implementation. Bazel will download CUDA distributions into its cache, and the CUDA executables and libraries will be added to the runfiles of the targets. When `xla_extension` is symlinked, the content of the runfiles is lost; with `genrule`, the content of the runfiles is preserved.
PiperOrigin-RevId: 632508121
The nanobind switch for the GPU callback code means that we are now using the NumPy APIs rather than pybind11's clone of them. It is important to initialize the NumPy APIs before using them in each module.
PiperOrigin-RevId: 613036056
JAX isn't using this, and in fact our code to build this wasn't including the C++ parts, so it was broken anyway. Remove it until someone actually needs it for something.
PiperOrigin-RevId: 587323808
- Add a Python extension to call the custom call C API.
- Change the implementation of register_custom_call_target to store handlers for the custom call targets and delay registration until the handler for an XLA platform is registered (see the sketch below).
- Change register_plugin to load the PJRT plugin when register_plugin is called (instead of when a client is created), and let it return the loaded PJRT_Api*.
- Delay calling discover_pjrt_plugins() and register_pjrt_plugin_factories_from_env() until the first time backends() is called.
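A rough sketch of the deferred-registration idea (the names and structure here are hypothetical, not the actual xla_client API):
```py
# Hypothetical sketch: stash custom call targets per platform and register
# them once that platform's handler (e.g. from a PJRT plugin) is available.
_pending_targets = {}
_platform_registrars = {}

def register_custom_call_target(name, fn, platform):
  if platform in _platform_registrars:
    _platform_registrars[platform](name, fn)  # platform ready: register now
  else:
    _pending_targets.setdefault(platform, []).append((name, fn))

def register_platform_handler(platform, registrar):
  # Called when the platform's custom call C API becomes available; flush
  # any registrations that were queued before that point.
  _platform_registrars[platform] = registrar
  for name, fn in _pending_targets.pop(platform, []):
    registrar(name, fn)
```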
PiperOrigin-RevId: 568265745
nanobind has a number of advantages (https://nanobind.readthedocs.io/en/latest/why.html), notably speed of compilation and dispatch, but the main reason to do this for these bindings is that nanobind can target the Python Stable ABI starting with Python 3.12. This means that we will not need to ship per-Python-version CUDA plugins starting with Python 3.12.
PiperOrigin-RevId: 559898790