rocm_jax

mirror of https://github.com/ROCm/jax.git synced 2025-04-22 09:36:06 +00:00

Author	SHA1	Message	Date
Ruturaj Vaidya	1fc3d15727	Use HIPBLAS_V2 (#222) Co-authored-by: Harsha HS <Harsha.HavanurShamsundara@amd.com>	2025-02-05 18:12:13 -06:00
jax authors	a527aba646	Reverts f1b894d14a28ac22a037fb79177b991275c75a18 PiperOrigin-RevId: 716653711	2025-01-17 07:00:31 -08:00
Benjamin Chetioui	d3be190efb	[Mosaic GPU] Delete unused declarations of `mosaic_gpu_memcpy_async_h2d`. PiperOrigin-RevId: 716616807	2025-01-17 04:34:48 -08:00
Sergei Lebedev	d34c40f6b6	[mosaic_gpu] Added a serialization pass The pass adds versioning to the Mosaic GPU IR in the lowered custom calls and can apply forward/backward migration rules. Currently, no rules are necessary since we are at version 1. PiperOrigin-RevId: 716596848	2025-01-17 03:12:51 -08:00
Adam Paszke	bd22bfef71	[Mosaic TPU] Use large to compact 2nd minor retiling for conversions going both ways This specific retiling is its own inverse and is faster than alternatives. PiperOrigin-RevId: 716360070	2025-01-16 13:35:26 -08:00
Tzu-Wei Sung	5c020ee317	[Mosaic] Fix infer/apply extensions. 1. For apply, llvm::StringMap()::insert(MapEntryTy*) will cause dangling reference if not constructing mlir::tpu::extensions::rules() with const-reference. However, if we do construct it with const-reference, the signature is not const-qualified and fails to compile. Hence, change it to llvm::StringMap()::insert(std::pair<...>) and get extension rules by const-reference. 2. Pass default tiling to infer rule, we need it to infer single op. See infer of tpu::MatmulOp. PiperOrigin-RevId: 716274818	2025-01-16 09:57:14 -08:00
Sergei Lebedev	4221f109d1	[mosaic] Extracted serialization pass traversal logic into a reusable function I will use it to implement Mosaic GPU serialization pass in a follow up. PiperOrigin-RevId: 716156650	2025-01-16 02:58:06 -08:00
Tzu-Wei Sung	4a9cc9ffc1	[Mosaic] Allow passing `ApplyVectorLayoutCtx` to tpu.apply_layout_op. To make it the same with C++ API. While I'm here, fix a bug in test_concatenate. PiperOrigin-RevId: 716016244	2025-01-15 17:47:36 -08:00
Naums Mogers	d3ba1eb339	[Mosaic] Add a macro to convert abseil StatusOr to LLVM FailureOr PiperOrigin-RevId: 715943314	2025-01-15 14:19:29 -08:00
jax authors	41993fdb24	Merge pull request #25755 from ROCm:ci_rnn_final-upstream PiperOrigin-RevId: 715856939	2025-01-15 10:40:54 -08:00
Ruturaj4	fe68eb8b25	[ROCm] Implement RNN support	2025-01-14 19:04:49 -06:00
George Necula	f1b894d14a	Reverts 391bad8ff59c07c8fad7b8ce05cd0e29dee4cf1a PiperOrigin-RevId: 715435319	2025-01-14 10:31:59 -08:00
Ayaka	9ba1fd2801	[Pallas TPU] Add vector support to `pl.debug_print` PiperOrigin-RevId: 715085454	2025-01-13 13:22:21 -08:00
Adam Paszke	391bad8ff5	[Mosaic TPU] Add support for arith.fptosi with non-32bit source and target types This effectively moves some of the Pallas logic to the layer below. PiperOrigin-RevId: 714965374	2025-01-13 07:49:13 -08:00
Peter Hawkins	91ffb640a8	Use thread-safe initialization of LAPACK kernels. Use absl::call_once instead of a GIL-protected global initialization. In passing, also remove an unused function. PiperOrigin-RevId: 714892175	2025-01-13 02:51:38 -08:00
Tomás Longeri	7852045582	[Mosaic TPU] Enable non-sublane-aligned bf16 2D load/stores for earlier TPU gens It is still not efficiently implemented, this is mostly to clean up some logic. We may be able to fuse the creation of masks for different tiles into the creation of a single one. But this is also a problem for the later gens. This also cleans up an unreachable return statement. PiperOrigin-RevId: 714847066	2025-01-12 23:58:40 -08:00
Tomás Longeri	0930289997	[Mosaic TPU][NFC] Remove redundant num_subelems attribute from CreateSubelementMaskOp PiperOrigin-RevId: 714795856	2025-01-12 19:34:25 -08:00
jax authors	a16fbffc13	[Mosaic][TPU] Add a compatibility mode to Mosaic's canonicalization pass, skipping over elementwise and matmul op insertions and/or type compat casts. PiperOrigin-RevId: 714132282	2025-01-10 12:12:54 -08:00
Dan Foreman-Mackey	39ce7916f1	Activate FFI implementation of tridiagonal reduction on GPU. PiperOrigin-RevId: 714078036	2025-01-10 09:28:15 -08:00
Dan Foreman-Mackey	c1de7c733d	Add LAPACK lowering for lax.linalg.tridiagonal_solve on CPU. In implementing https://github.com/jax-ml/jax/pull/25787, I realized that while we lower `tridiagonal_solve` to cuSPARSE on GPU, we were using an explicit implementation of the Thomas algorithm on CPU. We should instead lower to LAPACK's `gtsv` on CPU because it should be more numerically stable and faster. PiperOrigin-RevId: 714069225	2025-01-10 08:56:46 -08:00
jax authors	564b6b0d72	Merge pull request #20282 from tttc3:pivoted-qr PiperOrigin-RevId: 714053620	2025-01-10 08:02:02 -08:00
Adam Paszke	d2a5e8d072	[Mosaic TPU] Add support for integer truncation from packed types PiperOrigin-RevId: 714048232	2025-01-10 07:40:55 -08:00
jax authors	061408aca3	Merge pull request #25803 from sergachev:fix_rnn_desc PiperOrigin-RevId: 713789106	2025-01-09 14:05:30 -08:00
tttc3	c89be05b5b	Enable pivoted QR on CPU devices. A pivoted QR factorization is possible in `scipy.linalg.qr`, thanks to the `geqp3` routine of LAPACK. To provide the same functionality in JAX, we implement a new primitive `geqp3_p` which calls the LAPACK routine via the FFI on CPU devices. Both `jax.scipy.linalg.qr` and `jax.lax.linalg.qr` now support the use of column-pivoting on CPU devices. To provide a GPU implementation of `geqp3` may require using MAGMA, due to the lack of a `geqp3` implementation in `cuSolver` - see ccb331707e80b16d89de6e5c9f2f89b87c1682ed (`jax.lax.linalg.eig`) for an example of using MAGMA in GPU lowerings. Such a GPU implementation can be considered in the future.	2025-01-09 20:44:45 +00:00
Adam Paszke	07f4fd3e51	[Mosaic TPU] Fix a bug in the impl of sublane broadcasts for int8 and int4 PiperOrigin-RevId: 713675029	2025-01-09 08:05:25 -08:00
Ilia Sergachev	f0e1c3cf36	Fix struct string encoding non-determinism in the RNN descriptor. Boolean fields in the descriptor struct led to padding, which let random bytes into the string representation of the struct and caused variance in HLO from run to run.	2025-01-09 12:57:09 +00:00
Peter Hawkins	0389d617c8	Add a unittest test extension that runs test cases in parallel using threads. This change does not yet do the work necessary to make any tests pass with threading enabled, which will come in future changes. This approach is broadly inspired by `a6d205dd4c/testtools/testsuite.py (L113)` and by unittest-ft. We add a custom TestResult class that batches up any test result actions and applies them under a lock. We also add a custom TestSuite class that runs individual test cases in parallel using a thread-pool. We need a reader-writer lock to implement a `@jtu.thread_hostile_test` decorator, which we do by adding bindings around absl::Mutex to jaxlib. PiperOrigin-RevId: 713312937	2025-01-08 09:11:47 -08:00
Adam Paszke	f96339be1e	[Mosaic TPU] Be much more aggressive in inferring large 2nd minor layouts for 16-bit types on v6 This often lets us avoid ambiguities between selecting the (8, 128) and (16, 128) tiling, by biasing the layout inference to prefer the latter. PiperOrigin-RevId: 713270421	2025-01-08 06:30:36 -08:00
Adam Paszke	5fd1b2f825	[Mosaic TPU] Add support for second minor broadcasts with packed types PiperOrigin-RevId: 713259707	2025-01-08 05:45:02 -08:00
Adam Paszke	e954930eaf	[Mosaic TPU] Add support for true divide in bf16 on TPUv6 PiperOrigin-RevId: 713247480	2025-01-08 04:49:22 -08:00
Tzu-Wei Sung	bf94389b08	[Mosaic] Use tpu::CreateMask for getX32VmaskByPaddingEnd. It was cmp + iota before. PiperOrigin-RevId: 713240888	2025-01-08 04:18:53 -08:00
Peter Hawkins	392a851769	Increase the minimum SciPy version to 1.11.1. (1.11.0 was yanked from PyPI because of licensing problems, so 1.11.1 is the oldest 1.11 release.) PiperOrigin-RevId: 713073731	2025-01-07 16:10:45 -08:00
Dan Foreman-Mackey	a7f384cc6e	Add a register_custom_type_id function to the GPU plugins. This enables dynamic registration of custom FFI types on the appropriate platform via PJRT. PiperOrigin-RevId: 712904085	2025-01-07 07:29:38 -08:00
Sharad Vikram	4caa263a94	[Mosaic TPU] Add some elementwise canonicalizations PiperOrigin-RevId: 712671502	2025-01-06 15:10:02 -08:00
Peter Hawkins	90d8f37863	Rename pybind_extension to nanobind_extension. We have no remaining uses of pybind11 outside a GPU custom call example. PiperOrigin-RevId: 712608834	2025-01-06 11:53:44 -08:00
Peter Hawkins	61dd041225	Suppress MSAN warnings from SVD that are showing up in CI. In our MSAN CI, the copy of LAPACK we use is not MSAN-instrumented, leading to false positives. Suppress those false-positives via annotations. PiperOrigin-RevId: 712607044	2025-01-06 11:49:05 -08:00
Jevin Jiang	9f842909ce	[Mosaic TPU] Validate inserted layout in relayout-insertion pass. PiperOrigin-RevId: 712595778	2025-01-06 11:15:47 -08:00
John QiangZhang	c39e38fe5a	bazel: export serialization.fbs for downstream usage PiperOrigin-RevId: 712587802	2025-01-06 10:57:35 -08:00
Tzu-Wei Sung	57b21541a2	[Mosaic] NFC: Pull out vreg related functions to util. These functions are related to vreg manipulation and are used in different rules. PiperOrigin-RevId: 711484002	2025-01-02 11:50:19 -08:00
jax authors	68483b8ed6	Merge pull request #25710 from apaszke:mgpu_dialect_fix PiperOrigin-RevId: 711430610	2025-01-02 08:23:28 -08:00
Adam Paszke	64433435ff	Fix OSS build for the Mosaic GPU dialect	2025-01-02 15:55:03 +00:00
Tomás Longeri	ac817b48ca	[Mosaic:TPU][NFC] Clean up unused variable PiperOrigin-RevId: 711412888	2025-01-02 06:57:38 -08:00
Tomás Longeri	4452960947	[Mosaic:TPU] In infer ext rule, avoid assigning offsets outside of dst first tile Note that offsets outside of first tile are still disabled (for both infer and apply), and once we support it we will want to assign offsets differently, this is mostly to avoid assigning invalid layouts (that may not just be outside the first tile, but outside the vreg slice) PiperOrigin-RevId: 709168368	2024-12-23 15:49:39 -08:00
jax authors	b8091a437a	Switch `mlir` bindings from `pybind11` to `nanobind` PiperOrigin-RevId: 709161113	2024-12-23 15:10:11 -08:00
Tomás Longeri	3c79b98cd9	[Mosaic:TPU] Vreg-slice-aligned offset changes with scratch retiling PiperOrigin-RevId: 709133729	2024-12-23 13:05:14 -08:00
Sergei Lebedev	68ec202d45	Use the right include for gmock and gtest PiperOrigin-RevId: 709058082	2024-12-23 07:34:36 -08:00
Sergei Lebedev	8987867faa	[mosaic_gpu] Include Mosaic GPU dialect files into jaxlib	2024-12-23 13:46:25 +00:00
Tomás Longeri	7ecc947184	[Mosaic:TPU] Roll forward of cl/708011538 (expanded trunc support), minus changes in infer-vector-layout We can enable them later but at least this way the support is available to build on (e.g. in the new insert relayouts pass) Reverts 05f3a701e769748ff1ec51d50324a3595c4aff0d PiperOrigin-RevId: 708397219	2024-12-20 12:33:30 -08:00
Peter Hawkins	0ff3f144e5	Migrate _mlir Python binding target to nanobind. PiperOrigin-RevId: 708390390	2024-12-20 12:07:29 -08:00
Tomás Longeri	05f3a701e7	[Mosaic:TPU] Roll back cl/708011538 and cl/708112341 Reverts 307c8d3af81f16142fd4c64f501b05a5b69f815e PiperOrigin-RevId: 708173083	2024-12-19 21:51:44 -08:00

1281 Commits