This one is particularly annoying because we have to break up the MMA into
two collective N=256 MMAs. However, the TensorCore only updates a contiguous
chunk of TMEM columns, so after executing the two MMAs we end up with a TMEM
layout that looks like this:
```
Contributing CTA | 0     | 1       | 0       | 1       |
N local          | 0:128 | 0:128   | 128:256 | 128:256 |
N                | 0:128 | 256:384 | 128:256 | 384:512 |
```
You can see that the TMEM columns no longer traverse the logical N dimension
monotonically up to N=512; instead, the mapping contains a number of jumps.
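As a small sanity check, the mapping in the table can be written down
directly; the helper below is purely illustrative (it is not part of Mosaic
GPU) and assumes two CTAs per group, two back-to-back N=256 collective MMAs,
and 128-column TMEM chunks:
```
# Illustrative helper only, not Mosaic GPU code. It reproduces the table:
# chunks 0..3 map to logical N ranges 0:128, 256:384, 128:256, 384:512.
def tmem_chunk_to_n(chunk: int, mma_n: int = 256, chunk_cols: int = 128):
  cta = chunk % 2                        # "Contributing CTA" row
  n_local = (chunk // 2) * chunk_cols    # "N local" row
  n = cta * mma_n + n_local              # "N" row: note the jumps
  return n, n + chunk_cols

assert [tmem_chunk_to_n(c) for c in range(4)] == [
    (0, 128), (256, 384), (128, 256), (384, 512)]
```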
We could fix this on the load side, by ensuring that each CTA in the group
does a strided load along the tiled dimension, but that just seems more
trouble than it's worth (and is not that well supported by TMA unless we
increase the number of striding levels).
Instead, we encode this weirdness in the TMEM layout we use and make sure
to rearrange the data properly while loading the tiles into registers.
PiperOrigin-RevId: 735791426
This change makes it possible to test the wheels produced by the build rules in presubmit with a single `bazel test` command.
There are three options for running the tests:
1) `build_jaxlib=true`: the tests depend on JAX targets.
2) `build_jaxlib=false`: the tests depend on the wheel files located in the `dist` folder.
3) `build_jaxlib=wheel`: the tests depend on the py_import targets.
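For reference, invocations would look roughly like the following (the
`--//jax:build_jaxlib` flag label and the test target are illustrative, not
verified spellings):
```
bazel test --//jax:build_jaxlib=true  //tests:example_test   # depend on JAX targets
bazel test --//jax:build_jaxlib=false //tests:example_test   # depend on wheels in dist/
bazel test --//jax:build_jaxlib=wheel //tests:example_test   # depend on py_import targets
```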
PiperOrigin-RevId: 735765819
This would also make it easier to eventually deprecate the `with mesh: pjit` path in user code, since the new path would be fully tested.
It will also allow us to remove `resource_env` from JAX, along with the internal accesses of `resource_env.physical_mesh` spread across internal and external codebases.
PiperOrigin-RevId: 735602187
My motivation here is to fix plugin support for batch-partitionable custom calls. Since plugin support for custom call partitioners is provided via `register_plugin_callback` in `xla_bridge` rather than in `xla_client` itself, it is much more straightforward to register these custom calls in JAX.
It would be possible to refactor things differently, but it actually seems like a reasonable choice to use the supported APIs from `jax.ffi` instead of `xla_client` so that we can take advantage of any new features we might add there in the future.
This is all still a little brittle, and I'd eventually like to migrate to a version where the XLA FFI library provides a mechanism for exporting handlers, but this change remains compatible with any future change along those lines.
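As a rough illustration of the kind of registration this enables (the helper,
target names, and handler capsules below are placeholders, not the actual
handlers touched by this change):
```
import jax

def register_plugin_handlers(handlers: dict, platform: str) -> None:
  # `handlers` maps custom-call target names to PyCapsules wrapping XLA FFI
  # handlers exported by the plugin (both placeholders for illustration).
  for name, capsule in handlers.items():
    jax.ffi.register_ffi_target(name, capsule, platform=platform)
```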
PiperOrigin-RevId: 735381736
Surprisingly, the bug was tracked down to #26111 aka cl/730939406, specifically
the new implementation of reset_name_stack in source_info_util.py.
To reproduce, use the pre-this-commit implementation of reset_name_stack
(left commented out in the file) and run
```
JAX_USE_DIRECT_LINEARIZE=1 python tests/name_stack_test.py NameStackTransformationTest.test_nested_jit_stack
```
This feature is necessary to fix the SMEM->GMEM waiting behavior in
`emit_pipeline`, which used a pessimistic wait condition prior to this change
because every copy formed its own commit group.
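To make the pessimism concrete, here is a toy accounting sketch; it is purely
illustrative and not the actual `emit_pipeline` logic:
```
# Purely illustrative, not the actual emit_pipeline code. With wait_group
# style semantics, a wait completes once at most `bound` commit groups are
# still pending.
def smem_to_gmem_wait_bound(in_flight_steps: int, group_per_copy: bool) -> int:
  if group_per_copy:
    # One commit group per copy: without tracking how many copies each
    # outstanding step issued, the safe choice is to wait for everything,
    # i.e. the pessimistic condition this change removes.
    return 0
  # One commit group per pipeline step: copies from the last
  # `in_flight_steps` steps may stay in flight before their SMEM is reused.
  return in_flight_steps
```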
PiperOrigin-RevId: 734553668
Also improve the `dynamic_update_slice` sharding error by printing `aval.str_short()` instead of the full sharding: it is more concise and more informative than the current error, since it also includes the shape in the message.
Also make some formatting changes in scan lowering to make it easier to debug.
PiperOrigin-RevId: 734542862
The difficulty here is that our register tiling is based on the (64, 8)
shape, while the memory tiling is now (8, swizzle // bytewidth). Before, we
could assume that each register tile fits neatly within a single memory tile,
but that is clearly no longer the case. Luckily, supporting this wasn't too
hard to add.
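A quick worked example of the mismatch, with illustrative numbers (128-byte
swizzle, bf16 elements):
```
# Illustrative numbers only: 128-byte swizzle, bf16 (bytewidth 2).
reg_tile = (64, 8)
swizzle, bytewidth = 128, 2
mem_tile = (8, swizzle // bytewidth)        # (8, 64)
rows_spanned = reg_tile[0] // mem_tile[0]   # 8: the register tile crosses 8 memory tiles down
col_fraction = reg_tile[1] / mem_tile[1]    # 0.125: and covers only 1/8 of a tile across
```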
PiperOrigin-RevId: 734517000