mirror of
https://github.com/ROCm/jax.git
synced 2025-04-16 03:46:06 +00:00
[Mosaic GPU] Make sure to do the async proxy fence before warpgroup sync
This is the ordering we want for a proper release of generic SMEM stores into the async proxy. The old order was problematic: once the warpgroup barrier completed, some warps could get deselected before they reached the fence. As long as the first warp kept making progress, it could pass the fence alone and start issuing TMA copies before the other warps had synchronized with the async proxy. I have not observed this problem in any of our kernels so far, but this order seems safer to me.
PiperOrigin-RevId: 733333814
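The race described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Mosaic GPU code: the function name, the step names, and the adversarial "warp 0 always runs first" scheduler are all hypothetical, and a warpgroup is assumed to be 4 warps. It reports which warps have executed their fence at the moment warp 0 leaves `commit_shared` and is free to issue a TMA copy.

```python
# Toy model (hypothetical, not the Mosaic GPU implementation).
# Each warp executes the same sequence of steps inside commit_shared.
OLD_ORDER = ("barrier", "fence")  # order before this commit
NEW_ORDER = ("fence", "barrier")  # order after this commit


def warps_fenced_when_warp0_finishes(order, num_warps=4):
    """Return the set of warps that have fenced when warp 0 finishes.

    Scheduling is adversarial: warp 0 runs whenever it can, and the other
    warps only advance when warp 0 is blocked on a barrier.
    """
    pc = [0] * num_warps  # index of the next step for each warp
    fenced = set()
    while pc[0] < len(order):
        if order[pc[0]] == "fence":
            fenced.add(0)
            pc[0] += 1
            continue
        # Warp 0 is at a barrier: every other warp must reach it first,
        # executing any steps that come before the barrier on the way.
        barrier_index = pc[0]
        for w in range(1, num_warps):
            while pc[w] < barrier_index:
                if order[pc[w]] == "fence":
                    fenced.add(w)
                pc[w] += 1
        # All warps have arrived, so the barrier releases everyone.
        pc = [barrier_index + 1] * num_warps
    return fenced


print(sorted(warps_fenced_when_warp0_finishes(OLD_ORDER)))  # [0]
print(sorted(warps_fenced_when_warp0_finishes(NEW_ORDER)))  # [0, 1, 2, 3]
```

In this model the old order lets warp 0 finish while only its own fence has run, whereas the new order guarantees that every warp has fenced before any warp passes the barrier.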
parent 155839bb4d
commit cdae5fcfc7
@@ -670,10 +670,10 @@ def parse_indices(


 def commit_shared():
-  warpgroup_barrier()
   nvvm.fence_proxy(
       nvvm.ProxyKind.async_shared, space=nvvm.SharedSpace.shared_cta
   )
+  warpgroup_barrier()


 def warpgroup_barrier():