Adam Paszke ded9272a5b [Mosaic GPU] Implement a simple profiling tool using CUDA events
The other JAX profiling tools are a little heavyweight when we only care about
timing a single kernel programmatically.

Also adapt wgmma.py to address failures triggered by upstream MLIR changes.

PiperOrigin-RevId: 628096973
2024-04-25 09:18:39 -07:00