mirror of
https://github.com/ROCm/jax.git
synced 2025-04-22 22:06:05 +00:00

popcount and clz were effectively broken on ROCm, since math_dialect had incorrect lowerings. Use the device intrinsics for these functions, as well as for exp and absf, which fixes some accuracy issues in the pallas tests.

Docs for OCML/OCKL
- https://github.com/ROCm/llvm-project/blob/amd-staging/amd/device-libs/doc/OCML.md
- https://github.com/ROCm/llvm-project/blob/amd-staging/amd/device-libs/doc/OCKL.md
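
For reference, the semantics that a correct popcount/clz lowering has to reproduce can be sketched in plain Python. This is not code from the change itself: the helper names `popcount32` and `clz32` are hypothetical, the sketch only covers 32-bit unsigned inputs, and it assumes the usual convention that clz of zero returns the bit width.

```python
# Hypothetical reference helpers (not part of this commit), 32-bit unsigned only.

def popcount32(x: int) -> int:
    """Count the set bits in the low 32 bits of x."""
    return bin(x & 0xFFFFFFFF).count("1")


def clz32(x: int) -> int:
    """Count leading zero bits of a 32-bit value.

    Assumption: clz32(0) == 32, i.e. the full bit width.
    """
    x &= 0xFFFFFFFF
    return 32 - x.bit_length()


# Edge cases worth checking against any device lowering:
# zero, a single low bit, only the top bit, and all ones.
SAMPLES = [0, 1, 0x80000000, 0xFFFFFFFF, 0x00F0F000]
REFERENCE = [(popcount32(v), clz32(v)) for v in SAMPLES]
```

One way to use this would be to run `jax.lax.population_count` and `jax.lax.clz` on a `uint32` array built from `SAMPLES` and compare the results against `REFERENCE`.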