rocm_jax

mirror of https://github.com/ROCm/jax.git synced 2025-04-16 11:56:07 +00:00


Yash Katariya 928dee415f Optimize host_local_array_to_global_array by caching the local-to-global conversion and the flattening of axis resources. Also take a fast path in device_put that skips abstractify and calls canonicalize_dtype only once on the entire array, instead of once for every shard.

This results in a roughly 4.5x speedup (3.03 ms down to 0.673 ms per call)!

Before:

```
---------------------------------------------------------------------------
Benchmark                                 Time             CPU   Iterations
---------------------------------------------------------------------------
host_local_array_to_global_array       3.03 ms         3.02 ms          220
```

After:

```
---------------------------------------------------------------------------
Benchmark                                 Time             CPU   Iterations
---------------------------------------------------------------------------
host_local_array_to_global_array      0.673 ms        0.671 ms          985
```

PiperOrigin-RevId: 489880547

2022-11-20 20:53:02 -08:00
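The commit message does not include the code. As a rough illustration of why caching helps, here is a minimal pure-Python sketch built on a hypothetical model: a partition-spec entry is `None`, one mesh axis name, or a tuple of names, and a sharded dimension's global size is its local size times the number of processes spanning those mesh axes. `flatten_axis_resources` and `local_to_global_shape` are hypothetical names, not JAX's internal functions. Because `functools.lru_cache` needs hashable arguments, the mesh axis sizes are passed as tuples of `(name, size)` pairs.

```python
from functools import lru_cache
from math import prod
from typing import Tuple, Union

# Hypothetical model: a spec entry is None (replicated), a mesh axis name,
# or a tuple of mesh axis names.
SpecEntry = Union[None, str, Tuple[str, ...]]


@lru_cache(maxsize=None)
def flatten_axis_resources(spec: Tuple[SpecEntry, ...]) -> Tuple[Tuple[str, ...], ...]:
    """Normalize every spec entry to a tuple of axis names (empty for None)."""
    out = []
    for entry in spec:
        if entry is None:
            out.append(())
        elif isinstance(entry, str):
            out.append((entry,))
        else:
            out.append(tuple(entry))
    return tuple(out)


@lru_cache(maxsize=None)
def local_to_global_shape(
    local_shape: Tuple[int, ...],
    global_axis_sizes: Tuple[Tuple[str, int], ...],
    local_axis_sizes: Tuple[Tuple[str, int], ...],
    spec: Tuple[SpecEntry, ...],
) -> Tuple[int, ...]:
    """Scale each sharded dimension by the number of processes along its axes."""
    global_sizes = dict(global_axis_sizes)
    local_sizes = dict(local_axis_sizes)
    flat = flatten_axis_resources(spec)
    # Dimensions not mentioned in the spec are treated as replicated.
    flat = flat + ((),) * (len(local_shape) - len(flat))
    return tuple(
        dim * prod(global_sizes[a] // local_sizes[a] for a in axes)
        for dim, axes in zip(local_shape, flat)
    )
```

For example, with a global mesh `(("x", 8), ("y", 2))` and a per-process mesh `(("x", 4), ("y", 2))`, a local `(4, 6)` array with spec `("x", None)` maps to a global `(8, 6)`. Repeating the same call is then a dictionary lookup rather than a recomputation, which is the kind of saving a per-call benchmark picks up.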

api_benchmark.py

Optimize host_local_array_to_global_array by caching the local-to-global conversion and the flattening of axis resources. Also take a fast path in device_put that skips abstractify and calls canonicalize_dtype only once on the entire array, instead of once for every shard.

2022-11-20 20:53:02 -08:00
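The device_put fast path in this commit message amounts to doing the dtype work once on the whole array rather than once per shard. A minimal NumPy sketch of that ordering follows. `canonicalize_dtype` here is a hypothetical stand-in that narrows 64-bit types to 32-bit (roughly JAX's behavior with x64 disabled), not JAX's actual function, and the counter exists only to make the number of conversions visible.

```python
import numpy as np

# Counts how many times the hypothetical canonicalization step runs.
CONVERSIONS = {"count": 0}


def canonicalize_dtype(x: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in: narrow 64-bit dtypes to their 32-bit counterparts."""
    CONVERSIONS["count"] += 1
    if x.dtype == np.float64:
        return x.astype(np.float32)
    if x.dtype == np.int64:
        return x.astype(np.int32)
    return x


def shard_then_canonicalize(x: np.ndarray, num_shards: int) -> list:
    # Per-shard path: one canonicalization call for every shard.
    return [canonicalize_dtype(s) for s in np.array_split(x, num_shards)]


def canonicalize_then_shard(x: np.ndarray, num_shards: int) -> list:
    # Fast path: canonicalize the whole array once, then slice it into shards.
    return np.array_split(canonicalize_dtype(x), num_shards)
```

Both functions yield identical float32 shards for a float64 input; the second makes one conversion call instead of `num_shards`, so the saving grows with the number of devices.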

benchmark.py

Change JAX's copyright attribution from "Google LLC" to "The JAX Authors.".

2022-09-22 12:27:19 -07:00

gda_benchmark.py

Make MeshPspecSharding an alias for NamedSharding (it was the other way around before this CL).

2022-11-14 14:44:00 -08:00

pmap_benchmark.py

Replace jax.xla.DeviceArray private type with the new public type jax.Array.

2022-09-28 16:34:10 -07:00