Yash Katariya
fd09b35645
Optimize make_array_from_callback for fully replicated shardings by going via batched_device_put
...
Before:
```
name cpu/op
bench_make_array_from_callback_fully_replicated_sharding 467µs ± 3%
name time/op
bench_make_array_from_callback_fully_replicated_sharding 467µs ± 3%
```
After:
```
name cpu/op
bench_make_array_from_callback_fully_replicated_sharding 28.1µs ± 2%
name time/op
bench_make_array_from_callback_fully_replicated_sharding 28.1µs ± 2%
```
PiperOrigin-RevId: 572429822
2023-10-10 19:02:04 -07:00
Patrick Kidger
9d73441ff1
Added serial_dot_products benchmark
2023-09-21 15:25:52 -07:00
Roy Frostig
6abefa1977
fast dispatch for functions over typed PRNG key arrays
...
Before this change, JAX could dispatch compiled functions over new-style (typed)
RNG key arrays, but it would always do so off of the fast (C++-based) dispatch
path. In other words, switching from old-style `uint32` RNG keys to new-style
keys would regress dispatch times. With this change, dispatch happens on the
fast path again and performance regressions ought to be minimal.
We currently maintain only one pytree registry, for all registered pytree node
types. We want RNG key arrays to also be treated as pytree leaves everywhere
*except* during dispatch. In other words: we want operations on (typed) RNG key
arrays to appear in Jaxpr, but we want to unravel those arrays into their
underlying `uint32` arrays only during dispatch.
To do this, we add a new internal pytree registry that dispatch respects
uniquely. This registry includes all items in the default registry, but also the
RNG key array type.
Co-authored-by: Matthew Johnson <mattjj@google.com>
PiperOrigin-RevId: 565077758
2023-09-13 09:43:58 -07:00
Jake VanderPlas
368d3433a6
Add random benchmarks
...
The purpose of this is to measure the difference in dispatch seed between raw keys and new-style typed keys. The latter does not yet hit the C++ fast path, and so we expect it to incur a small additional overhead at dispatch time. Part of #9263 .
PiperOrigin-RevId: 559782442
2023-08-24 09:55:07 -07:00
jax authors
f498442daa
[jax][benchmark] Added clearing caches for benchmarking compilation time in sparse JAX benchmarks
...
PiperOrigin-RevId: 553179605
2023-08-02 10:07:54 -07:00
Peter Hawkins
76cda0ae07
Update flags to use the ABSL typed flag API.
...
Change flags to use the newer definition style where the flag is read via a typed FlagHolder object returned by the DEFINE_... function. The advantage of doing this is that `flag.value` has a type known to the type checker, rather than reading it as an attr out of a gigantic config dictionary.
For jax.config flags, define a typed FlagHolder object that is returned when defining a flag, matching the ABSL API.
Move a number of flags into the file that consumes them. There's no reason we're defining every flag in `config.py`.
This PR does not change the similar "state" objects in `jax.config`. Changing those is for a future PR.
PiperOrigin-RevId: 551604974
2023-07-27 12:15:58 -07:00
Yash Katariya
a6254c75e0
Improve the shape incompatible error message by adding the argument/result name path to it.
...
PiperOrigin-RevId: 529605855
2023-05-04 21:50:04 -07:00
Jake VanderPlas
fbe4f10403
Change to simpler import for jax.config
2023-04-21 11:51:22 -07:00
Jake VanderPlas
5521423d92
Change np.prod->math.prod
...
Why? This is generally used for static operations on shapes, but np.prod
has an unfortunate corner-case behavior that np.prod([]) returns a float.
math.prod is available as of Python 3.8, and is a better solution here.
2023-04-13 11:48:11 -07:00
Peter Hawkins
74384e6a87
Add a C++ safe_zip implementation.
...
Benchmark results on my workstation:
```
name old cpu/op new cpu/op delta
safe_zip/arg_lengths:0/num_args:1 1.22µs ± 1% 0.28µs ± 8% -77.33% (p=0.008 n=5+5)
safe_zip/arg_lengths:1/num_args:1 1.28µs ± 1% 0.34µs ± 6% -73.18% (p=0.008 n=5+5)
safe_zip/arg_lengths:2/num_args:1 1.28µs ± 1% 0.38µs ± 5% -70.26% (p=0.008 n=5+5)
safe_zip/arg_lengths:5/num_args:1 1.38µs ± 1% 0.51µs ± 3% -63.26% (p=0.008 n=5+5)
safe_zip/arg_lengths:10/num_args:1 1.61µs ± 1% 0.69µs ± 3% -56.93% (p=0.008 n=5+5)
safe_zip/arg_lengths:100/num_args:1 5.39µs ± 1% 3.83µs ± 2% -29.03% (p=0.008 n=5+5)
safe_zip/arg_lengths:0/num_args:2 1.46µs ± 1% 0.32µs ± 4% -78.30% (p=0.008 n=5+5)
safe_zip/arg_lengths:1/num_args:2 1.52µs ± 1% 0.39µs ± 4% -74.20% (p=0.008 n=5+5)
safe_zip/arg_lengths:2/num_args:2 1.53µs ± 1% 0.44µs ± 4% -71.38% (p=0.008 n=5+5)
safe_zip/arg_lengths:5/num_args:2 1.66µs ± 2% 0.60µs ± 3% -63.96% (p=0.008 n=5+5)
safe_zip/arg_lengths:10/num_args:2 1.90µs ± 1% 0.82µs ± 3% -56.66% (p=0.008 n=5+5)
safe_zip/arg_lengths:100/num_args:2 6.51µs ± 1% 4.80µs ± 0% -26.23% (p=0.016 n=5+4)
safe_zip/arg_lengths:0/num_args:3 1.62µs ± 1% 0.36µs ± 4% -77.95% (p=0.008 n=5+5)
safe_zip/arg_lengths:1/num_args:3 1.68µs ± 1% 0.44µs ± 3% -73.75% (p=0.008 n=5+5)
safe_zip/arg_lengths:2/num_args:3 1.69µs ± 1% 0.50µs ± 3% -70.48% (p=0.008 n=5+5)
safe_zip/arg_lengths:5/num_args:3 1.83µs ± 1% 0.68µs ± 2% -62.73% (p=0.008 n=5+5)
safe_zip/arg_lengths:10/num_args:3 2.12µs ± 1% 0.96µs ± 1% -54.71% (p=0.008 n=5+5)
safe_zip/arg_lengths:100/num_args:3 7.34µs ± 2% 5.89µs ± 1% -19.74% (p=0.008 n=5+5)
```
In addition, improve the length mismatch error for safe_map and define __module__ on both functions.
PiperOrigin-RevId: 523475834
2023-04-11 12:43:04 -07:00
Peter Hawkins
0dbd467cea
Add a C++ implementation of safe map.
...
Before (argument names reversed, oops, fixed in code):
```
name time/op
safe_map/num_args:0/arg_lengths:1 1.43µs ± 1%
safe_map/num_args:1/arg_lengths:1 1.61µs ± 1%
safe_map/num_args:2/arg_lengths:1 1.72µs ± 0%
safe_map/num_args:5/arg_lengths:1 2.14µs ± 1%
safe_map/num_args:10/arg_lengths:1 2.87µs ± 1%
safe_map/num_args:100/arg_lengths:1 15.6µs ± 1%
safe_map/num_args:0/arg_lengths:2 1.65µs ± 0%
safe_map/num_args:1/arg_lengths:2 1.83µs ± 1%
safe_map/num_args:2/arg_lengths:2 1.97µs ± 1%
safe_map/num_args:5/arg_lengths:2 2.41µs ± 1%
safe_map/num_args:10/arg_lengths:2 3.22µs ± 2%
safe_map/num_args:100/arg_lengths:2 17.0µs ± 2%
safe_map/num_args:0/arg_lengths:3 1.83µs ± 1%
safe_map/num_args:1/arg_lengths:3 2.02µs ± 1%
safe_map/num_args:2/arg_lengths:3 2.16µs ± 1%
safe_map/num_args:5/arg_lengths:3 2.63µs ± 1%
safe_map/num_args:10/arg_lengths:3 3.48µs ± 1%
safe_map/num_args:100/arg_lengths:3 18.1µs ± 1%
```
After:
```
name time/op
safe_map/num_args:0/arg_lengths:1 409ns ± 1%
safe_map/num_args:1/arg_lengths:1 602ns ± 5%
safe_map/num_args:2/arg_lengths:1 777ns ± 4%
safe_map/num_args:5/arg_lengths:1 1.21µs ± 3%
safe_map/num_args:10/arg_lengths:1 1.93µs ± 2%
safe_map/num_args:100/arg_lengths:1 14.7µs ± 0%
safe_map/num_args:0/arg_lengths:2 451ns ± 1%
safe_map/num_args:1/arg_lengths:2 652ns ± 0%
safe_map/num_args:2/arg_lengths:2 850ns ± 4%
safe_map/num_args:5/arg_lengths:2 1.32µs ± 3%
safe_map/num_args:10/arg_lengths:2 2.11µs ± 2%
safe_map/num_args:100/arg_lengths:2 16.0µs ± 1%
safe_map/num_args:0/arg_lengths:3 496ns ± 1%
safe_map/num_args:1/arg_lengths:3 718ns ± 5%
safe_map/num_args:2/arg_lengths:3 919ns ± 4%
safe_map/num_args:5/arg_lengths:3 1.43µs ± 2%
safe_map/num_args:10/arg_lengths:3 2.30µs ± 2%
safe_map/num_args:100/arg_lengths:3 17.3µs ± 1%
```
PiperOrigin-RevId: 523263207
2023-04-10 18:09:56 -07:00
Yash Katariya
694e43a44a
Remove experimental_cpp_jit
since that flag is unused and also remove experimental_cpp_pjit
.
...
For dynamic shapes experimentation and normal debugging, `python_pjit` still exists so that problem doesn't exist which makes us free to remove these 2 flags.
I am leaving pmap's flag alone for now.
PiperOrigin-RevId: 522602754
2023-04-07 08:29:20 -07:00
Peter Hawkins
452f3c55e3
Rename jax._src.sharding_utils to jax._src.op_shardings.
...
Move some more op_sharding related helpers to that module.
PiperOrigin-RevId: 522343010
2023-04-06 08:32:46 -07:00
Yash Katariya
cf8c2b8450
Delete benchmark and pmap_benchmark files as they are legacy and replaced with api_benchmark.py
...
PiperOrigin-RevId: 519742866
2023-03-27 09:22:57 -07:00
Yash Katariya
1faa7a8edd
Add benchmarks for accessing index and replica id in addressable_shards
...
PiperOrigin-RevId: 517974091
2023-03-20 08:22:34 -07:00
Parker Schuh
48702171bf
Add benchmarks for np.array, device_put, and _arrays.
...
PiperOrigin-RevId: 516692492
2023-03-14 19:06:06 -07:00
Yash Katariya
233911c001
[Fix forward] Rollback the device_put_sharded and device_put_replicated change of using batched_device_put
...
PiperOrigin-RevId: 516244071
2023-03-13 10:07:44 -07:00
Peter Hawkins
1925aa1109
Split Sharding subclasses out of _src/sharding.py into _src/sharding_impls.py
...
By defining the Sharding base class in its own module, we can pull it out into a separate Bazel submodule, which will help pytype inference when defining Array.
PiperOrigin-RevId: 516223009
2023-03-13 08:50:18 -07:00
Emilio Cota
6f1d82916c
math_benchmark: add --set_env flag
...
PiperOrigin-RevId: 515417422
2023-03-09 13:04:12 -08:00
Emilio Cota
845d68b39e
math_benchmark: add dot op
...
PiperOrigin-RevId: 515408666
2023-03-09 12:24:47 -08:00
Peter Hawkins
8fb1fd318d
Replace jax._src.util.prod with math.prod.
...
math.prod() was added in Python 3.8, so we can assume it is always present.
PiperOrigin-RevId: 513011144
2023-02-28 12:41:00 -08:00
Lena Martens
4f48f94649
Update api_benchmark to not use any deprecated APIs.
...
PiperOrigin-RevId: 512941633
2023-02-28 08:33:26 -08:00
Yash Katariya
418c2f9d2a
Rename in_axis_resources
and out_axis_resources
with in_shardings
and out_shardings
. This is just a simple name replacement. It does not change any of the current pjit semantics and doesn't break any code.
...
This is a safe and trivial name replacement. It does not change any of the semantics. You can still pass in PatitionSpecs to in_shardings and out_shardings.
PiperOrigin-RevId: 510671300
2023-02-18 10:00:36 -08:00
Yash Katariya
d21ff0371f
Remove gda_benchmark file as GDA is deprecated.
...
PiperOrigin-RevId: 510469600
2023-02-17 10:46:25 -08:00
Peter Hawkins
88cc254f2c
[JAX] Replace uses of jax.interpreters.pxla.ShardedDeviceArray with jax.Array.
...
PiperOrigin-RevId: 508463147
2023-02-09 13:39:41 -08:00
Peter Hawkins
98b75cf27b
Prune accidental exports from jax.interpreters.pxla.
...
These imports do not appear to have users outside JAX itself.
PiperOrigin-RevId: 507835295
2023-02-07 11:16:42 -08:00
Peter Hawkins
428189f8fb
Replace uses of deprecated JAX sharding APIs with their new names in jax.sharding.
...
This change updates:
* {jax.experimental.maps.Mesh, jax.interpreters.pxla.Mesh} to jax.sharding.Mesh
* {jax.experimental.PartitionSpec, jax.experimental.pjit.PartitionSpec, jax.interpreters.pxla.PartitionSpec, jax.pxla.PartitionSpec} to jax.sharding.PartitionSpec
* jax.experimental.maps.NamedSharding to jax.sharding.NamedSharding.
PiperOrigin-RevId: 506994892
2023-02-03 14:28:45 -08:00
Jake VanderPlas
43e57db77a
Begin deprecation of public jax.ShapedArray
2023-01-30 11:27:58 -08:00
Emilio Cota
13e875f8b8
benchmarks: add math unary benchmarks
...
These will be used for benchmarking FP approximations in XLA.
PiperOrigin-RevId: 503991586
2023-01-23 08:17:16 -08:00
jax authors
eb875cd5dd
Added a pattern-match optimisation for inplace-select.
...
PiperOrigin-RevId: 497425937
2022-12-23 16:05:56 -08:00
Peter Hawkins
d6c67c97db
Remove redundant dtype canonicalization from jax.device_put().
...
Gives a small improvement to the included jax.device_put() benchmark on my VM:
```
name old cpu/op new cpu/op delta
device_put 91.3µs ± 5% 80.1µs ± 3% -12.29% (p=0.008 n=5+5)
name old time/op new time/op delta
device_put 91.4µs ± 5% 80.1µs ± 3% -12.29% (p=0.008 n=5+5)
```
jax.device_put() has not been optimized that much yet and there is plenty of room for further improvement.
PiperOrigin-RevId: 491727173
2022-11-29 13:47:36 -08:00
Yash Katariya
928dee415f
Optimize host_local_array_to_global_array
by caching the local to global conversion and flattening of axis resources. Also take a fast path for device_put which does not do abstractify
and only canonicalize_dtype on the entire array once (instead of doing it for every shard).
...
This results in a 5x speedup!
Before:
```
---------------------------------------------------------------------------
Benchmark Time CPU Iterations
---------------------------------------------------------------------------
host_local_array_to_global_array 3.03 ms 3.02 ms 220
```
After:
```
---------------------------------------------------------------------------
Benchmark Time CPU Iterations
---------------------------------------------------------------------------
host_local_array_to_global_array 0.673 ms 0.671 ms 985
```
PiperOrigin-RevId: 489880547
2022-11-20 20:53:02 -08:00
Yash Katariya
c42bad85ef
Make MeshPspecSharding
an alias for NamedSharding
(it was the other way around before this CL).
...
PiperOrigin-RevId: 488473538
2022-11-14 14:44:00 -08:00
jax authors
f4be5ab173
Merge pull request #12219 from jakevdp:indexing-slice
...
PiperOrigin-RevId: 485946084
2022-11-03 12:44:28 -07:00
Yash Katariya
532cd7ed74
Skip the benchmarks properly via state.skip_with_error when enough devices are not present.
...
PiperOrigin-RevId: 485931295
2022-11-03 11:44:57 -07:00
Jake VanderPlas
753562d574
Add benchmarks for repeated static indexing & slicing
2022-11-03 11:41:37 -07:00
Hyeontaek Lim
fc8f40ce0e
Internal visibility change
...
PiperOrigin-RevId: 484340424
2022-10-27 13:49:16 -07:00
Yash Katariya
cf6b5097d0
Remove pytest_benchmark for test-requirements.txt and move the benchmark file which was using that package to use google_benchmark.
...
PiperOrigin-RevId: 483736267
2022-10-25 11:59:32 -07:00
Yash Katariya
3572bb2db0
[Rollback]
...
Allow uncommitted single device PyArray in C++ pjit path.
PiperOrigin-RevId: 482084898
2022-10-18 19:42:10 -07:00
Kuangyuan Chen
d64da3d407
Roll forward with fix: Remove the original python function fun_
from C++ PjitFunction, as the destroying fun_
may yield the thread in some cases, which causes error during deleting the python object of PjitFunction.
...
PiperOrigin-RevId: 481950912
2022-10-18 10:05:53 -07:00
Kuangyuan Chen
fd2f590b3b
Allow uncommitted single device PyArray in C++ pjit path.
...
PiperOrigin-RevId: 481711690
2022-10-17 12:35:30 -07:00
jax authors
504b3c1b25
roll forward with the fix: Make params
arg in Compiled.call() position-only so that it does not conflict with the keyword args.
...
PiperOrigin-RevId: 481666211
2022-10-17 09:50:55 -07:00
Kuangyuan Chen
38a7582923
roll forward with the fix: Make params
arg in Compiled.call() position-only so that it does not conflict with the keyword args.
...
PiperOrigin-RevId: 481181330
2022-10-14 10:42:15 -07:00
jax authors
1945208d34
Rollback because of failing tests internally.
...
PiperOrigin-RevId: 481103002
2022-10-14 03:12:42 -07:00
Kuangyuan Chen
d082ea0d46
Implement a fast path for pjit AOT in C++ for jax.Array inputs.
...
PiperOrigin-RevId: 480983807
2022-10-13 14:24:05 -07:00
Yash Katariya
84768d2d49
Replace jax.xla.DeviceArray
private type with the new public type jax.Array
.
...
PiperOrigin-RevId: 477582562
2022-09-28 16:34:10 -07:00
Yash Katariya
9e4114f0f1
Move array.py
and sharding.py
from experimental/
to _src/
.
...
PiperOrigin-RevId: 477201711
2022-09-27 10:06:52 -07:00
Peter Hawkins
ba557d5e1b
Change JAX's copyright attribution from "Google LLC" to "The JAX Authors.".
...
See https://opensource.google/documentation/reference/releasing/contributions#copyright for more details.
PiperOrigin-RevId: 476167538
2022-09-22 12:27:19 -07:00
Kuangyuan Chen
405a2310ce
Implement pjit fast path in cpp for jax.Array inputs
...
PiperOrigin-RevId: 475988677
2022-09-21 20:18:18 -07:00
Kuangyuan Chen
b9e384363e
Add many args benchmark for jax.Array
...
PiperOrigin-RevId: 475853211
2022-09-21 09:54:51 -07:00