Peter Hawkins
cb182b8b22
Use a Jacobi SVD solver for unbatched SVDs up to 1024x1024 on NVIDIA GPUs.
...
The unbatched Jacobi solver is faster for small to moderately sized matrices, and the unbatched kernel doesn't have size restrictions.
Timings on T4 GPU:
Before:
------------------------------------------------------------
Benchmark Time CPU Iterations
------------------------------------------------------------
svd/m:1/n:1 263587 ns 242274 ns 2780
svd/m:2/n:1 335561 ns 298238 ns 2303
svd/m:5/n:1 337784 ns 299841 ns 2304
svd/m:10/n:1 339184 ns 300703 ns 2311
svd/m:100/n:1 359826 ns 320088 ns 2159
svd/m:500/n:1 376124 ns 338660 ns 2076
svd/m:800/n:1 375779 ns 335590 ns 2060
svd/m:1000/n:1 419171 ns 341487 ns 2072
svd/m:1/n:2 307564 ns 270663 ns 2544
svd/m:2/n:2 320928 ns 283601 ns 2487
svd/m:5/n:2 377373 ns 344228 ns 2035
svd/m:10/n:2 380557 ns 349412 ns 1953
svd/m:100/n:2 435465 ns 403496 ns 1722
svd/m:500/n:2 444610 ns 410913 ns 1680
svd/m:800/n:2 454493 ns 416495 ns 1665
svd/m:1000/n:2 492110 ns 420539 ns 1665
svd/m:1/n:5 307316 ns 275833 ns 2531
svd/m:2/n:5 374318 ns 341432 ns 2086
svd/m:5/n:5 512928 ns 470293 ns 1361
svd/m:10/n:5 589330 ns 537070 ns 1353
svd/m:100/n:5 620164 ns 580166 ns 1193
svd/m:500/n:5 636424 ns 593692 ns 1180
svd/m:800/n:5 635545 ns 595016 ns 1181
svd/m:1000/n:5 672443 ns 597387 ns 1115
svd/m:1/n:10 310013 ns 273998 ns 2520
svd/m:2/n:10 370451 ns 334489 ns 2105
svd/m:5/n:10 560037 ns 522223 ns 1274
svd/m:10/n:10 572868 ns 535388 ns 1304
svd/m:100/n:10 959802 ns 918258 ns 765
svd/m:500/n:10 955958 ns 909778 ns 758
svd/m:800/n:10 924104 ns 879512 ns 777
svd/m:1000/n:10 950140 ns 883493 ns 775
svd/m:1/n:100 351237 ns 315554 ns 2198
svd/m:2/n:100 426883 ns 390089 ns 1792
svd/m:5/n:100 601557 ns 564493 ns 1255
svd/m:10/n:100 920819 ns 880011 ns 787
svd/m:100/n:100 7902281 ns 7229220 ns 95
svd/m:500/n:100 9720727 ns 9040679 ns 79
svd/m:800/n:100 9856378 ns 8998050 ns 79
svd/m:1000/n:100 9721017 ns 9086414 ns 79
svd/m:1/n:500 371171 ns 334217 ns 2117
svd/m:2/n:500 449165 ns 411499 ns 1700
svd/m:5/n:500 620354 ns 581866 ns 1185
svd/m:10/n:500 892375 ns 847239 ns 833
svd/m:100/n:500 9564810 ns 8867540 ns 79
svd/m:500/n:500 111924035 ns 104078023 ns 7
svd/m:800/n:500 147777319 ns 142730412 ns 5
svd/m:1000/n:500 154205084 ns 149740209 ns 5
svd/m:1/n:800 372122 ns 334212 ns 2119
svd/m:2/n:800 456672 ns 419260 ns 1680
svd/m:5/n:800 691208 ns 626003 ns 1190
svd/m:10/n:800 1017694 ns 941480 ns 730
svd/m:100/n:800 9892683 ns 9091043 ns 76
svd/m:500/n:800 144134235 ns 139129722 ns 5
svd/m:800/n:800 342790246 ns 333299774 ns 2
svd/m:1000/n:800 432820082 ns 427978978 ns 2
svd/m:1/n:1000 372785 ns 335745 ns 1805
svd/m:2/n:1000 451946 ns 413341 ns 1668
svd/m:5/n:1000 618475 ns 577213 ns 1169
svd/m:10/n:1000 907729 ns 863335 ns 808
svd/m:100/n:1000 9868543 ns 9116870 ns 76
svd/m:500/n:1000 156777811 ns 152042065 ns 5
svd/m:800/n:1000 429704070 ns 424677592 ns 2
svd/m:1000/n:1000 654864311 ns 642693162 ns 1
After:
------------------------------------------------------------
Benchmark Time CPU Iterations
------------------------------------------------------------
svd/m:1/n:1 265980 ns 245433 ns 2791
svd/m:2/n:1 340203 ns 302783 ns 2288
svd/m:5/n:1 337807 ns 301916 ns 2286
svd/m:10/n:1 338064 ns 302441 ns 2297
svd/m:100/n:1 335444 ns 298440 ns 2327
svd/m:500/n:1 338025 ns 302096 ns 2272
svd/m:800/n:1 328382 ns 291740 ns 2252
svd/m:1000/n:1 397494 ns 310905 ns 2239
svd/m:1/n:2 310464 ns 274507 ns 2535
svd/m:2/n:2 319999 ns 284247 ns 2515
svd/m:5/n:2 373435 ns 335919 ns 2069
svd/m:10/n:2 376327 ns 339327 ns 2056
svd/m:100/n:2 385061 ns 349258 ns 2003
svd/m:500/n:2 392352 ns 355735 ns 1932
svd/m:800/n:2 410736 ns 370677 ns 1881
svd/m:1000/n:2 494326 ns 405603 ns 1721
svd/m:1/n:5 316735 ns 277292 ns 2538
svd/m:2/n:5 383748 ns 342218 ns 2077
svd/m:5/n:5 494204 ns 454309 ns 1476
svd/m:10/n:5 547017 ns 508184 ns 1371
svd/m:100/n:5 514537 ns 476761 ns 1460
svd/m:500/n:5 544656 ns 504877 ns 1381
svd/m:800/n:5 642590 ns 599314 ns 1159
svd/m:1000/n:5 706166 ns 621209 ns 1106
svd/m:1/n:10 310825 ns 274374 ns 2511
svd/m:2/n:10 381316 ns 344202 ns 2094
svd/m:5/n:10 565469 ns 526759 ns 1266
svd/m:10/n:10 576111 ns 537286 ns 1299
svd/m:100/n:10 653250 ns 613392 ns 1137
svd/m:500/n:10 690532 ns 645828 ns 1080
svd/m:800/n:10 763924 ns 723677 ns 959
svd/m:1000/n:10 940342 ns 855517 ns 818
svd/m:1/n:100 306134 ns 271533 ns 2526
svd/m:2/n:100 374680 ns 339298 ns 2071
svd/m:5/n:100 576926 ns 539062 ns 1228
svd/m:10/n:100 656806 ns 615171 ns 1123
svd/m:100/n:100 3295164 ns 3138621 ns 223
svd/m:500/n:100 4269347 ns 4166000 ns 168
svd/m:800/n:100 4656541 ns 4522247 ns 154
svd/m:1000/n:100 6479223 ns 6354578 ns 112
svd/m:1/n:500 329966 ns 289083 ns 2440
svd/m:2/n:500 407535 ns 366794 ns 1947
svd/m:5/n:500 567367 ns 522809 ns 1336
svd/m:10/n:500 712307 ns 657608 ns 1065
svd/m:100/n:500 4262986 ns 4169907 ns 167
svd/m:500/n:500 28824720 ns 28650258 ns 25
svd/m:800/n:500 29330139 ns 28677269 ns 25
svd/m:1000/n:500 30848037 ns 30089216 ns 23
svd/m:1/n:800 328620 ns 289181 ns 2329
svd/m:2/n:800 419052 ns 379483 ns 1876
svd/m:5/n:800 587366 ns 546979 ns 1269
svd/m:10/n:800 830762 ns 787923 ns 893
svd/m:100/n:800 4763633 ns 4595738 ns 152
svd/m:500/n:800 30447861 ns 29949714 ns 24
svd/m:800/n:800 94188958 ns 93488372 ns 8
svd/m:1000/n:800 94701529 ns 93394677 ns 7
svd/m:1/n:1000 351102 ns 313099 ns 2218
svd/m:2/n:1000 446543 ns 407807 ns 1708
svd/m:5/n:1000 661152 ns 616174 ns 1129
svd/m:10/n:1000 915743 ns 873397 ns 802
svd/m:100/n:1000 6434730 ns 6282779 ns 113
svd/m:500/n:1000 30244321 ns 29684290 ns 24
svd/m:800/n:1000 92727423 ns 91477078 ns 8
svd/m:1000/n:1000 169500709 ns 168358420 ns 4
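The selection rule behind these numbers can be pictured as a simple size check. The sketch below is illustrative only; the real dispatch lives in XLA/jaxlib and the function name here is made up.
```
# Illustrative sketch of the solver-selection rule described above; not the
# actual XLA/jaxlib code. cuSOLVER's Jacobi kernel (gesvdj) is preferred for
# unbatched matrices up to 1024x1024, with the QR-based gesvd as the fallback.
def choose_gpu_svd_kernel(batch_size: int, m: int, n: int) -> str:
    if batch_size == 1 and max(m, n) <= 1024:
        return "gesvdj"  # Jacobi SVD: faster for small/moderate matrices
    return "gesvd"       # QR-iteration SVD for large or batched problems
```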
PiperOrigin-RevId: 582041508
2023-11-13 12:04:13 -08:00
Yash Katariya
fd09b35645
Optimize make_array_from_callback for fully replicated shardings by going via batched_device_put
...
Before:
```
name cpu/op
bench_make_array_from_callback_fully_replicated_sharding 467µs ± 3%
name time/op
bench_make_array_from_callback_fully_replicated_sharding 467µs ± 3%
```
After:
```
name cpu/op
bench_make_array_from_callback_fully_replicated_sharding 28.1µs ± 2%
name time/op
bench_make_array_from_callback_fully_replicated_sharding 28.1µs ± 2%
```
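For orientation, a minimal usage sketch of the API on the optimized path, i.e. building a fully replicated jax.Array from a callback:
```
import numpy as np
import jax
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# Fully replicated sharding: every device holds the whole array. This is the
# case that now goes via batched_device_put.
mesh = Mesh(np.array(jax.devices()), ("x",))
sharding = NamedSharding(mesh, P())

data = np.arange(16, dtype=np.float32).reshape(4, 4)
arr = jax.make_array_from_callback(data.shape, sharding, lambda idx: data[idx])
```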
PiperOrigin-RevId: 572429822
2023-10-10 19:02:04 -07:00
Patrick Kidger
9d73441ff1
Added serial_dot_products benchmark
2023-09-21 15:25:52 -07:00
Roy Frostig
6abefa1977
fast dispatch for functions over typed PRNG key arrays
...
Before this change, JAX could dispatch compiled functions over new-style (typed)
RNG key arrays, but it would always do so off of the fast (C++-based) dispatch
path. In other words, switching from old-style `uint32` RNG keys to new-style
keys would regress dispatch times. With this change, dispatch happens on the
fast path again and performance regressions ought to be minimal.
We currently maintain only one pytree registry, for all registered pytree node
types. We want RNG key arrays to also be treated as pytree leaves everywhere
*except* during dispatch. In other words: we want operations on (typed) RNG key
arrays to appear in Jaxpr, but we want to unravel those arrays into their
underlying `uint32` arrays only during dispatch.
To do this, we add a new internal pytree registry that dispatch respects
uniquely. This registry includes all items in the default registry, but also the
RNG key array type.
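For concreteness, a small usage sketch of what this enables (both kinds of key now dispatch on the fast path):
```
import jax

old_key = jax.random.PRNGKey(0)  # old-style raw uint32 key
new_key = jax.random.key(0)      # new-style typed key array

@jax.jit
def draw(key):
    return jax.random.normal(key, (3,))

draw(old_key)
draw(new_key)  # with this change, no longer falls off the fast dispatch path
```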
Co-authored-by: Matthew Johnson <mattjj@google.com>
PiperOrigin-RevId: 565077758
2023-09-13 09:43:58 -07:00
Jake VanderPlas
368d3433a6
Add random benchmarks
...
The purpose of this is to measure the difference in dispatch speed between raw keys and new-style typed keys. The latter does not yet hit the C++ fast path, and so we expect it to incur a small additional overhead at dispatch time. Part of #9263.
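A sketch of what such a benchmark might look like, in the google_benchmark style the JAX benchmarks use; the benchmark names here are illustrative, not the actual benchmarks added:
```
import google_benchmark as benchmark
import jax

@benchmark.register
def bench_random_split_raw_key(state):
    key = jax.random.PRNGKey(0)  # raw uint32 key
    while state:
        jax.random.split(key)

@benchmark.register
def bench_random_split_typed_key(state):
    key = jax.random.key(0)  # new-style typed key
    while state:
        jax.random.split(key)

if __name__ == "__main__":
    benchmark.main()
```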
PiperOrigin-RevId: 559782442
2023-08-24 09:55:07 -07:00
jax authors
f498442daa
[jax][benchmark] Added clearing caches for benchmarking compilation time in sparse JAX benchmarks
...
PiperOrigin-RevId: 553179605
2023-08-02 10:07:54 -07:00
Peter Hawkins
76cda0ae07
Update flags to use the ABSL typed flag API.
...
Change flags to use the newer definition style where the flag is read via a typed FlagHolder object returned by the DEFINE_... function. The advantage of doing this is that `flag.value` has a type known to the type checker, rather than reading it as an attr out of a gigantic config dictionary.
For jax.config flags, define a typed FlagHolder object that is returned when defining a flag, matching the ABSL API.
Move a number of flags into the file that consumes them. There's no reason to define every flag in `config.py`.
This PR does not change the similar "state" objects in `jax.config`. Changing those is for a future PR.
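A minimal example of the newer style, using the standard ABSL API (the flag name is illustrative):
```
from absl import flags

# DEFINE_* returns a typed FlagHolder; the value is read via .value.
_NUM_STEPS = flags.DEFINE_integer("num_steps", 10, "How many steps to run.")

def run():
    # Type checkers see _NUM_STEPS.value as an int, not an untyped attr.
    for _ in range(_NUM_STEPS.value):
        pass
```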
PiperOrigin-RevId: 551604974
2023-07-27 12:15:58 -07:00
Yash Katariya
a6254c75e0
Improve the shape incompatible error message by adding the argument/result name path to it.
...
PiperOrigin-RevId: 529605855
2023-05-04 21:50:04 -07:00
Jake VanderPlas
fbe4f10403
Change to simpler import for jax.config
2023-04-21 11:51:22 -07:00
Jake VanderPlas
5521423d92
Change np.prod->math.prod
...
Why? This is generally used for static operations on shapes, but np.prod
has an unfortunate corner-case behavior that np.prod([]) returns a float.
math.prod is available as of Python 3.8, and is a better solution here.
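The corner case in question:
```
import math
import numpy as np

np.prod([])        # -> 1.0, a float, which is awkward for shape arithmetic
math.prod([])      # -> 1, an int
math.prod((3, 4))  # -> 12
```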
2023-04-13 11:48:11 -07:00
Peter Hawkins
74384e6a87
Add a C++ safe_zip implementation.
...
Benchmark results on my workstation:
```
name old cpu/op new cpu/op delta
safe_zip/arg_lengths:0/num_args:1 1.22µs ± 1% 0.28µs ± 8% -77.33% (p=0.008 n=5+5)
safe_zip/arg_lengths:1/num_args:1 1.28µs ± 1% 0.34µs ± 6% -73.18% (p=0.008 n=5+5)
safe_zip/arg_lengths:2/num_args:1 1.28µs ± 1% 0.38µs ± 5% -70.26% (p=0.008 n=5+5)
safe_zip/arg_lengths:5/num_args:1 1.38µs ± 1% 0.51µs ± 3% -63.26% (p=0.008 n=5+5)
safe_zip/arg_lengths:10/num_args:1 1.61µs ± 1% 0.69µs ± 3% -56.93% (p=0.008 n=5+5)
safe_zip/arg_lengths:100/num_args:1 5.39µs ± 1% 3.83µs ± 2% -29.03% (p=0.008 n=5+5)
safe_zip/arg_lengths:0/num_args:2 1.46µs ± 1% 0.32µs ± 4% -78.30% (p=0.008 n=5+5)
safe_zip/arg_lengths:1/num_args:2 1.52µs ± 1% 0.39µs ± 4% -74.20% (p=0.008 n=5+5)
safe_zip/arg_lengths:2/num_args:2 1.53µs ± 1% 0.44µs ± 4% -71.38% (p=0.008 n=5+5)
safe_zip/arg_lengths:5/num_args:2 1.66µs ± 2% 0.60µs ± 3% -63.96% (p=0.008 n=5+5)
safe_zip/arg_lengths:10/num_args:2 1.90µs ± 1% 0.82µs ± 3% -56.66% (p=0.008 n=5+5)
safe_zip/arg_lengths:100/num_args:2 6.51µs ± 1% 4.80µs ± 0% -26.23% (p=0.016 n=5+4)
safe_zip/arg_lengths:0/num_args:3 1.62µs ± 1% 0.36µs ± 4% -77.95% (p=0.008 n=5+5)
safe_zip/arg_lengths:1/num_args:3 1.68µs ± 1% 0.44µs ± 3% -73.75% (p=0.008 n=5+5)
safe_zip/arg_lengths:2/num_args:3 1.69µs ± 1% 0.50µs ± 3% -70.48% (p=0.008 n=5+5)
safe_zip/arg_lengths:5/num_args:3 1.83µs ± 1% 0.68µs ± 2% -62.73% (p=0.008 n=5+5)
safe_zip/arg_lengths:10/num_args:3 2.12µs ± 1% 0.96µs ± 1% -54.71% (p=0.008 n=5+5)
safe_zip/arg_lengths:100/num_args:3 7.34µs ± 2% 5.89µs ± 1% -19.74% (p=0.008 n=5+5)
```
In addition, improve the length mismatch error for safe_map and define __module__ on both functions.
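For reference, what the two utilities do (a usage sketch; the exact error text is not shown here):
```
from jax.util import safe_map, safe_zip

safe_zip([1, 2, 3], "abc")                      # [(1, 'a'), (2, 'b'), (3, 'c')]
safe_map(lambda x, y: x + y, [1, 2], [10, 20])  # [11, 22]
# safe_zip([1, 2], "abc")  # raises: the argument lengths differ
```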
PiperOrigin-RevId: 523475834
2023-04-11 12:43:04 -07:00
Peter Hawkins
0dbd467cea
Add a C++ implementation of safe map.
...
Before (argument names reversed, oops, fixed in code):
```
name time/op
safe_map/num_args:0/arg_lengths:1 1.43µs ± 1%
safe_map/num_args:1/arg_lengths:1 1.61µs ± 1%
safe_map/num_args:2/arg_lengths:1 1.72µs ± 0%
safe_map/num_args:5/arg_lengths:1 2.14µs ± 1%
safe_map/num_args:10/arg_lengths:1 2.87µs ± 1%
safe_map/num_args:100/arg_lengths:1 15.6µs ± 1%
safe_map/num_args:0/arg_lengths:2 1.65µs ± 0%
safe_map/num_args:1/arg_lengths:2 1.83µs ± 1%
safe_map/num_args:2/arg_lengths:2 1.97µs ± 1%
safe_map/num_args:5/arg_lengths:2 2.41µs ± 1%
safe_map/num_args:10/arg_lengths:2 3.22µs ± 2%
safe_map/num_args:100/arg_lengths:2 17.0µs ± 2%
safe_map/num_args:0/arg_lengths:3 1.83µs ± 1%
safe_map/num_args:1/arg_lengths:3 2.02µs ± 1%
safe_map/num_args:2/arg_lengths:3 2.16µs ± 1%
safe_map/num_args:5/arg_lengths:3 2.63µs ± 1%
safe_map/num_args:10/arg_lengths:3 3.48µs ± 1%
safe_map/num_args:100/arg_lengths:3 18.1µs ± 1%
```
After:
```
name time/op
safe_map/num_args:0/arg_lengths:1 409ns ± 1%
safe_map/num_args:1/arg_lengths:1 602ns ± 5%
safe_map/num_args:2/arg_lengths:1 777ns ± 4%
safe_map/num_args:5/arg_lengths:1 1.21µs ± 3%
safe_map/num_args:10/arg_lengths:1 1.93µs ± 2%
safe_map/num_args:100/arg_lengths:1 14.7µs ± 0%
safe_map/num_args:0/arg_lengths:2 451ns ± 1%
safe_map/num_args:1/arg_lengths:2 652ns ± 0%
safe_map/num_args:2/arg_lengths:2 850ns ± 4%
safe_map/num_args:5/arg_lengths:2 1.32µs ± 3%
safe_map/num_args:10/arg_lengths:2 2.11µs ± 2%
safe_map/num_args:100/arg_lengths:2 16.0µs ± 1%
safe_map/num_args:0/arg_lengths:3 496ns ± 1%
safe_map/num_args:1/arg_lengths:3 718ns ± 5%
safe_map/num_args:2/arg_lengths:3 919ns ± 4%
safe_map/num_args:5/arg_lengths:3 1.43µs ± 2%
safe_map/num_args:10/arg_lengths:3 2.30µs ± 2%
safe_map/num_args:100/arg_lengths:3 17.3µs ± 1%
```
PiperOrigin-RevId: 523263207
2023-04-10 18:09:56 -07:00
Yash Katariya
694e43a44a
Remove experimental_cpp_jit since that flag is unused, and also remove experimental_cpp_pjit.
...
For dynamic-shapes experimentation and normal debugging, `python_pjit` still exists, so that problem doesn't arise and we are free to remove these two flags.
I am leaving pmap's flag alone for now.
PiperOrigin-RevId: 522602754
2023-04-07 08:29:20 -07:00
Peter Hawkins
452f3c55e3
Rename jax._src.sharding_utils to jax._src.op_shardings.
...
Move some more op_sharding related helpers to that module.
PiperOrigin-RevId: 522343010
2023-04-06 08:32:46 -07:00
Yash Katariya
cf8c2b8450
Delete the benchmark and pmap_benchmark files as they are legacy and have been replaced by api_benchmark.py
...
PiperOrigin-RevId: 519742866
2023-03-27 09:22:57 -07:00
Yash Katariya
1faa7a8edd
Add benchmarks for accessing index and replica id in addressable_shards
...
PiperOrigin-RevId: 517974091
2023-03-20 08:22:34 -07:00
Parker Schuh
48702171bf
Add benchmarks for np.array, device_put, and _arrays.
...
PiperOrigin-RevId: 516692492
2023-03-14 19:06:06 -07:00
Yash Katariya
233911c001
[Fix forward] Roll back the change that made device_put_sharded and device_put_replicated use batched_device_put
...
PiperOrigin-RevId: 516244071
2023-03-13 10:07:44 -07:00
Peter Hawkins
1925aa1109
Split Sharding subclasses out of _src/sharding.py into _src/sharding_impls.py
...
By defining the Sharding base class in its own module, we can pull it out into a separate Bazel submodule, which will help pytype inference when defining Array.
PiperOrigin-RevId: 516223009
2023-03-13 08:50:18 -07:00
Emilio Cota
6f1d82916c
math_benchmark: add --set_env flag
...
PiperOrigin-RevId: 515417422
2023-03-09 13:04:12 -08:00
Emilio Cota
845d68b39e
math_benchmark: add dot op
...
PiperOrigin-RevId: 515408666
2023-03-09 12:24:47 -08:00
Peter Hawkins
8fb1fd318d
Replace jax._src.util.prod with math.prod.
...
math.prod() was added in Python 3.8, so we can assume it is always present.
PiperOrigin-RevId: 513011144
2023-02-28 12:41:00 -08:00
Lena Martens
4f48f94649
Update api_benchmark to not use any deprecated APIs.
...
PiperOrigin-RevId: 512941633
2023-02-28 08:33:26 -08:00
Yash Katariya
418c2f9d2a
Rename in_axis_resources and out_axis_resources to in_shardings and out_shardings. This is just a simple name replacement. It does not change any of the current pjit semantics and doesn't break any code.
...
This is a safe and trivial name replacement. It does not change any of the semantics. You can still pass PartitionSpecs to in_shardings and out_shardings.
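A usage sketch with the new parameter names (the function body and mesh axis are illustrative; the wrapped function would be called under a Mesh context as before):
```
from jax.experimental.pjit import pjit
from jax.sharding import PartitionSpec as P

# Same call as before the rename, just with the new keyword names; a bare
# PartitionSpec is still accepted.
f = pjit(lambda x: x + 1, in_shardings=P("x"), out_shardings=P("x"))
```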
PiperOrigin-RevId: 510671300
2023-02-18 10:00:36 -08:00
Yash Katariya
d21ff0371f
Remove gda_benchmark file as GDA is deprecated.
...
PiperOrigin-RevId: 510469600
2023-02-17 10:46:25 -08:00
Peter Hawkins
88cc254f2c
[JAX] Replace uses of jax.interpreters.pxla.ShardedDeviceArray with jax.Array.
...
PiperOrigin-RevId: 508463147
2023-02-09 13:39:41 -08:00
Peter Hawkins
98b75cf27b
Prune accidental exports from jax.interpreters.pxla.
...
These imports do not appear to have users outside JAX itself.
PiperOrigin-RevId: 507835295
2023-02-07 11:16:42 -08:00
Peter Hawkins
428189f8fb
Replace uses of deprecated JAX sharding APIs with their new names in jax.sharding.
...
This change updates:
* {jax.experimental.maps.Mesh, jax.interpreters.pxla.Mesh} to jax.sharding.Mesh
* {jax.experimental.PartitionSpec, jax.experimental.pjit.PartitionSpec, jax.interpreters.pxla.PartitionSpec, jax.pxla.PartitionSpec} to jax.sharding.PartitionSpec
* jax.experimental.maps.NamedSharding to jax.sharding.NamedSharding.
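In new code, all of the replacement names above come from one module:
```
from jax.sharding import Mesh, NamedSharding, PartitionSpec
```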
PiperOrigin-RevId: 506994892
2023-02-03 14:28:45 -08:00
Jake VanderPlas
43e57db77a
Begin deprecation of public jax.ShapedArray
2023-01-30 11:27:58 -08:00
Emilio Cota
13e875f8b8
benchmarks: add math unary benchmarks
...
These will be used for benchmarking FP approximations in XLA.
PiperOrigin-RevId: 503991586
2023-01-23 08:17:16 -08:00
jax authors
eb875cd5dd
Added a pattern-match optimisation for inplace-select.
...
PiperOrigin-RevId: 497425937
2022-12-23 16:05:56 -08:00
Peter Hawkins
d6c67c97db
Remove redundant dtype canonicalization from jax.device_put().
...
Gives a small improvement to the included jax.device_put() benchmark on my VM:
```
name old cpu/op new cpu/op delta
device_put 91.3µs ± 5% 80.1µs ± 3% -12.29% (p=0.008 n=5+5)
name old time/op new time/op delta
device_put 91.4µs ± 5% 80.1µs ± 3% -12.29% (p=0.008 n=5+5)
```
jax.device_put() has not been optimized that much yet and there is plenty of room for further improvement.
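For orientation, the kind of call being benchmarked here is a plain host-to-device transfer, e.g.:
```
import numpy as np
import jax

# Place a host array onto a specific device.
x = jax.device_put(np.arange(1024, dtype=np.float32), jax.devices()[0])
```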
PiperOrigin-RevId: 491727173
2022-11-29 13:47:36 -08:00
Yash Katariya
928dee415f
Optimize host_local_array_to_global_array by caching the local-to-global conversion and the flattening of axis resources. Also take a fast path for device_put that skips abstractify and runs canonicalize_dtype only once on the entire array (instead of once per shard).
...
This results in a 5x speedup!
Before:
```
---------------------------------------------------------------------------
Benchmark Time CPU Iterations
---------------------------------------------------------------------------
host_local_array_to_global_array 3.03 ms 3.02 ms 220
```
After:
```
---------------------------------------------------------------------------
Benchmark Time CPU Iterations
---------------------------------------------------------------------------
host_local_array_to_global_array 0.673 ms 0.671 ms 985
```
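A usage sketch of the optimized function (the mesh and PartitionSpec here are illustrative):
```
import numpy as np
import jax
from jax.experimental import multihost_utils
from jax.sharding import Mesh, PartitionSpec as P

# Assemble each process's host-local data into a global jax.Array laid out
# according to the mesh and PartitionSpec.
mesh = Mesh(np.array(jax.devices()), ("x",))
local_data = np.arange(2 * len(jax.devices()), dtype=np.float32)
global_arr = multihost_utils.host_local_array_to_global_array(
    local_data, mesh, P("x"))
```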
PiperOrigin-RevId: 489880547
2022-11-20 20:53:02 -08:00
Yash Katariya
c42bad85ef
Make MeshPspecSharding an alias for NamedSharding (it was the other way around before this CL).
...
PiperOrigin-RevId: 488473538
2022-11-14 14:44:00 -08:00
jax authors
f4be5ab173
Merge pull request #12219 from jakevdp:indexing-slice
...
PiperOrigin-RevId: 485946084
2022-11-03 12:44:28 -07:00
Yash Katariya
532cd7ed74
Skip the benchmarks properly via state.skip_with_error when not enough devices are present.
...
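A sketch of the skip pattern (the benchmark name and device count are illustrative):
```
import google_benchmark as benchmark
import jax

@benchmark.register
def bench_requires_eight_devices(state):
    if len(jax.devices()) < 8:
        state.skip_with_error("requires at least 8 devices")
        return
    while state:
        pass  # the measured workload would go here
```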
PiperOrigin-RevId: 485931295
2022-11-03 11:44:57 -07:00
Jake VanderPlas
753562d574
Add benchmarks for repeated static indexing & slicing
2022-11-03 11:41:37 -07:00
Hyeontaek Lim
fc8f40ce0e
Internal visibility change
...
PiperOrigin-RevId: 484340424
2022-10-27 13:49:16 -07:00
Yash Katariya
cf6b5097d0
Remove pytest_benchmark from test-requirements.txt and move the benchmark file that was using that package over to google_benchmark.
...
PiperOrigin-RevId: 483736267
2022-10-25 11:59:32 -07:00
Yash Katariya
3572bb2db0
[Rollback]
...
Allow uncommitted single device PyArray in C++ pjit path.
PiperOrigin-RevId: 482084898
2022-10-18 19:42:10 -07:00
Kuangyuan Chen
d64da3d407
Roll forward with fix: Remove the original Python function fun_ from C++ PjitFunction, as destroying fun_ may yield the thread in some cases, which causes an error when deleting the Python object of PjitFunction.
...
PiperOrigin-RevId: 481950912
2022-10-18 10:05:53 -07:00
Kuangyuan Chen
fd2f590b3b
Allow uncommitted single device PyArray in C++ pjit path.
...
PiperOrigin-RevId: 481711690
2022-10-17 12:35:30 -07:00
jax authors
504b3c1b25
Roll forward with the fix: Make the params arg in Compiled.call() position-only so that it does not conflict with the keyword args.
...
PiperOrigin-RevId: 481666211
2022-10-17 09:50:55 -07:00
Kuangyuan Chen
38a7582923
Roll forward with the fix: Make the params arg in Compiled.call() position-only so that it does not conflict with the keyword args.
...
PiperOrigin-RevId: 481181330
2022-10-14 10:42:15 -07:00
jax authors
1945208d34
Rollback because of failing tests internally.
...
PiperOrigin-RevId: 481103002
2022-10-14 03:12:42 -07:00
Kuangyuan Chen
d082ea0d46
Implement a fast path for pjit AOT in C++ for jax.Array inputs.
...
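The AOT flow whose call path this speeds up for jax.Array inputs looks like this (the function and shapes are illustrative):
```
import jax
import jax.numpy as jnp

x = jnp.ones((8, 8))

# Lower and compile ahead of time, then call the Compiled object directly;
# that call is what now takes a C++ fast path for jax.Array inputs.
compiled = jax.jit(lambda a: a @ a).lower(x).compile()
y = compiled(x)
```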
PiperOrigin-RevId: 480983807
2022-10-13 14:24:05 -07:00
Yash Katariya
84768d2d49
Replace the private type jax.xla.DeviceArray with the new public type jax.Array.
...
PiperOrigin-RevId: 477582562
2022-09-28 16:34:10 -07:00
Yash Katariya
9e4114f0f1
Move array.py and sharding.py from experimental/ to _src/.
...
PiperOrigin-RevId: 477201711
2022-09-27 10:06:52 -07:00
Peter Hawkins
ba557d5e1b
Change JAX's copyright attribution from "Google LLC" to "The JAX Authors.".
...
See https://opensource.google/documentation/reference/releasing/contributions#copyright for more details.
PiperOrigin-RevId: 476167538
2022-09-22 12:27:19 -07:00
Kuangyuan Chen
405a2310ce
Implement pjit fast path in cpp for jax.Array inputs
...
PiperOrigin-RevId: 475988677
2022-09-21 20:18:18 -07:00