* raise minimum Bazel version to 2.0.0 to match TensorFlow.
* set --experimental_repo_remote_exec since it is required by the TF build.
* bump TF/XLA version.
* use the --config=short_logs trick from TF to suppress build warnings.
In principle, JAX should not need a hand-written CUDA kernel for the ThreeFry2x32 algorithm. In practice XLA aggresively inlines, which causes compilation times on GPU blow up when compiling potentially many copies of the PRNG kernel in a program. As a workaround, we add a hand-written CUDA kernel mostly to reduce compilation time.
When XLA becomes smarter about compiling this particular hash function, we should be able to remove the hand-written kernel once again.
It turns out there are implicit CUDA dependencies inside the TF libraries used by JAX, so the attempt to disable GPU dependencies conditionally didn't work.
--action_env variables are passed to every build action. This means that if the variable changes, the entire build cache is invalidated. By contrast, --repo_env variables are only passed to repository rules and don't affect every action. In principle this means that we should be able to rebuild JAX for different Python versions without rebuilding 99% of the C++ code.
Update bazel release for build script to 0.29.1 (same as TensorFlow.)
Switches XLA Python bindings to use pybind11.
Update XRT support to point to newer XRT client.
Update minimum bazel version to 0.24.0.
Fix missing backend argument to XLA Compile() calls.
Add a new build option --enable_mkl_dnn that enables MKLDNN contraction kernels in XLA. This leads to significant performance improvements for XLA's dot operator. Enable MKL-DNN by default.
Update XLA version to include MKL-DNN build fix.
Also add a new --enable_march_native build option that turns on -march=native. This is unlikely to have a significant performance impact since XLA JIT-compiles most of its code. Leaving this off by default because it also generates code unlikely to run across a wide selection of architectures and so is unsuitable for building pip wheels.
This makes initial builds cheaper (since we don't need to build some files in separate host and target configurations) but may make switching between build configurations more expensive (since we can share less work). The build script should optimize for the former.