Add a new build option --enable_mkl_dnn that enables MKL-DNN contraction kernels in XLA. This significantly improves the performance of XLA's dot operator. MKL-DNN is enabled by default.
Update XLA version to include MKL-DNN build fix.
Also add a new --enable_march_native build option that turns on -march=native. This is unlikely to have a significant performance impact, since XLA JIT-compiles most of its code. It is off by default because -march=native targets the build machine's CPU, so the resulting binaries may not run on other architectures and are unsuitable for building pip wheels.
This makes initial builds cheaper (since we don't need to build some files in separate host and target configurations) but may make switching between build configurations more expensive (since we can share less work). The build script should optimize for the former.
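
As a rough illustration of the two new options described above, here is a minimal sketch of how a build script could parse them and turn them into Bazel flags. The helper names (add_boolean_argument, bazel_flags), the --no<name> negation form, and the use of the tensorflow_mkldnn_contraction_kernel define are assumptions made for this example and are not taken from this change; only the option names and their defaults come from the description above.

```python
import argparse


def add_boolean_argument(parser, name, default, help_str):
    """Registers --<name> and --no<name> (hypothetical helper for this sketch)."""
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--" + name, dest=name, action="store_true",
                       default=default, help=help_str)
    group.add_argument("--no" + name, dest=name, action="store_false",
                       default=default, help="Disables --" + name + ".")


def make_parser():
    parser = argparse.ArgumentParser(description="Sketch of the new build options.")
    # On by default, as described in the commit message.
    add_boolean_argument(parser, "enable_mkl_dnn", True,
                         "Enable MKL-DNN contraction kernels in XLA.")
    # Off by default, since -march=native output is not portable.
    add_boolean_argument(parser, "enable_march_native", False,
                         "Build with -march=native.")
    return parser


def bazel_flags(args):
    """Maps parsed options to Bazel flags (the mapping is an assumption)."""
    flags = []
    if args.enable_mkl_dnn:
        # Assumed define name; check the XLA/TensorFlow .bazelrc for the real one.
        flags.append("--define=tensorflow_mkldnn_contraction_kernel=1")
    if args.enable_march_native:
        flags.append("--copt=-march=native")
    return flags


if __name__ == "__main__":
    print(" ".join(bazel_flags(make_parser().parse_args())))
```

With no arguments, the sketch emits only the MKL-DNN define, which mirrors the defaults above. Passing --noenable_mkl_dnn --enable_march_native flips both choices.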