Mirror of https://github.com/ROCm/jax.git (synced 2025-04-16 03:46:06 +00:00)

This change also marks multiaccelerator test files in a way pytest can understand (if pytest is installed). With single-device tests running on a single TPU chip, the test suite run time drops from 1hr 45m to 35m (both timings include slow tests). I tried using bazel at first, which already supported parallel execution across TPU cores, but somehow it still takes 2h 20m! I'm not sure why it's so slow. It appears that bazel creates many new test processes over time, whereas pytest keeps reusing the number of processes initially specified; starting and stopping the TPU runtime takes a few seconds each time, so that may be adding up. It also appears that single-process bazel is slower than single-process pytest, which I haven't looked into yet.
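The approach described above can be sketched roughly as follows. This is a minimal illustration, not the code from this change: the helper names chip_for_worker and pin_worker_to_chip are hypothetical, and the marker name and the TPU_VISIBLE_CHIPS variable are assumptions. PYTEST_XDIST_WORKER is the variable pytest-xdist sets in each worker process (for example gw0).

```python
import os
from typing import Mapping, MutableMapping, Optional

try:
    import pytest
except ImportError:  # pytest is optional; the file must still import without it
    pytest = None

if pytest is not None:
    # Module-level `pytestmark` applies the marker to every test in the file.
    # The marker name is an assumption made for this sketch.
    pytestmark = pytest.mark.multiaccelerator


def chip_for_worker(env: Mapping[str, str]) -> Optional[int]:
    """Return a chip index for a pytest-xdist worker, or None outside xdist."""
    name = env.get("PYTEST_XDIST_WORKER", "")
    suffix = name[len("gw"):]
    if not name.startswith("gw") or not suffix.isdigit():
        return None
    return int(suffix)


def pin_worker_to_chip(env: MutableMapping[str, str] = os.environ) -> None:
    """Restrict this worker to one chip (assumed variable name)."""
    chip = chip_for_worker(env)
    if chip is not None:
        # setdefault keeps any value the user has already exported.
        env.setdefault("TPU_VISIBLE_CHIPS", str(chip))
```

Assuming pytest-xdist is installed and the marker is registered, single-device tests could then be selected with something like pytest -n 4 -m "not multiaccelerator", so that a fixed pool of workers is created once and reused for the whole run.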
GitHub Actions workflows
See the GitHub documentation for more information on GitHub Actions in general.
Notes
- https://opensource.google/documentation/reference/github/services#actions mandates using a specific commit for non-Google actions. We use Ratchet to pin specific versions. If you'd like to update an action, you can write something like uses: 'actions/checkout@v3', and then run ./ratchet pin workflow.yml to convert it to a commit hash. See the Ratchet README for installation and more detailed instructions.
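To spot references that still need pinning, a small check like the one below may help. It is a sketch only and not part of Ratchet: the function name unpinned_actions is hypothetical, it scans text line by line rather than parsing YAML, and it does not distinguish Google-owned actions from others.

```python
import re
from typing import List

# A full commit SHA is 40 hexadecimal characters.
_SHA = re.compile(r"[0-9a-f]{40}")
# Matches `uses: owner/repo@ref`, with optional quotes and an optional leading "- ".
_USES = re.compile(r"""^\s*(?:-\s*)?uses:\s*['"]?([^'"\s#@]+)@([^'"\s#]+)""")


def unpinned_actions(workflow_text: str) -> List[str]:
    """Return `action@ref` strings whose ref is not a full commit SHA."""
    found = []
    for line in workflow_text.splitlines():
        match = _USES.match(line)
        if match and not _SHA.fullmatch(match.group(2)):
            found.append(f"{match.group(1)}@{match.group(2)}")
    return found
```

Any entry this returns, such as a tag like v3, is a candidate for running the pin command on that workflow file.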