[TPU CI] Run build matrix on v3-8 as well as v4-8

We're seeing failures on v3-8 that don't appear in the current v4-8
testing. v3-8 also exposes 8 devices (v4-8 exposes only 4), and some
tests need 8 devices to run.

I just added a v3-8 runner VM.

Also adds a missing pip install requests command (I only caught this
on a fresh runner, since the package only needs to be installed once).
Skye Wanderman-Milne 2022-12-09 22:11:18 +00:00
parent f2c5d287a3
commit 8d4b50e397

@@ -9,12 +9,13 @@ permissions:
 contents: read
 jobs:
 cloud-tpu-test:
-runs-on: [self-hosted, tpu, v4-8]
 strategy:
 fail-fast: false # don't cancel all jobs on failure
 matrix:
 python-version: ["3.10"] # TODO(jakevdp): update to 3.11 when available.
 jaxlib-version: ["latest-release", "nightly"]
+tpu-type: ["v3-8", "v4-8"]
+runs-on: ["self-hosted", "tpu", "${{ matrix.tpu-type }}"]
 steps:
 # https://opensource.google/documentation/reference/github/services#actions
 # mandates using a specific commit for non-Google actions. We use
@@ -40,6 +41,7 @@ jobs:
 -f https://storage.googleapis.com/jax-releases/jaxlib_nightly_releases.html
 pip install libtpu-nightly \
 -f https://storage.googleapis.com/jax-releases/libtpu_releases.html
+pip install requests
 else
 echo "Unknown jaxlib-version: ${{ matrix.jaxlib-version }}"
@@ -66,5 +68,5 @@ jobs:
 curl --location --request POST '${{ secrets.BUILD_CHAT_WEBHOOK }}' \
 --header 'Content-Type: application/json' \
 --data-raw "{
-'text': '\"$GITHUB_WORKFLOW\", jaxlib/libtpu version \"${{ matrix.jaxlib-version }}\" job failed: $GITHUB_SERVER_URL/$GITHUB_REPOSITORY/actions/runs/$GITHUB_RUN_ID'
+'text': '\"$GITHUB_WORKFLOW\", jaxlib/libtpu version \"${{ matrix.jaxlib-version }}\", TPU type ${{ matrix.tpu-type }} job failed: $GITHUB_SERVER_URL/$GITHUB_REPOSITORY/actions/runs/$GITHUB_RUN_ID'
 }"
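
For reference, the matrix in this change produces one job per combination of its three axes, and each job asks for a runner whose labels include its own TPU type. The following is a minimal Python sketch of that expansion, not how GitHub Actions implements it. The `expand_matrix` and `runs_on` helpers are hypothetical names introduced here for illustration; the axis values are copied from the workflow in this commit.

```python
from itertools import product

# Matrix axes copied from the workflow in this commit.
MATRIX = {
    "python-version": ["3.10"],
    "jaxlib-version": ["latest-release", "nightly"],
    "tpu-type": ["v3-8", "v4-8"],
}


def expand_matrix(matrix):
    """Hypothetical helper: one dict per job, the Cartesian product of all axes."""
    keys = list(matrix)
    return [dict(zip(keys, combo)) for combo in product(*(matrix[k] for k in keys))]


def runs_on(job):
    """Hypothetical helper: runner labels a job requests, mirroring the new runs-on line."""
    return ["self-hosted", "tpu", job["tpu-type"]]


jobs = expand_matrix(MATRIX)
for job in jobs:
    print(job["jaxlib-version"], job["tpu-type"], runs_on(job))
```

With these values the sketch yields four jobs (two jaxlib versions times two TPU types), which is why the failure notification now includes the TPU type to tell them apart.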