mirror of https://github.com/ROCm/jax.git (synced 2025-04-16 11:56:07 +00:00)
[TPU CI] Run build matrix on v3-8 as well as v4-8
We're seeing failures on v3-8 that don't appear in the current v4-8 testing. v3-8 also exposes 8 devices (v4-8 exposes only 4), and some tests need 8 devices to run. I just added a v3-8 runner VM. This also adds a missing pip install command (I only caught this with a fresh runner, since the package only needs to be installed once).
parent f2c5d287a3
commit 8d4b50e397
6
.github/workflows/cloud-tpu-ci-nightly.yml
vendored
@@ -9,12 +9,13 @@ permissions:
   contents: read
 jobs:
   cloud-tpu-test:
-    runs-on: [self-hosted, tpu, v4-8]
     strategy:
       fail-fast: false # don't cancel all jobs on failure
       matrix:
         python-version: ["3.10"] # TODO(jakevdp): update to 3.11 when available.
         jaxlib-version: ["latest-release", "nightly"]
+        tpu-type: ["v3-8", "v4-8"]
+    runs-on: ["self-hosted", "tpu", "${{ matrix.tpu-type }}"]
     steps:
       # https://opensource.google/documentation/reference/github/services#actions
       # mandates using a specific commit for non-Google actions. We use
@@ -40,6 +41,7 @@ jobs:
               -f https://storage.googleapis.com/jax-releases/jaxlib_nightly_releases.html
             pip install libtpu-nightly \
               -f https://storage.googleapis.com/jax-releases/libtpu_releases.html
+            pip install requests
 
           else
             echo "Unknown jaxlib-version: ${{ matrix.jaxlib-version }}"
@@ -66,5 +68,5 @@ jobs:
           curl --location --request POST '${{ secrets.BUILD_CHAT_WEBHOOK }}' \
             --header 'Content-Type: application/json' \
             --data-raw "{
-              'text': '\"$GITHUB_WORKFLOW\", jaxlib/libtpu version \"${{ matrix.jaxlib-version }}\" job failed: $GITHUB_SERVER_URL/$GITHUB_REPOSITORY/actions/runs/$GITHUB_RUN_ID'
+              'text': '\"$GITHUB_WORKFLOW\", jaxlib/libtpu version \"${{ matrix.jaxlib-version }}\", TPU type ${{ matrix.tpu-type }} job failed: $GITHUB_SERVER_URL/$GITHUB_REPOSITORY/actions/runs/$GITHUB_RUN_ID'
             }"
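A rough sketch of what the new matrix axis does to the job count, assuming GitHub Actions' documented behavior of running one job per combination of matrix values. The helper names `expand_matrix` and `runner_labels` are hypothetical and are not part of the workflow; the matrix values are copied from the diff above.

```python
from itertools import product

# Matrix values as they appear in the workflow after this commit.
matrix = {
    "python-version": ["3.10"],
    "jaxlib-version": ["latest-release", "nightly"],
    "tpu-type": ["v3-8", "v4-8"],
}


def expand_matrix(matrix):
    """Return one dict per job: the Cartesian product of all matrix axes.

    Hypothetical helper that only illustrates the expansion.
    """
    keys = list(matrix)
    return [dict(zip(keys, combo)) for combo in product(*(matrix[k] for k in keys))]


def runner_labels(job):
    """Labels a self-hosted runner must carry to pick up this job.

    Mirrors: runs-on: ["self-hosted", "tpu", "${{ matrix.tpu-type }}"]
    """
    return ["self-hosted", "tpu", job["tpu-type"]]


jobs = expand_matrix(matrix)
for job in jobs:
    print(job["jaxlib-version"], job["tpu-type"], runner_labels(job))
```

With one Python version, two jaxlib versions, and two TPU types, this yields four jobs, two routed to each runner type. Because `tpu-type` is now a matrix axis, the failure notification needs it in the message text to tell the otherwise identical jobs apart.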