mirror of https://github.com/ROCm/jax.git (synced 2025-04-19 05:16:06 +00:00)
the reverters have been reverted
parent dd8a372afb
commit 255a244e14
@@ -63,9 +63,6 @@ _BARRIER_TIMED_OUT_MSG = (
    "Suggestions for possible fixes:\n"
    "* Check the logs to see if one or more processes failed.\n"
    "* Make sure the training and checkpointing endpoints are close geographically.\n"
    "* Try setting these environment variables: "
    "`TENSORSTORE_CURL_LOW_SPEED_TIME_SECONDS=60` "
    "`TENSORSTORE_CURL_LOW_SPEED_LIMIT_BYTES=256` which will force a http retry\n"
    "* Try increasing the timeout you pass to GlobalAsyncCheckpointManager.")

logger = logging.getLogger(__name__)