mirror of
https://github.com/ROCm/jax.git
synced 2025-04-18 12:56:07 +00:00
Add instructions for how to use the TensorBoard profiler to the profiling docs. (#3481)
commit 8fe6da08d4 (parent 71023227c6)
BIN docs/_static/tensorboard_profiler.png (vendored, new binary file, 29 KiB; not shown)
# Profiling JAX programs

To profile JAX programs, there are currently several options: TensorBoard,
Nsight, and XLA's profiling features.

## TensorBoard profiling

[TensorBoard's
profiler](https://www.tensorflow.org/tensorboard/tensorboard_profiling_keras)
can be used to profile JAX programs. TensorBoard is a great way to acquire and
visualize performance traces and profiles of your program, including activity on
GPU and TPU. The end result looks something like this:

![](_static/tensorboard_profiler.png)
### Installation

```shell
# Requires TensorFlow and TensorBoard version >= 2.2
pip install --upgrade tensorflow tensorboard_plugin_profile
```
### Usage

The following are instructions for capturing a manually-triggered N-second trace
from a running program.

1. Start a TensorBoard server:

   ```shell
   tensorboard --logdir /tmp/tensorboard/
   ```

   You should be able to load TensorBoard at <http://localhost:6006/>. You can
   specify a different port with the `--port` flag. See {ref}`remote_profiling`
   below if running JAX on a remote server.<br /><br />

1. In the Python program or process you'd like to profile, add the following
   somewhere near the beginning:

   ```python
   import jax.profiler
   jax.profiler.start_server(9999)
   ```

   This starts the profiler server that TensorBoard connects to. The profiler
   server must be running before you move on to the next step.

   If you'd like to profile a snippet of a long-running program (e.g. a long
   training loop), you can put this at the beginning of the program and start
   your program as usual. If you'd like to profile a short program (e.g. a
   microbenchmark), one option is to start the profiler server in an IPython
   shell, and run the short program with `%run` after starting the capture in
   the next step. Another option is to start the profiler server at the
   beginning of the program and use `time.sleep()` to give you enough time to
   start the capture.<br /><br />

1. Open <http://localhost:6006/#profile>, and click the "CAPTURE PROFILE"
   button in the upper left. Enter "localhost:9999" as the profile service URL
   (this is the address of the profiler server you started in the previous
   step). Enter the number of milliseconds you'd like to profile for, and click
   "CAPTURE".<br /><br />

1. If the code you'd like to profile isn't already running (e.g. if you started
   the profiler server in a Python shell), run it while the capture is
   running.<br /><br />

1. After the capture finishes, TensorBoard should automatically refresh. (Not
   all of the TensorBoard profiling features are hooked up with JAX, so it may
   initially look like nothing was captured.) On the left under "Tools", select
   "trace_viewer".

   You should now see a timeline of the execution. You can use the WASD keys to
   navigate the trace, and click or drag to select events to see more details
   at the bottom. See [these TensorFlow
   docs](https://www.tensorflow.org/tensorboard/tensorboard_profiling_keras#use_the_tensorflow_profiler_to_profile_model_training_performance)
   for more details on using the trace viewer.<br /><br />

1. By default, the events in the trace viewer are mostly low-level internal JAX
   functions. You can add your own events and functions by using
   {func}`jax.profiler.TraceContext` and {func}`jax.profiler.trace_function` in
   your code and capturing a new profile.
### Troubleshooting

#### GPU profiling

Programs running on GPU should produce traces for the GPU streams near the top
of the trace viewer. If you're only seeing the host traces, check your program
logs and/or output for the following error messages.

**If you get an error like: `Could not load dynamic library 'libcupti.so.10.1'`**<br />
Full error:
```
W external/org_tensorflow/tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcupti.so.10.1'; dlerror: libcupti.so.10.1: cannot open shared object file: No such file or directory
2020-06-12 13:19:59.822799: E external/org_tensorflow/tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1422] function cupti_interface_->Subscribe( &subscriber_, (CUpti_CallbackFunc)ApiCallback, this)failed with error CUPTI could not be loaded or symbol could not be found.
```

Add the path to `libcupti.so` to the environment variable `LD_LIBRARY_PATH`.
(Try `locate libcupti.so` to find the path.) For example:
```shell
export LD_LIBRARY_PATH=/usr/local/cuda-10.1/extras/CUPTI/lib64/:$LD_LIBRARY_PATH
```
**If you get an error like: `failed with error CUPTI_ERROR_INSUFFICIENT_PRIVILEGES`**<br />
Full error:
```
E external/org_tensorflow/tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1445] function cupti_interface_->EnableCallback( 0 , subscriber_, CUPTI_CB_DOMAIN_DRIVER_API, cbid)failed with error CUPTI_ERROR_INSUFFICIENT_PRIVILEGES
2020-06-12 14:31:54.097791: E external/org_tensorflow/tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1487] function cupti_interface_->ActivityDisable(activity)failed with error CUPTI_ERROR_NOT_INITIALIZED
```

Run the following commands (note this requires a reboot):
```shell
echo 'options nvidia "NVreg_RestrictProfilingToAdminUsers=0"' | sudo tee -a /etc/modprobe.d/nvidia-kernel-common.conf
sudo update-initramfs -u
sudo reboot now
```

See [Nvidia's documentation on this
error](https://developer.nvidia.com/nvidia-development-tools-solutions-err-nvgpuctrperm-cupti)
for more information.
(remote_profiling)=
#### Profiling on a remote machine

If the JAX program you'd like to profile is running on a remote machine, one
option is to run all the instructions above on the remote machine (in
particular, start the TensorBoard server on the remote machine), then use SSH
local port forwarding to access the TensorBoard web UI from your local
machine. Use the following SSH command to forward the default TensorBoard port
6006 from the remote machine to your local machine:

```shell
ssh -L 6006:localhost:6006 <remote server address>
```
## Nsight

Nvidia's `Nsight` tools can be used to trace and profile JAX code on GPU. For
details, see the [`Nsight`
documentation](https://developer.nvidia.com/tools-overview).

## XLA profiling