rpc : update README for cache usage (#12620)

Radoslav Gerganov 2025-03-28 09:44:13 +02:00 committed by GitHub
parent 13731766db
commit ef03229ff4


@@ -72,3 +72,14 @@ $ bin/llama-cli -m ../models/tinyllama-1b/ggml-model-f16.gguf -p "Hello, my name
This way you can offload model layers to both local and remote devices.
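For example, a single run can combine the local GPU with one or more remote RPC servers via the `--rpc` flag; a minimal sketch, assuming two servers are already running (the host addresses below are placeholders):
```bash
# Offload layers to two remote rpc-server instances in addition to local devices.
# Replace the placeholder addresses with your own hosts.
$ bin/llama-cli -m ../models/tinyllama-1b/ggml-model-f16.gguf -p "Hello, my name is" -ngl 99 --rpc 192.168.88.10:50052,192.168.88.11:50052
```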
### Local cache
The RPC server can use a local cache to store large tensors and avoid transferring them over the network.
This can speed up model loading significantly, especially when using large models.
To enable the cache, use the `-c` option:
```bash
$ bin/rpc-server -c
```
By default, the cache is stored in the `$HOME/.cache/llama.cpp/rpc` directory; the location can be overridden with the `LLAMA_CACHE` environment variable.
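For example, to keep the cache under a custom directory (the path below is illustrative):
```bash
# Store cached tensors under /srv/llama-cache instead of the default location.
$ LLAMA_CACHE=/srv/llama-cache bin/rpc-server -c
```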