# Server tests

Python-based server test scenario using [pytest](https://docs.pytest.org/en/stable/).

Tests target GitHub workflow job runners with 4 vCPUs.

Note: if the host machine's inference speed is faster than that of the GitHub runners, the parallel scenarios may fail randomly. To mitigate this, you can increase the values of `n_predict` and `kv_size`.

### Install dependencies

`pip install -r requirements.txt`
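
If you prefer to keep the test dependencies isolated from your system Python, you can install them into a virtual environment first. This is an optional workflow, not something the test scripts require:

```shell
# optional: create and activate a virtual environment, then install the test dependencies
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```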
### Run tests

1. Build the server

```shell
cd ../../..
cmake -B build -DLLAMA_CURL=ON
cmake --build build --target llama-server
```

2. Start the tests: `./tests.sh`

It's possible to override some scenario step values with environment variables:

| variable                 | description                                                                                                                                                                                  |
|--------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `PORT`                   | `context.server_port` to set the listening port of the server during the scenario, default: `8080`                                                                                          |
| `LLAMA_SERVER_BIN_PATH`  | to change the server binary path, default: `../../../build/bin/llama-server`                                                                                                                |
| `DEBUG`                  | to enable step and server verbose mode (`--verbose`)                                                                                                                                        |
| `N_GPU_LAYERS`           | number of model layers to offload to VRAM (`-ngl`, `--n-gpu-layers`)                                                                                                                        |
| `LLAMA_CACHE`            | by default the server tests re-download models into the `tmp` subfolder; set this to your cache directory (e.g. `$HOME/Library/Caches/llama.cpp` on macOS or `$HOME/.cache/llama.cpp` on Linux) to avoid this |
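
Several of these variables can be combined in a single invocation. A hypothetical run that changes the port, offloads 99 layers to the GPU, and enables verbose output (adjust the values to your setup):

```shell
# PORT, N_GPU_LAYERS and DEBUG are the variables documented in the table above
PORT=8081 N_GPU_LAYERS=99 DEBUG=1 ./tests.sh -x
```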
To run the slow tests (these download many models, so make sure `LLAMA_CACHE` is set if needed):

```shell
SLOW_TESTS=1 ./tests.sh
```
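
If you already have a model cache, pointing `LLAMA_CACHE` at it avoids re-downloading (the path below is just the Linux example from the table above):

```shell
# reuse an existing cache instead of downloading into the tmp subfolder
LLAMA_CACHE=$HOME/.cache/llama.cpp SLOW_TESTS=1 ./tests.sh
```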
To run with stdout/stderr displayed in real time (verbose output, but useful for debugging):

```shell
DEBUG=1 ./tests.sh -s -v -x
```
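
Since this output can be long, it may help to keep a copy on disk while still watching it live (a plain shell idiom; `server-tests.log` is just an example file name):

```shell
# show the verbose output and save it to a log file at the same time
DEBUG=1 ./tests.sh -s -v -x 2>&1 | tee server-tests.log
```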
To run all the tests in a file:

```shell
./tests.sh unit/test_chat_completion.py -v -x
```
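
Assuming `tests.sh` forwards its arguments to pytest (as the `-v -x` flags above suggest), you can also narrow a file down to tests whose names match a keyword with pytest's `-k` option (`stream` here is just an illustrative keyword):

```shell
# run only the tests in the file whose names match "stream"
./tests.sh unit/test_chat_completion.py -v -x -k "stream"
```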
To run a single test:

```shell
./tests.sh unit/test_chat_completion.py::test_invalid_chat_completion_req
```
Hint: you can compile and run the tests in a single command, which is useful for local development:

```shell
cmake --build build -j --target llama-server && ./examples/server/tests/tests.sh
```
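
The same one-liner can be narrowed to a single test file while iterating (this simply combines commands shown earlier in this document):

```shell
# rebuild the server, then run only the chat completion tests
cmake --build build -j --target llama-server && ./examples/server/tests/tests.sh unit/test_chat_completion.py -v -x
```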
To see all available arguments, please refer to the [pytest documentation](https://docs.pytest.org/en/stable/how-to/usage.html).
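
For instance, a hypothetical combination of standard pytest options, assuming `tests.sh` forwards them as above:

```shell
# verbose output, stop after the first two failures, keyword-based selection
./tests.sh -v --maxfail=2 -k "chat_completion"
```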