mirror of https://github.com/ggerganov/llama.cpp.git synced 2025-04-25 06:46:05 +00:00
Georgi Gerganov a10b36c91a
llama : refactor kv cache guard ()
* llama : refactor kv cache guard

ggml-ci

* cont : fix comment [no ci]

* llama : fix kv_cache restore logic

ggml-ci

* context : simplify kv cache updates

ggml-ci

* cont : better name [no ci]

* llama : fix llama_decode return code when a KV slot could not be found

ggml-ci

* context : change log err -> warn [no ci]

* kv-cache : add comment + warning
2025-04-02 14:32:59 +03:00

llama.cpp/examples/parallel

Simplified simulation of serving incoming requests in parallel
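A rough sketch of how this example is typically invoked (the binary name and flags reflect recent llama.cpp builds and may differ by version; the model path is a placeholder):

```shell
# Simulate 64 incoming client requests served across 8 parallel
# sequences, with continuous batching enabled (-cb).
./llama-parallel -m models/model.gguf -np 8 -ns 64 -cb
```

Increasing -np trades per-request latency for higher aggregate throughput, since more sequences share each decoded batch.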