# llama.cpp/example/parallel
Simplified simulation of serving incoming requests in parallel
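The core idea is to pack the tokens of several simulated clients into a single `llama_batch`, giving each client its own sequence id so that one `llama_decode` call evaluates all of them together. Below is a minimal sketch of that idea, not the example's actual source: it assumes the llama.cpp C API plus the `common_tokenize`/`common_batch_add` helpers from `common/common.h`, the prompt text, client count, and batch size are illustrative placeholders, and sampling, continuous batching, and most error handling are left out.

```cpp
#include "llama.h"
#include "common.h"

#include <cstdio>
#include <vector>

int main(int argc, char ** argv) {
    if (argc < 2) {
        fprintf(stderr, "usage: %s <model.gguf>\n", argv[0]);
        return 1;
    }

    const int n_clients = 4;                 // number of simulated parallel requests (placeholder)

    llama_backend_init();

    llama_model_params mparams = llama_model_default_params();
    llama_model * model = llama_load_model_from_file(argv[1], mparams);

    llama_context_params cparams = llama_context_default_params();
    cparams.n_ctx     = 4096;
    cparams.n_seq_max = n_clients;           // one KV-cache sequence per client
    llama_context * ctx = llama_new_context_with_model(model, cparams);

    // the same (illustrative) prompt for every client, tokenized once
    std::vector<llama_token> prompt = common_tokenize(ctx, "Hello, how can I help?", true);

    // a single llama_batch is allocated once and reused for every decode call
    llama_batch batch = llama_batch_init(512, 0, n_clients);

    common_batch_clear(batch);
    for (int client = 0; client < n_clients; ++client) {
        for (int i = 0; i < (int) prompt.size(); ++i) {
            // same token stream, but a distinct sequence id per client;
            // logits are requested only for the last token of each sequence
            common_batch_add(batch, prompt[i], i, { client }, i == (int) prompt.size() - 1);
        }
    }

    // one llama_decode call evaluates all clients' prompts in parallel
    if (llama_decode(ctx, batch) != 0) {
        fprintf(stderr, "llama_decode failed\n");
        return 1;
    }

    // per-client sampling and the subsequent token-by-token decode loop are omitted here

    llama_batch_free(batch);
    llama_free(ctx);
    llama_free_model(model);
    llama_backend_free();

    return 0;
}
```

In practice the shipped example drives this from command-line options; run the built binary with `--help` to see the parallel-decoding parameters it accepts.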