mirror of https://github.com/ggerganov/llama.cpp.git synced 2025-04-21 05:56:10 +00:00

Georgi Gerganov fcca0a7004

refact : fix convert script + zero out KV cache to avoid nans (#3523)

* refact : fix convert script + zero out KV cache to avoid nans

* ggml : silu(-inf) should never happen

* metal : assert various kernel requirements

2023-10-09 14:32:17 +03:00

CMakeLists.txt

llama : custom attention mask + parallel decoding + no context swaps (#3228)

2023-09-28 19:04:36 +03:00

parallel.cpp

refact : fix convert script + zero out KV cache to avoid nans (#3523)

2023-10-09 14:32:17 +03:00

README.md

llama : custom attention mask + parallel decoding + no context swaps (#3228)

2023-09-28 19:04:36 +03:00

README.md

llama.cpp/example/parallel

Simplified simulation of serving incoming requests in parallel