llama.cpp

mirror of https://github.com/ggerganov/llama.cpp.git synced 2025-04-19 13:06:10 +00:00

vulkan: Hybrid waitForFences/getFenceStatus to reduce fence latency (#12630)

There seems to be a bubble when waking up from waitForFences, which costs a few
percent of performance and also increases the variance in performance. This change
inserts an "almost_ready" fence when the graph is about 80% complete. We
waitForFences for the almost_ready fence and then spin (with _mm_pause) waiting
for the final fence to be signaled.
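
The hybrid wait described above can be illustrated with a CPU-only sketch. This is not the Vulkan backend code: `FakeFences`, `run_graph`, `hybrid_wait`, and `cpu_relax` are hypothetical names, a condition variable stands in for the blocking waitForFences call, and an atomic flag stands in for polling getFenceStatus. The 80% threshold is taken from the commit message; how the real backend picks the insertion point is not shown here.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>

#if defined(__x86_64__) || defined(_M_X64) || defined(__i386__) || defined(_M_IX86)
#include <immintrin.h>
// Hint to the CPU that we are in a spin-wait loop.
static inline void cpu_relax() { _mm_pause(); }
#else
// Portable fallback for non-x86 targets (assumption: yielding is acceptable here).
static inline void cpu_relax() { std::this_thread::yield(); }
#endif

// Hypothetical stand-in for the two GPU fences.
struct FakeFences {
    std::mutex              m;
    std::condition_variable cv;
    bool                    almost_ready = false; // signaled at roughly 80% of the graph
    std::atomic<bool>       done{false};          // signaled when the graph has finished
};

// Simulated "GPU": executes n_nodes units of work and signals both fences.
void run_graph(FakeFences & f, std::size_t n_nodes, std::atomic<std::size_t> & executed) {
    auto signal_almost_ready = [&f] {
        {
            std::lock_guard<std::mutex> lk(f.m);
            f.almost_ready = true;
        }
        f.cv.notify_all();
    };

    const std::size_t almost_idx = n_nodes * 8 / 10;
    for (std::size_t i = 0; i < n_nodes; ++i) {
        executed.fetch_add(1, std::memory_order_relaxed);
        if (i + 1 == almost_idx) {
            signal_almost_ready();
        }
    }
    // Covers tiny graphs where the 80% mark was never reached inside the loop.
    signal_almost_ready();
    f.done.store(true, std::memory_order_release);
}

// Block (cheap, but slow to wake) until the graph is almost done,
// then spin (costly, but low latency) for the short remaining tail.
// Returns the number of spin iterations, for inspection only.
std::size_t hybrid_wait(FakeFences & f) {
    {
        std::unique_lock<std::mutex> lk(f.m);
        f.cv.wait(lk, [&f] { return f.almost_ready; });
    }
    std::size_t spins = 0;
    while (!f.done.load(std::memory_order_acquire)) {
        cpu_relax();
        ++spins;
    }
    return spins;
}
```

The design trade-off is that the blocking wait covers most of the graph's runtime without burning a CPU core, while the spin covers only the final stretch, so the wake-up delay of the blocking call overlaps with remaining work instead of being added after completion.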

2025-04-04 07:54:35 +02:00

cmake

scripts : update sync + fix cmake merge

2025-03-27 10:09:29 +02:00

include

metal : improve FA + improve MoE (#12612)

2025-03-28 20:21:59 +02:00

src

vulkan: Hybrid waitForFences/getFenceStatus to reduce fence latency (#12630)

2025-04-04 07:54:35 +02:00

.gitignore

vulkan : cmake integration (#8119)

2024-07-13 18:12:39 +02:00

CMakeLists.txt

ggml : add logging for native build options/vars (whisper/2935)

2025-03-30 08:33:31 +03:00