llama.cpp/tests at eddfb438502bd5d1014d63a812e9b6d03d326f8c

mirror of https://github.com/ggerganov/llama.cpp.git synced 2025-04-15 19:16:09 +00:00

Jeff Bolz eddfb43850

vulkan: Optimize mul_mat_vec p021 and nc shaders (#12505)

* tests: add mul_mat perf/functional tests for p021/nc vulkan shaders

* vulkan: Optimize mul_mat_vec p021 and nc shaders.

These shaders are used in attention calculations, and when the KV cache grows
large they start to dominate the run time. For the nc shader (which is called
with large 'k' dimension), use unrolling and vector loads. For the p021 shader
(which is called with large 'm' and small 'k' dimensions), take advantage of
grouped query attention to reuse loads from the A matrix for the whole group,
and reduce the number of workgroups (too much overhead from tiny dispatches).

Using subgroupAdd in the p021 shader also helps, so use it conditionally.
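
The load reuse described for the p021 shader can be illustrated on the CPU. The following is a minimal sketch, not the actual Vulkan shader: the function name `mul_mat_vec_gqa`, its parameters, and the memory layout are all hypothetical and chosen only for illustration. It assumes each KV head owns one row-major A matrix that is shared by `gqa_ratio` query vectors, so each element of A is read once and applied to every query in the group.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical layout (an assumption for this sketch, not the ggml layout):
//   a: n_kv_heads matrices, each rows x cols, row-major
//   b: n_kv_heads * gqa_ratio query vectors, each of length cols
// Returns n_kv_heads * gqa_ratio output vectors, each of length rows.
std::vector<float> mul_mat_vec_gqa(const std::vector<float>& a,
                                   const std::vector<float>& b,
                                   std::size_t n_kv_heads,
                                   std::size_t gqa_ratio,
                                   std::size_t rows,
                                   std::size_t cols) {
    std::vector<float> out(n_kv_heads * gqa_ratio * rows, 0.0f);
    for (std::size_t h = 0; h < n_kv_heads; ++h) {
        for (std::size_t r = 0; r < rows; ++r) {
            const float* a_row = &a[(h * rows + r) * cols];
            for (std::size_t c = 0; c < cols; ++c) {
                // Read the A element once...
                const float a_val = a_row[c];
                // ...and reuse it for every query vector in the group.
                for (std::size_t g = 0; g < gqa_ratio; ++g) {
                    const std::size_t q = h * gqa_ratio + g;
                    out[q * rows + r] += a_val * b[q * cols + c];
                }
            }
        }
    }
    return out;
}
```

Without the grouping, each of the `gqa_ratio` query vectors would trigger its own pass over the same A matrix; folding the group into the inner loop is what cuts the number of A loads, and in the shader setting it also reduces the number of separate dispatches.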

2025-03-22 09:40:11 +01:00

.gitignore

tests : gitignore ggml-common.h

2024-03-09 14:17:11 +02:00

CMakeLists.txt

sampling : support for llguidance grammars (#10224)

2025-02-02 09:55:32 +02:00

get-model.cpp

ci : add model tests + script wrapper (#4586)

2024-01-26 14:18:00 +02:00

get-model.h

ci : add model tests + script wrapper (#4586)

2024-01-26 14:18:00 +02:00

run-json-schema-to-grammar.mjs

server : revamp chat UI with vuejs and daisyui (#10175)

2024-11-07 17:31:10 -04:00

test-arg-parser.cpp

speculative : refactor and add a simpler example (#10362)

2024-11-25 09:58:41 +02:00

test-autorelease.cpp

llama : add llama_vocab, functions -> methods, naming (#11110)

2025-01-12 11:32:42 +02:00

test-backend-ops.cpp

vulkan: Optimize mul_mat_vec p021 and nc shaders (#12505)

2025-03-22 09:40:11 +01:00

test-barrier.cpp

ggml : move CPU backend to a separate file (#10144)

2024-11-03 19:34:08 +01:00

test-c.c

Nomic Vulkan backend (#4456)

2024-01-29 15:50:50 -05:00

test-chat-template.cpp

tool-call: refactor common chat / tool-call api (+ tests / fixes) (#11900)

2025-02-18 18:03:23 +00:00

test-chat.cpp

server: extract <think> tags from qwq outputs (#12297)

2025-03-10 10:59:03 +00:00

test-double-float.cpp

ggml : minor naming changes (#8433)

2024-07-12 10:46:02 +03:00

test-gguf.cpp

cleanup: fix compile warnings associated with gnu_printf (#11811)

2025-02-12 10:06:53 -04:00

test-grammar-integration.cpp

sampling : support for llguidance grammars (#10224)

2025-02-02 09:55:32 +02:00

test-grammar-llguidance.cpp

sampling : support for llguidance grammars (#10224)

2025-02-02 09:55:32 +02:00

test-grammar-parser.cpp

llama : refactor sampling v2 (#9294)

2024-09-07 15:16:19 +03:00

test-json-schema-to-grammar.cpp

tool-call: fix Qwen 2.5 Coder support, add micro benchmarks, support trigger patterns for lazy grammars (#12034)

2025-03-05 13:05:13 +00:00

test-llama-grammar.cpp

llama : minor grammar refactor (#10897)

2024-12-19 17:42:13 +02:00

test-log.cpp

common : use common_ prefix for common library functions (#9805)

2024-10-10 22:57:42 +02:00

test-lora-conversion-inference.sh

ci : use -no-cnv in gguf-split tests (#11254)

2025-01-15 18:28:35 +02:00

test-model-load-cancel.cpp

llama : update llama_model API names (#11063)

2025-01-06 10:55:18 +02:00

test-opt.cpp

ggml : inttypes.h -> cinttypes (#0)

2024-11-17 08:30:29 +02:00

test-quantize-fns.cpp

tests : fix test-quantize-fns to init the CPU backend (#12306)

2025-03-10 14:07:15 +02:00

test-quantize-perf.cpp

ggml : inttypes.h -> cinttypes (#0)

2024-11-17 08:30:29 +02:00

test-rope.cpp

llama : add Qwen2VL support + multimodal RoPE (#10361)

2024-12-14 14:43:46 +02:00

test-sampling.cpp

sampling: add Top-nσ sampler (#11223)

2025-02-13 08:45:57 +02:00

test-tokenizer-0.cpp

llama : add llama_vocab, functions -> methods, naming (#11110)

2025-01-12 11:32:42 +02:00

test-tokenizer-0.py

py : logging and flake8 suppression refactoring (#7081)

2024-05-05 08:07:48 +03:00

test-tokenizer-0.sh

tests : fix test-tokenizer-0.sh

2024-05-28 15:04:09 +03:00

test-tokenizer-1-bpe.cpp

llama : add llama_vocab, functions -> methods, naming (#11110)

2025-01-12 11:32:42 +02:00

test-tokenizer-1-spm.cpp

llama : add llama_vocab, functions -> methods, naming (#11110)

2025-01-12 11:32:42 +02:00

test-tokenizer-random.py

llama : add llama_vocab, functions -> methods, naming (#11110)

2025-01-12 11:32:42 +02:00