llama.cpp

mirror of https://github.com/ggerganov/llama.cpp.git synced 2025-04-17 20:16:09 +00:00

llama : DeepSeek V2/V3 MLA implementation (#12801)

* Merged using squash to remove all noise commit messages

* Force flash attention off for `LLM_ARCH_DEEPSEEK2` - embedding too large

* Removed 3 conts (2x RoPE and 1x RMS-norm)

* Changed to use `<cmath>` instead of `<math.h>`

* Reverted removal of the 3 conts

* Used `reshape` in `llm_graph_context::build_attn_mha()`

* Use `k_pe = ggml_reshape`

* Removed the 3 conts again

* Removed the 3D views of `wk_b` and `wv_b`, and just save as 3D in GGUF

* Removed MQA optimisation from `build_attn_mha()` as no gains now

* Simplified `is_mla` branch in `llm_build_deepseek2()`

* Removed `build_attn_mla` and added `nullptr` to all `build_attn` calls

* Fixed call to `build_attn` in `llm_build_t5_enc`

2025-04-15 09:49:57 +03:00

scripts

Refactor gguf scripts to improve metadata handling (#11909)

2025-02-26 08:04:48 -05:00

__init__.py

convert-*.py: GGUF Naming Convention Refactor and Metadata Override Refactor (#7499)

2024-07-18 20:40:15 +10:00

constants.py

llama : DeepSeek V2/V3 MLA implementation (#12801)

2025-04-15 09:49:57 +03:00

gguf_reader.py

Refactor gguf scripts to improve metadata handling (#11909)