Mirror of https://github.com/ggerganov/llama.cpp.git, synced 2025-04-19 21:16:06 +00:00

7 Commits

Author SHA1 Message Date
Georgi Gerganov
e0dbec0bc6
llama : refactor llama_context, llama_kv_cache, llm_build_context ()
* llama : refactor llama_context, llama_kv_cache, llm_build_context

ggml-ci

* graph : don't mutate the KV cache during defrag

ggml-ci

* context : reduce virtuals + remove test function

ggml-ci

* context : move interface implementation to source file + factory

ggml-ci

* graph : move KV cache build functions to llama_context impl

ggml-ci

* graph : remove model reference from build_pooling

ggml-ci

* graph : remove llama_model reference

ggml-ci

* kv_cache : provide rope factors

ggml-ci

* graph : rework inputs to use only unique_ptr, remove attn input abstraction

ggml-ci

* context : remove llama_context_i abstraction

ggml-ci

* context : clean-up

ggml-ci

* graph : clean-up

ggml-ci

* llama : remove redundant keywords (struct, enum)

ggml-ci

* model : adapt gemma3

ggml-ci

* graph : restore same attention ops as on master

ggml-ci

* llama : remove TODO + fix indent

ggml-ci
2025-03-13 12:35:44 +02:00
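One recurring theme in the refactor above is the "graph : rework inputs to use only unique_ptr" step, which drops the attention-input abstraction in favour of plainly owned input objects. The sketch below illustrates that ownership pattern in generic C++; the type and member names are hypothetical and are not the actual llama.cpp classes.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Illustrative graph-input types (not the real llama.cpp classes): every input
// is owned through a std::unique_ptr, so the graph is the single clear owner.
struct graph_input {
    virtual ~graph_input() = default;
};

struct graph_input_embd : graph_input {
    std::vector<float> embd; // data is copied in, nothing else holds a mutable view
};

struct graph {
    std::vector<std::unique_ptr<graph_input>> inputs;

    // returns a non-owning handle; ownership stays inside `inputs`
    graph_input_embd * add_embd_input() {
        auto inp   = std::make_unique<graph_input_embd>();
        auto * res = inp.get();
        inputs.push_back(std::move(inp));
        return res;
    }
};
```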
mgroeber9110
5bbe6a9fe9
ggml : portability fixes for VS 2017 ()
* Add include files for std::min/max and std::toupper/tolower

* win32: move _USE_MATH_DEFINES before includes to ensure M_PI is defined

* Use GGML_RESTRICT instead of "restrict" keyword everywhere, and use "__restrict" in MSVC plain C mode

* win32: only use __restrict in MSVC if C11/C17 support is not enabled

---------

Co-authored-by: Marcus Groeber <Marcus.Groeber@cerence.com>
2025-03-04 18:53:26 +02:00
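The portability fixes above replace the bare restrict qualifier with a GGML_RESTRICT macro, and fall back to __restrict only in MSVC's plain C mode when C11/C17 support is not enabled. A minimal sketch of one conservative shape such a guard can take is shown below; it follows the commit description rather than reproducing the exact ggml header.

```cpp
// Sketch of a GGML_RESTRICT-style guard (illustrative, not the exact ggml code).
// C++ has no `restrict` keyword, and MSVC's plain C mode only understands
// `__restrict` unless C11/C17 support is enabled, so the alias is chosen per mode.
#if defined(__cplusplus)
#    define GGML_RESTRICT
#elif defined(_MSC_VER) && (!defined(__STDC_VERSION__) || __STDC_VERSION__ < 201112L)
#    define GGML_RESTRICT __restrict
#else
#    define GGML_RESTRICT restrict
#endif

// Usage: the aliased qualifier documents non-aliasing pointers portably.
static void vec_add(int n, float * GGML_RESTRICT dst,
                    const float * GGML_RESTRICT a,
                    const float * GGML_RESTRICT b) {
    for (int i = 0; i < n; ++i) {
        dst[i] = a[i] + b[i];
    }
}
```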
Georgi Gerganov
abd4d0bc4f
speculative : update default params ()
* speculative : update default params

* speculative : do not discard the last drafted token
2025-02-19 13:29:42 +02:00
Georgi Gerganov
afa8a9ec9b
llama : add llama_vocab, functions -> methods, naming ()
* llama : functions -> methods ()

* llama : add struct llama_vocab to the API ()

ggml-ci

* hparams : move vocab params to llama_vocab ()

ggml-ci

* vocab : more pimpl ()

ggml-ci

* vocab : minor tokenization optimizations ()

ggml-ci

Co-authored-by: Diego Devesa <slarengh@gmail.com>

* lora : update API names ()

ggml-ci

* llama : update API names to use correct prefix ()

* llama : update API names to use correct prefix

ggml-ci

* cont

ggml-ci

* cont

ggml-ci

* minor [no ci]

* vocab : llama_vocab_add_[be]os -> llama_vocab_get_add_[be]os ()

ggml-ci

* vocab : llama_vocab_n_vocab -> llama_vocab_n_tokens ()

ggml-ci

---------

Co-authored-by: Diego Devesa <slarengh@gmail.com>
2025-01-12 11:32:42 +02:00
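The commit above moves vocabulary state behind struct llama_vocab and renames the accessors to the llama_vocab_ prefix (for example llama_vocab_n_tokens and llama_vocab_get_add_bos, both named in the commit message). The short C++ sketch below shows how the renamed C API is called from client code; the model path is a placeholder and error handling is minimal.

```cpp
#include <cstdio>
#include "llama.h"

int main() {
    llama_backend_init();

    llama_model_params mparams = llama_model_default_params();
    llama_model * model = llama_model_load_from_file("model.gguf", mparams); // placeholder path
    if (model == nullptr) {
        std::fprintf(stderr, "failed to load model\n");
        return 1;
    }

    // vocabulary data now lives behind llama_vocab, obtained from the model
    const llama_vocab * vocab = llama_model_get_vocab(model);

    std::printf("n_tokens  : %d\n", llama_vocab_n_tokens(vocab));                    // was llama_vocab_n_vocab
    std::printf("bos token : %d\n", (int) llama_vocab_bos(vocab));
    std::printf("add BOS?  : %s\n", llama_vocab_get_add_bos(vocab) ? "yes" : "no");  // was llama_vocab_add_bos

    llama_model_free(model);
    llama_backend_free();
    return 0;
}
```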
Georgi Gerganov
c2a16c0bdb
server : fix free of spec context and batch ()
ggml-ci
2024-12-07 11:52:44 +02:00
Georgi Gerganov
9fd8c2687f
server : add more information about error ()
2024-11-25 22:28:59 +02:00
Georgi Gerganov
d9d54e498d
speculative : refactor and add a simpler example ()
* speculative : refactor and add a simpler example

ggml-ci

* speculative : clean-up and add comments and TODOs [no ci]

* speculative : manage context in common_speculative

ggml-ci

* speculative : simplify

ggml-ci

* speculative : simplify (cont)

ggml-ci

* speculative : add --draft-min CLI arg

* speculative : minor fixup

* make : build fixes

* speculative : do not redraft previous drafts

ggml-ci

* speculative : fix the draft sampling

ggml-ci

* speculative : fix compile warning

* common : refactor args

ggml-ci

* common : change defaults [no ci]

* common : final touches

ggml-ci
2024-11-25 09:58:41 +02:00
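The refactor above moves the draft-and-verify logic into common_speculative and adds a --draft-min lower bound on how many drafted tokens are worth submitting for verification. The sketch below shows the general shape of one speculative step with hypothetical callbacks (draft, verify); it is a conceptual illustration under those assumptions, not the actual common_speculative API.

```cpp
#include <cstdint>
#include <functional>
#include <vector>

using token  = std::int32_t;
using tokens = std::vector<token>;

// Conceptual speculative step (hypothetical helpers, not the real API):
//  - `draft`  stands in for the small draft model proposing up to n_draft_max tokens
//  - `verify` stands in for the target model, returning how many proposals it accepts
// Drafts shorter than n_draft_min (the idea behind --draft-min) are not worth a
// batched verification pass, so the caller falls back to normal decoding instead.
int speculative_step(tokens & ctx,
                     int n_draft_max,
                     int n_draft_min,
                     const std::function<tokens(const tokens &, int)> & draft,
                     const std::function<int(const tokens &, const tokens &)> & verify) {
    const tokens proposal = draft(ctx, n_draft_max);

    if ((int) proposal.size() < n_draft_min) {
        return 0; // too few drafted tokens: signal the caller to decode normally
    }

    const int n_accept = verify(ctx, proposal);

    // keep only the prefix of the draft that the target model agreed with
    ctx.insert(ctx.end(), proposal.begin(), proposal.begin() + n_accept);

    return n_accept;
}
```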