Yuxuan Zhang
06bb53ad9b
llama-model : add Glm4Model implementation for GLM-4-0414 ( #12867 )
...
* GLM-4-0414
* use original one
* Using with tensor map
* fix bug
* change order
* change order
* format with flask8
2025-04-11 12:10:10 +02:00
Xuan-Son Nguyen
1466621e73
llama : Support llama 4 text-only ( #12791 )
...
* llama4 conversion
* initial support, no chat template
* clean up a bit
* fix tokenizer conversion
* correct hparams
* try this
* fix shexp
* ffn_inp_normed
* chat template
* clean up model conversion
* add_bos
* add scale_before_ffn
* fix order
* weight_before_ffn
* llm_graph_input_attn_temp
* add chunk attn mask
* build_inp_attn_scale()
* add comment about ggml_repeat
* clarify comments
* fix build
2025-04-07 23:06:44 +02:00
Sigbjørn Skjæret
2c3f8b850a
llama : support BailingMoE (Ling) ( #12634 )
2025-03-30 22:21:03 +02:00
Juyoung Suk
b3de7cac73
llama : add Trillion 7B model support ( #12556 )
...
* Support Trillion 7B
* Update llama.h
* Update llama.h
* Update llama-vocab.cpp for Trillion
* Update llama-vocab.cpp
2025-03-30 20:38:33 +02:00
compilade
00d53800e0
llama-vocab : add SuperBPE pre-tokenizer ( #12532 )
2025-03-24 11:47:24 +01:00
Xuan-Son Nguyen
c43a3e7996
llama : add Phi-4-mini support (supersede #12099 ) ( #12108 )
...
* Added Phi-4-mini-instruct support
* Update regex per ngxson
* Change the vocab base to Xenova/gpt-4o
* fix conversion update script
* no need to check longrope
* minor style fix
* fix python style
---------
Co-authored-by: Nicholas Sparks <nisparks@microsoft.com>
2025-02-28 12:44:11 +01:00
Georgi Gerganov
68ff663a04
repo : update links to new url ( #11886 )
...
* repo : update links to new url
ggml-ci
* cont : more urls
ggml-ci
2025-02-15 16:40:57 +02:00
Xuan Son Nguyen
ec7f3ac9ab
llama : add support for Deepseek-R1-Qwen distill model ( #11310 )
...
* llama : add support for Deepseek-R1-Qwen distill model
* coding style
2025-01-20 14:35:07 +01:00
fairydreaming
9394bbd484
llama : Add support for DeepSeek V3 ( #11049 )
...
* convert : extend DEEPSEEK2 model architecture to support DeepseekV3ForCausalLM by adding EXPERT_WEIGHTS_NORM and EXPERT_GATING_FUNC model parameters and FFN_EXP_PROBS_B tensor type
* vocab : add DeepSeek V3 pre-tokenizer regexes
* unicode : handle ACCENT_MARK and SYMBOL categories in regex
* llama : add DeepSeek V3 chat template, handle new model parameters and tensor types
---------
Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>
2025-01-04 21:06:11 +01:00
Yun Dou
b92a14a841
llama : support InfiniAI Megrez 3b ( #10893 )
...
* Support InfiniAI Megrez 3b
* Fix tokenizer_clean_spaces for megrez
2024-12-23 01:35:44 +01:00
Billel Mokeddem
7ae33a616f
llama : add Falcon3 support ( #10883 )
...
* Add Falcon3 model support
* Add fix for adding bos to added special tokens
* Add comment explaining the logic behind the if statement
* Add a log message to better track the when the following line of code is triggered
* Update log to only print when input and output characters are different
* Fix handling pre-normalized tokens
* Refactoring
2024-12-23 00:09:58 +02:00
Diego Devesa
4da69d1abd
Revert "llama : add Falcon3 support ( #10864 )" ( #10876 )
...
This reverts commit 382bc7f2e8ffd0b89f23e840d097e21f301197ba.
2024-12-18 01:36:46 +01:00
Billel Mokeddem
382bc7f2e8
llama : add Falcon3 support ( #10864 )
2024-12-17 17:24:56 +02:00
Valentin Mamedov
a0974156f3
llama : add Deepseek MoE v1 & GigaChat models ( #10827 )
...
* Add deepseek v1 arch & gigachat template
* improve template code
* add readme
* delete comments
* remove comment
* fix format
* lint llama.cpp
* fix order of deepseek and deepseek2, move gigachat temlate to the end of func
* fix order of deepseek and deepseek2 in constants; mark shared exp as deepseek arch need
* remove comments
* move deepseek above deepseek2
* change placement of gigachat chat template
2024-12-15 19:02:46 +02:00
Sukriti Sharma
784a14aa49
convert : add support for Roberta embeddings ( #10695 )
2024-12-07 09:02:14 +02:00
Riccardo Orlando
6fe6247831
llama : add Minerva 7B model support ( #10673 )
...
* Support for Minerva 7B
* Update convert_hf_to_gguf_update.py
2024-12-05 20:30:59 +02:00
Daniel Bevenius
d405804be8
py : update outdated copy-paste instructions [no ci] ( #10667 )
...
This commit updates the copy-paste instruction in
convert_hf_to_gguf_update.py to reflect that convert_hf_to_gguf.py
will have already been updated with the new get_vocab_base_pre()
function when this script completes.
2024-12-05 09:47:55 +02:00
Georgi Gerganov
bc5ba007b2
server : check that the prompt fits in the slot's context ( #10030 )
...
ggml-ci
2024-10-25 10:13:46 +03:00
Georgi Gerganov
f4d2b8846a
llama : add reranking support ( #9510 )
...
* py : add XLMRobertaForSequenceClassification [no ci]
* py : fix scalar-tensor conversion [no ci]
* py : fix position embeddings chop [no ci]
* llama : read new cls tensors [no ci]
* llama : add classigication head (wip) [no ci]
* llama : add "rank" pooling type
ggml-ci
* server : add rerank endpoint
ggml-ci
* llama : aboud ggml_repeat during classification
* rerank : cleanup + comments
* server : accept /rerank endpoint in addition to /v1/rerank [no ci]
* embedding : parse special tokens
* jina : support v1 reranker
* vocab : minor style
ggml-ci
* server : initiate tests for later
ggml-ci
* server : add docs
* llama : add comment [no ci]
* llama : fix uninitialized tensors
* ci : add rerank tests
ggml-ci
* add reranking test
* change test data
* Update examples/server/server.cpp
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
* add `--reranking` argument
* update server docs
* llama : fix comment [no ci]
ggml-ci
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
2024-09-28 17:42:03 +03:00
nopperl
9a913110cf
llama : add support for Chameleon ( #8543 )
...
* convert chameleon hf to gguf
* add chameleon tokenizer tests
* fix lint
* implement chameleon graph
* add swin norm param
* return qk norm weights and biases to original format
* implement swin norm
* suppress image token output
* rem tabs
* add comment to conversion
* fix ci
* check for k norm separately
* adapt to new lora implementation
* fix layer input for swin norm
* move swin_norm in gguf writer
* add comment regarding special token regex in chameleon pre-tokenizer
* Update src/llama.cpp
Co-authored-by: compilade <git@compilade.net>
* fix punctuation regex in chameleon pre-tokenizer (@compilade)
Co-authored-by: compilade <git@compilade.net>
* fix lint
* trigger ci
---------
Co-authored-by: compilade <git@compilade.net>
2024-09-28 15:08:43 +03:00
daminho
c837981bba
py : add Phi-1.5/Phi-2 tokenizer ( #9361 )
...
* add phi2 tokenizer
* add phi name to convert_hf_to_gguf_update.py
* make tokenizer_pre consistent; llama.cpp work
2024-09-12 14:28:20 +03:00
Pavel Zloi
8db003a19d
py : support converting local models ( #7547 )
...
* Support of converting local models added to convert-hf-to-gguf-update.py
* Description fixed
* shutil added to imports
2024-09-11 15:29:51 +03:00
Minsoo Cheong
c679e0cb5c
llama : add EXAONE model support ( #9025 )
...
* add exaone model support
* add chat template
* fix whitespace
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* add ftype
* add exaone pre-tokenizer in `llama-vocab.cpp`
Co-Authored-By: compilade <113953597+compilade@users.noreply.github.com>
* fix lint
Co-Authored-By: compilade <113953597+compilade@users.noreply.github.com>
* add `EXAONE` to supported models in `README.md`
* fix space
Co-authored-by: compilade <git@compilade.net>
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: compilade <113953597+compilade@users.noreply.github.com>
Co-authored-by: compilade <git@compilade.net>
2024-08-16 09:35:18 +03:00
Esko Toivonen
6bda7ce6c3
llama : add pre-tokenizer regexes for BLOOM and gpt3-finnish ( #8850 )
2024-08-15 10:17:12 +03:00
Keke Han
081fe431aa
llama : fix codeshell support ( #8599 )
...
* llama : fix codeshell support
* llama : move codeshell after smollm below to respect the enum order
2024-07-22 19:43:43 +03:00
Jason Stillerman
d94c6e0ccb
llama : add support for SmolLm pre-tokenizer ( #8609 )
...
* Adding SmolLM Pre Tokenizer
* Update convert_hf_to_gguf_update.py
Co-authored-by: compilade <git@compilade.net>
* Update src/llama.cpp
Co-authored-by: compilade <git@compilade.net>
* handle regex
* removed .inp and out .out ggufs
---------
Co-authored-by: compilade <git@compilade.net>
2024-07-22 17:43:01 +03:00
Jiří Podivín
566daa5a5b
*.py: Stylistic adjustments for python ( #8233 )
...
* Superflous parens in conditionals were removed.
* Unused args in function were removed.
* Replaced unused `idx` var with `_`
* Initializing file_format and format_version attributes
* Renaming constant to capitals
* Preventing redefinition of the `f` var
Signed-off-by: Jiri Podivin <jpodivin@redhat.com>
2024-07-22 23:44:53 +10:00
Michael Coppola
940362224d
llama : add support for Tekken pre-tokenizer ( #8579 )
...
* llama : Added support for Tekken pre-tokenizer (#8577 )
Removed uneeded `vocab.tokenizer_clean_spaces` assignment
* llama : fix order of pre-tokenizers
* * Tekken pre-tokenizer no longer uses clean_up_tokenization_spaces
* Updated chkhsh for Tekken tokenizer
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-07-20 16:43:51 +03:00
Georgi Gerganov
e235b267a2
py : switch to snake_case ( #8305 )
...
* py : switch to snake_case
ggml-ci
* cont
ggml-ci
* cont
ggml-ci
* cont : fix link
* gguf-py : use snake_case in scripts entrypoint export
* py : rename requirements for convert_legacy_llama.py
Needed for scripts/check-requirements.sh
---------
Co-authored-by: Francis Couture-Harpin <git@compilade.net>
2024-07-05 07:53:33 +03:00
ditsuke
01a5f06550
chore: Remove rebase artifacts
2024-07-04 15:39:13 +00:00
ditsuke
b0a46993df
build(python): Package scripts with pip-0517 compliance
2024-07-04 15:39:13 +00:00