mirror of https://github.com/ggerganov/llama.cpp.git synced 2025-04-18 20:46:08 +00:00


Xuan-Son Nguyen 267c1399f1

common : refactor downloading system, handle mmproj with -hf option (#12694)

* (wip) refactor downloading system [no ci]

* fix all examples

* fix mmproj with -hf

* gemma3: update readme

* only handle mmproj in llava example

* fix multi-shard download

* windows: fix problem with std::min and std::max

* fix 2

2025-04-01 23:44:05 +02:00

CMakeLists.txt

ggml : move AMX to the CPU backend (#10570)

2024-11-29 21:54:58 +01:00

export-lora.cpp

common : refactor downloading system, handle mmproj with -hf option (#12694)

2025-04-01 23:44:05 +02:00

README.md

export-lora : throw error if lora is quantized (#9002)

2024-08-13 11:41:14 +02:00


export-lora

Apply LoRA adapters to a base model and export the resulting merged model.
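Conceptually, applying an adapter replaces each targeted weight matrix W with W + s·(B·A), where A and B are the adapter's low-rank factors and s is a scale. The C++ sketch below shows that update on small dense matrices. It assumes the standard LoRA formulation, in which the effective scale is the user scale multiplied by alpha/rank (falling back to the user scale when alpha is 0). The names Matrix, effective_scale and merge_lora are hypothetical and not part of the llama.cpp API; the actual tool works on GGUF tensors via ggml rather than on nested std::vector matrices.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical dense row-major matrix type, for illustration only.
using Matrix = std::vector<std::vector<float>>;

// Assumed standard LoRA scaling: user_scale * alpha / rank,
// or just user_scale when the adapter stores no alpha (alpha == 0).
float effective_scale(float user_scale, float alpha, std::size_t rank) {
    if (alpha == 0.0f) {
        return user_scale;
    }
    return user_scale * alpha / static_cast<float>(rank);
}

// Shapes: W is d_out x d_in, B is d_out x r, A is r x d_in.
// Returns W + scale * (B * A); the inputs are left unchanged.
Matrix merge_lora(const Matrix & W, const Matrix & B, const Matrix & A, float scale) {
    Matrix out = W;
    const std::size_t d_out = W.size();
    const std::size_t d_in  = W.empty() ? 0 : W[0].size();
    const std::size_t rank  = A.size();
    for (std::size_t i = 0; i < d_out; ++i) {
        for (std::size_t j = 0; j < d_in; ++j) {
            float delta = 0.0f;
            for (std::size_t k = 0; k < rank; ++k) {
                delta += B[i][k] * A[k][j];  // element (i, j) of B * A
            }
            out[i][j] += scale * delta;
        }
    }
    return out;
}
```

For example, with W as the 2x2 identity, B = [[1], [2]], A = [[3, 4]], alpha = 2, rank 1 and a user scale of 0.5, the effective scale is 1.0 and merge_lora returns [[4, 4], [6, 9]]. Because the low-rank update is folded into W once, the exported model needs no adapter at inference time.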

usage: llama-export-lora [options]

options:
  -m,    --model                  model path from which to load base model (default '')
         --lora FNAME             path to LoRA adapter  (can be repeated to use multiple adapters)
         --lora-scaled FNAME S    path to LoRA adapter with user defined scaling S  (can be repeated to use multiple adapters)
  -t,    --threads N              number of threads to use during computation (default: 4)
  -o,    --output FNAME           output file (default: 'ggml-lora-merged-f16.gguf')
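Both adapter options can be repeated, so they accumulate into one ordered list of (path, scale) pairs. The sketch below illustrates that bookkeeping under two assumptions: that a plain --lora FNAME behaves like --lora-scaled FNAME 1.0, and that adapters are kept in command-line order. LoraArg and collect_lora_args are hypothetical names, not the tool's actual argument parser, and error handling for missing values is reduced to throwing std::invalid_argument.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical record for one adapter given on the command line.
struct LoraArg {
    std::string path;
    float scale;
};

// Collects --lora / --lora-scaled occurrences in command-line order.
// Any other argument (-m, -t, -o and their values) is ignored in this sketch.
std::vector<LoraArg> collect_lora_args(const std::vector<std::string> & args) {
    std::vector<LoraArg> out;
    for (std::size_t i = 0; i < args.size(); ++i) {
        const std::string & a = args[i];
        if (a == "--lora") {
            if (i + 1 >= args.size()) {
                throw std::invalid_argument("--lora needs FNAME");
            }
            out.push_back({args[++i], 1.0f});  // assumed default scale
        } else if (a == "--lora-scaled") {
            if (i + 2 >= args.size()) {
                throw std::invalid_argument("--lora-scaled needs FNAME S");
            }
            const std::string path = args[++i];
            const float scale = std::stof(args[++i]);  // throws on non-numeric S
            out.push_back({path, scale});
        }
    }
    return out;
}
```

Keeping the pairs in a single list, rather than in separate lists per option, lets --lora and --lora-scaled be mixed freely on one command line.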

For example:

./bin/llama-export-lora \
    -m open-llama-3b-v2.gguf \
    -o open-llama-3b-v2-english2tokipona-chat.gguf \
    --lora lora-open-llama-3b-v2-english2tokipona-chat-LATEST.gguf

Multiple LoRA adapters can be applied by repeating the --lora FNAME or --lora-scaled FNAME S command-line parameters:

./bin/llama-export-lora \
    -m your_base_model.gguf \
    -o your_merged_model.gguf \
    --lora-scaled lora_task_A.gguf 0.5 \
    --lora-scaled lora_task_B.gguf 0.5
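If each adapter's contribution is added independently, the merged weight for a layer targeted by several adapters is W + s_1·ΔW_1 + s_2·ΔW_2 + ..., where ΔW_i = B_i·A_i is adapter i's low-rank update and s_i its effective scale. With two adapters at 0.5 each, as in the command above, the result is W plus the average of the two updates. The sketch below assumes this additive formulation and that each ΔW_i has already been computed; ScaledDelta and merge_deltas are hypothetical names used only for illustration.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

using Matrix = std::vector<std::vector<float>>;

// Hypothetical: one adapter's precomputed update (B * A) and its scale.
struct ScaledDelta {
    Matrix delta;
    float scale;
};

// Returns W + sum_i scale_i * delta_i. Apart from floating-point
// rounding, the order in which adapters are listed does not matter.
Matrix merge_deltas(const Matrix & W, const std::vector<ScaledDelta> & adapters) {
    Matrix out = W;
    for (const ScaledDelta & a : adapters) {
        if (a.delta.size() != W.size()) {
            throw std::invalid_argument("shape mismatch");
        }
        for (std::size_t i = 0; i < W.size(); ++i) {
            if (a.delta[i].size() != W[i].size()) {
                throw std::invalid_argument("shape mismatch");
            }
            for (std::size_t j = 0; j < W[i].size(); ++j) {
                out[i][j] += a.scale * a.delta[i][j];
            }
        }
    }
    return out;
}
```

Choosing scales that sum to 1, as with 0.5 and 0.5 above, keeps the combined update on the same order of magnitude as a single adapter at scale 1.0.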