Sigbjørn Skjæret
60c902926c
docs : bring llama-cli conversation/template docs up-to-date ( #12426 )
2025-03-17 21:14:32 +01:00
Gaurav Garg
b1b132efcb
cuda : enable CUDA Graph on CUDA Toolkit < 12.x ( #12394 )
* Enable CUDA Graph on CTK < 12.x
The `cudaGraphExecUpdate` API changed in 12.x, and for this reason CUDA Graph support was disabled on older CUDA toolkits. This change enables CUDA Graph support on CTK < 12.x by using the older API when building against CTK < 12.x.
* Fix compilation errors with MUSA
* Disable CUDA Graph for MUSA
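The version split can be handled with a compile-time guard. A minimal sketch of the idea (not the actual ggml-cuda code; the helper name is made up):
```cpp
// Sketch only: dispatch to the pre-12.x or 12.x signature of cudaGraphExecUpdate
// at compile time, so CUDA Graph support also builds on CTK < 12.x.
#include <cuda_runtime.h>

static bool try_update_graph_exec(cudaGraphExec_t gexec, cudaGraph_t graph) {
#if CUDART_VERSION >= 12000
    cudaGraphExecUpdateResultInfo info;                    // 12.x signature
    return cudaGraphExecUpdate(gexec, graph, &info) == cudaSuccess;
#else
    cudaGraphNode_t error_node;                            // pre-12.x signature
    cudaGraphExecUpdateResult result;
    return cudaGraphExecUpdate(gexec, graph, &error_node, &result) == cudaSuccess;
#endif
}
```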
b4905
2025-03-17 20:25:13 +02:00
Guus Waals
01e8f2138b
ggml-vulkan: remove unused find_program(glslc) ( #12416 )
It's already found by FindVulkan.cmake in the parent CMakeLists
2025-03-17 13:35:43 -03:00
Jeff Bolz
484a8ab513
vulkan: Add N/2 and N/4 optimized paths in coopmat2 shader ( #12312 )
b4903
2025-03-17 09:26:18 -05:00
Daniele
cf2270e4d3
vulkan: subgroup size tuning ( #12087 )
* vulkan: subgroup size test
* Vulkan: Add device architecture enum and logic to recognize AMD generations
* vulkan: use new architecture logic to specify subgroup size
* Initial vulkan subgroup size tuning for RDNA3
* vulkan: commonize RDNA subgroup tuning
* vulkan: override subgroup size if required_subgroup_size = 0
* vulkan: disable warp 32 for RDNA3
* vulkan: fine tuned RDNA1 subgroup sizes
* vulkan: adjusted subgroup size map
* vulkan: fixed RDNA2 subgroup map
---------
Co-authored-by: 0cc4m <picard12@live.de>
b4902
2025-03-17 12:42:33 +01:00
Jeff Bolz
f07690c930
vulkan: use fp32 in coopmat2 q4_k dequant function ( #12309 )
b4901
2025-03-17 10:43:35 +01:00
Jeff Bolz
891c63956d
vulkan: Pad N dimension of B matrix for coopmat2 perf, to avoid bounds checking ( #12273 )
* vulkan: Pad N dimension of B matrix for coopmat2 perf, to avoid bounds checking
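The padding idea itself is generic; a hedged sketch (tile size and names are illustrative, not taken from the shader):
```cpp
// Round the B matrix's N dimension up to a multiple of the kernel's tile size,
// so every tile is full and the inner loop can skip per-element bounds checks.
#include <cstddef>
#include <cstdio>

static size_t pad_to_multiple(size_t n, size_t tile_n) {
    return ((n + tile_n - 1) / tile_n) * tile_n;
}

int main() {
    printf("%zu\n", pad_to_multiple(1000, 64)); // prints 1024
    return 0;
}
```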
b4900
2025-03-17 10:41:59 +01:00
Jeff Bolz
2f21123c1d
vulkan: Adjust coopmat2 tile sizes and selection heuristic ( #12258 )
b4899
2025-03-17 10:35:00 +01:00
Christian Kastner
374101fd74
cmake : enable building llama.cpp using system libggml ( #12321 )
* cmake: Factor out compiler flag function from ggml
llama.cpp's build requires it, too, and we may want to make use of it
without add_subdirectory(ggml).
* cmake: Enable building against system ggml
This facilitates package maintenance for Linux distributions, where the
libggml library most likely will be shipped as an individual package
upon which a llama.cpp package depends.
b4898
2025-03-17 11:05:23 +02:00
Akarshan Biswas
b3c9a65673
SYCL: set extras only on GGML_TYPE_Q4_0 ( #12366 )
* SYCL: set extras only on GGML_TYPE_Q4_0
* release tensor_extras in reset buffer interface
b4897
2025-03-17 09:45:12 +08:00
Sigbjørn Skjæret
8ba95dca20
llama : fix OLMo-2-0325-32B-Instruct K-norm size ( #12400 )
b4896
2025-03-16 19:46:36 +02:00
Georgi Gerganov
dc079cfdff
context : fix init of n_outputs ( #12397 )
ggml-ci
b4895
2025-03-16 19:29:36 +02:00
Daniel Bevenius
7b61bcc87c
ci : add --symlinks to xcframework zip command ( #12409 )
This commit adds the --symlinks option to the zip command used to create
the xcframework zip file. Without this option, the Versions symlink is
stored as a regular directory entry in the zip file rather than as a
symlink, which causes the following error in Xcode:
```console
Couldn't resolve framework symlink for '/Users/danbev/work/ai/llama.cpp/tmp_1/build-apple/llama.xcframework/macos-arm64_x86_64/llama.framework/Versions/Current': readlink(/Users/danbev/work/ai/llama.cpp/tmp_1/build-apple/llama.xcframework/macos-arm64_x86_64/llama.framework/Versions/Current): Invalid argument (22)
```
Refs: https://github.com/ggml-org/llama.cpp/pull/11996#issuecomment-2727026377
2025-03-16 18:22:05 +01:00
marcoStocchi
f4c3dd5daa
llama-tts : add '-o' option ( #12398 )
* added -o option to specify an output file name
* llama-tts returns ENOENT in case of file write error
note : PR #12042 is closed as superseded by this one.
b4893
2025-03-15 17:23:11 +01:00
aubreyli
3d35d87b41
SYCL: Delete redundant plus sign and space ( #12391 )
b4892
2025-03-15 15:49:03 +01:00
fairydreaming
b19bd064c0
SYCL : support non-contiguous tensors in binary ops (add, sub, etc) ( #12399 )
* sycl : support non-contiguous tensors in binary ops
* sycl : silence unused variable warning
---------
Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>
b4891
2025-03-15 22:19:30 +08:00
Chenguang Li
92a391327e
[CANN] MUL_MAT optimization ( #12382 )
2025-03-15 09:31:08 +08:00
Eric Curtin
9f2250ba72
Add CLI arg to llama-run to adjust the number of threads used ( #12370 )
We default to 4; sometimes we want to adjust this manually.
Signed-off-by: Eric Curtin <ecurtin@redhat.com>
b4889
2025-03-14 16:41:20 +00:00
Sigbjørn Skjæret
774973b8f3
main : add -sysf / --system-prompt-file ( #12249 ) ( #12250 )
* add system_prompt_file
* add -sysf / --system-prompt-file
* remove system_prompt_file
b4888
2025-03-14 16:57:05 +01:00
fairydreaming
8fcb563613
Load all MoE experts during warmup ( #11571 )
* llama : introduce llama_set_warmup() API call that controls warmup mode; use all MoE experts during warmup
* common : use new API to enable warmup mode during model warmup
---------
Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>
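A minimal usage sketch of the new call (API names from the commit message; the surrounding decode is illustrative):
```cpp
// Toggle warmup mode around a dummy decode so all MoE experts get loaded,
// then switch it off before real inference. `ctx` and `batch` are assumed
// to be a valid llama_context and llama_batch.
#include "llama.h"

static void warm_up(llama_context * ctx, llama_batch batch) {
    llama_set_warmup(ctx, true);   // introduced by this PR
    llama_decode(ctx, batch);      // dummy decode touches every expert
    llama_set_warmup(ctx, false);
}
```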
2025-03-14 13:47:05 +01:00
Victor
add2a3aa5a
server: fix "--grammar-file" parameter ( #12285 )
b4886
2025-03-14 11:21:17 +01:00
Georgi Gerganov
c522ce4143
graph : simplify attn input build for unified KV cache ( #12381 )
ggml-ci
b4885
2025-03-14 10:47:44 +02:00
Georgi Gerganov
081bee8c64
hparams : add SWA rope parameters ( #12374 )
ggml-ci
b4884
2025-03-14 09:03:24 +02:00
Georgi Gerganov
84d5475541
llama : fix Gemma3 SWA KV cache shift ( #12373 )
* llama : fix Gemma3 SWA KV cache shift
ggml-ci
* hparams : add comment [no ci]
2025-03-13 19:08:07 +02:00
Xuan-Son Nguyen
be7c303410
arg : no n_predict = -2 for examples except for main and infill ( #12364 )
b4882
2025-03-13 12:34:54 +01:00
Georgi Gerganov
e0dbec0bc6
llama : refactor llama_context, llama_kv_cache, llm_build_context ( #12181 )
* llama : refactor llama_context, llama_kv_cache, llm_build_context
ggml-ci
* graph : don't mutate the KV cache during defrag
ggml-ci
* context : reduce virtuals + remove test function
ggml-ci
* context : move interface implementation to source file + factory
ggml-ci
* graph : move KV cache build functions to llama_context impl
ggml-ci
* graph : remove model reference from build_pooling
ggml-ci
* graph : remove llama_model reference
ggml-ci
* kv_cache : provide rope factors
ggml-ci
* graph : rework inputs to use only unique_ptr, remove attn input abstraction
ggml-ci
* context : remove llama_context_i abstraction
ggml-ci
* context : clean-up
ggml-ci
* graph : clean-up
ggml-ci
* llama : remove redundant keywords (struct, enum)
ggml-ci
* model : adapt gemma3
ggml-ci
* graph : restore same attention ops as on master
ggml-ci
* llama : remove TODO + fix indent
ggml-ci
2025-03-13 12:35:44 +02:00
Ishaan Gandhi
2048b5913d
server : fix crash when using verbose output with input tokens that are not in printable range ( #12178 ) ( #12338 )
* Fix DOS index bug
* Remove new APIs
* remove extra line
* Remove from API
* Add extra newline
* Update examples/server/server.cpp
---------
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
b4880
2025-03-13 11:10:05 +01:00
Oscar Barenys
f08f4b3187
Update build.yml for Windows Vulkan builder to use Vulkan 1.4.304 SDK for VK_NV_cooperative_matrix2 support ( #12301 )
b4879
2025-03-12 20:06:58 +01:00
Daniel Bevenius
80a02aa858
llama.swiftui : fix xcframework dir in README [no ci] ( #12353 )
This commit fixes the path to the xcframework in the README file, which I
had forgotten to change after renaming the build directory.
2025-03-12 13:45:32 +01:00
Alberto Cabrera Pérez
363f8c5d67
sycl : variable sg_size support for mmvq kernels ( #12336 )
b4877
2025-03-12 09:57:32 +00:00
uvos
34c961b181
CUDA/HIP: Fix fattn-vec-* when device warp size is not 32 ( #12315 )
When fattn-wmma was ported over to warp64, various bits that also touch fattn-vec
were converted to a selectable warp size. However, the fattn-vec kernels don't work
with 64-wide warps for now, so we need to avoid launching them with warp64 parameters.
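A hypothetical sketch of the guard (names invented for illustration):
```cpp
// wmma kernels may use the device's native warp size (64 on some AMD GPUs),
// but the fattn-vec kernels are only correct for 32-wide warps for now, so
// their launch parameters are pinned to 32.
static int fattn_launch_warp_size(bool is_vec_kernel, int device_warp_size) {
    return is_vec_kernel ? 32 : device_warp_size;
}
```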
b4876
2025-03-12 10:14:11 +01:00
Xuan-Son Nguyen
7841fc723e
llama : Add Gemma 3 support (+ experimental vision capability) ( #12343 )
* llama : Add Gemma 3 text-only support
* fix python coding style
* fix compile on ubuntu
* python: fix style
* fix ubuntu compile
* fix build on ubuntu (again)
* fix ubuntu build, finally
* clip : Experimental support for Gemma 3 vision (#12344 )
* clip : Experimental support for Gemma 3 vision
* fix build
* PRId64
b4875
2025-03-12 09:30:24 +01:00
Jeff Bolz
bf69cfe62f
vulkan: fix bug in coopmat1 mul_mat_id ( #12316 )
* tests: run mul_mat_id with a larger N
* vulkan: fix bug in coopmat1 mul_mat_id
b4874
2025-03-12 06:59:19 +01:00
uvos
10f2e81809
CUDA/HIP: refactor mmqv to unify the calculation of nwarps and rows per block between host and device code. ( #12177 )
refactor mmqv to unify the calculation of nwarps and rows per block between host and device code.
---------
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
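One way to express such a unification, sketched with invented names (the real heuristic lives in ggml-cuda):
```cpp
// A single __host__ __device__ constexpr helper computes the launch geometry,
// so the host launch code and the device kernel can never disagree on it.
// The policy shown here is an arbitrary example, not the real one.
__host__ __device__ constexpr int mmvq_nwarps(int ncols) {
    return ncols <= 4 ? 4 : 2;
}
```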
b4873
2025-03-11 20:16:03 +01:00
jklincn
ba7654380a
ggml-backend : fix backend search path ( #12330 )
* Fix backend search path
* replace .native() with '/'
* reverted .native()
b4872
2025-03-11 14:25:17 +01:00
BB-fat
6ab2e4765a
metal : Cache the Metal library at the device context level ( #12265 )
b4871
2025-03-11 13:45:02 +02:00
Xuan-Son Nguyen
96e1280839
clip : bring back GPU support ( #12322 )
* clip : bring back GPU support
* use n_gpu_layers param
* fix double free
* ggml_backend_init_by_type
* clean up
b4870
2025-03-11 09:20:16 +01:00
Eve
2c9f833d17
mat vec double buffer ( #12188 )
b4869
2025-03-10 19:28:11 +00:00
R0CKSTAR
251364549f
musa: support new arch mp_31 and update doc ( #12296 )
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
b4868
2025-03-10 18:18:25 +01:00
Henry Linjamäki
8acdacb3ea
opencl: use OpenCL C standard supported by the device ( #12221 )
This patch nudges llama.cpp a bit so it can run on PoCL, which doesn't
support OpenCL C 2.0. The issue is solved by querying the device for the
supported OpenCL C versions and using the highest one available.
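A minimal sketch of the query using the standard OpenCL API (buffer handling simplified; not the exact llama.cpp code):
```cpp
// Ask the device which OpenCL C version it supports and build the matching
// -cl-std option for clBuildProgram, instead of hard-coding CL2.0.
#include <CL/cl.h>
#include <string>

static std::string cl_std_for_device(cl_device_id dev) {
    char buf[256] = {0};
    // Returns a string of the form "OpenCL C <major>.<minor> ...".
    clGetDeviceInfo(dev, CL_DEVICE_OPENCL_C_VERSION, sizeof(buf) - 1, buf, nullptr);
    std::string num = std::string(buf).substr(9, 3); // e.g. "1.2"
    return "-cl-std=CL" + num;
}
```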
b4867
2025-03-10 09:57:00 -07:00
John Bean
89b2b56e86
readme: added Sidekick to available UIs ( #12311 )
2025-03-10 16:13:09 +02:00
Georgi Gerganov
e128a1bf5b
tests : fix test-quantize-fns to init the CPU backend ( #12306 )
ggml-ci
b4865
2025-03-10 14:07:15 +02:00
marcoStocchi
6ef79a67ca
common : refactor '-o' option ( #12278 )
As discussed in PR 'llama-tts : add -o option' (#12042 ):
* common_params : 'out_file' string is the only output file name parameter left in common_params. It's intended to be used in all example programs implementing an '-o' option.
* cvector-generator, export-lora, imatrix : default output filenames moved from 'common_params' to the 'main()' of each example program.
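A sketch of the described pattern (struct trimmed to the one field; surrounding code is illustrative):
```cpp
// Example programs read their '-o' target from the shared out_file field
// instead of carrying a per-example output-name parameter.
#include <cstdio>
#include <string>

struct common_params { std::string out_file; /* ... */ };

static bool write_output(const common_params & params, const std::string & data) {
    FILE * f = std::fopen(params.out_file.c_str(), "wb");
    if (!f) return false;
    std::fwrite(data.data(), 1, data.size(), f);
    return std::fclose(f) == 0;
}
```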
b4864
2025-03-10 13:34:13 +02:00
Olivier Chafik
4e39a3c332
server : extract <think> tags from qwq outputs ( #12297 )
* extract <think> tags from qwq outputs
* const for all static regexes in chat.cpp
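A hedged sketch of the extraction (not chat.cpp's actual code), showing both the tag split and a const static regex:
```cpp
// Split a QwQ response into reasoning (inside <think>...</think>) and the
// visible answer; the regex is compiled once as a const static.
#include <regex>
#include <string>
#include <utility>

static const std::regex THINK_RE(R"(<think>([\s\S]*?)</think>)");

static std::pair<std::string, std::string> split_think(const std::string & text) {
    std::smatch m;
    if (std::regex_search(text, m, THINK_RE)) {
        return { m[1].str(), m.suffix().str() }; // { reasoning, answer }
    }
    return { "", text }; // no tags: the whole output is the answer
}
```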
b4863
2025-03-10 10:59:03 +00:00
Olivier Chafik
be421fc429
tool-call : ensure there's always a non-empty tool call id ( #12292 )
2025-03-10 09:45:29 +00:00
Olivier Chafik
87c2630546
allow missing content in message if tool_calls provided ( #12293 )
b4861
2025-03-10 09:45:07 +00:00
Olivier Chafik
2b3a25c212
sampler : fixes trigger tokens + lazy grammars (fix typo cast from token to string) ( #12291 )
* Fix typo in lazy grammar handling (fixes trigger tokens)
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
b4860
2025-03-10 09:44:42 +00:00
tc-mb
8352cdc87b
llava : fix bug in minicpm-v code ( #11513 )
* fix bug in minicpm-v code
* update readme of minicpm-v
b4859
2025-03-10 10:33:24 +02:00
Georgi Gerganov
1e2f78a004
server : add speculative decoding presets for FIM ( #12287 )
2025-03-09 19:08:20 +02:00
Georgi Gerganov
0fd7ca7a21
authors : update ( #12271 )
2025-03-08 18:26:00 +02:00