mirror of https://github.com/ggerganov/llama.cpp.git synced 2025-04-22 14:06:07 +00:00

4865 Commits

Author SHA1 Message Date
Georgi Gerganov
e128a1bf5b
tests : fix test-quantize-fns to init the CPU backend ()
ggml-ci
b4865
2025-03-10 14:07:15 +02:00
marcoStocchi
6ef79a67ca
common : refactor '-o' option ()
As discussed in PR 'llama-tts : add -o option' ():

* common_params : 'out_file' string is the only output file name parameter left in common_params. It's intended to be used in all example programs implementing an '-o' option.

* cvector-generator, export-lora, imatrix : default output filenames moved from 'common_params' to the 'main()' of each example program.
b4864
2025-03-10 13:34:13 +02:00
Olivier Chafik
4e39a3c332
server: extract <think> tags from qwq outputs ()
* extract <think> tags from qwq outputs

* const for all static regexes in chat.cpp
b4863
2025-03-10 10:59:03 +00:00
Olivier Chafik
be421fc429
tool-call: ensure there's always a non-empty tool call id () 2025-03-10 09:45:29 +00:00
Olivier Chafik
87c2630546
allow missing content in message if tool_calls provided () b4861 2025-03-10 09:45:07 +00:00
Olivier Chafik
2b3a25c212
sampler: fixes trigger tokens + lazy grammars (fix typo cast from token to string) ()
* Fix typo in lazy grammar handling (fixes trigger tokens)

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
b4860
2025-03-10 09:44:42 +00:00
tc-mb
8352cdc87b
llava : fix bug in minicpm-v code ()
* fix bug in minicpm-v code

* update readme of minicpm-v
b4859
2025-03-10 10:33:24 +02:00
Georgi Gerganov
1e2f78a004
server : add speculative decoding presets for FIM () 2025-03-09 19:08:20 +02:00
Georgi Gerganov
0fd7ca7a21
authors : update () 2025-03-08 18:26:00 +02:00
Jason C.H
6fefc05a7a
ggml-backend : make path_str compatible with C++20 () b4856 2025-03-08 17:02:39 +01:00
Georgi Gerganov
7ab364390f
server : infill gen ends on new line () b4855 2025-03-07 20:54:30 +02:00
Daniel Bevenius
7c7f3b7f43
ggml : skip intermediate .air file when compiling .metallib ()
This commit updates the compilation of default.metallib to skip the
intermediate .air (Apple Intermediate Representation) file.

The motivation for this change is to simplify the custom command a
little and avoid generating and then removing the .air file.
b4854
2025-03-07 14:15:27 +01:00
Georgi Gerganov
102ac1891d sync : ggml
ggml-ci
b4853
2025-03-07 14:49:44 +02:00
vmobilis
d6ae2fa061 ggml : ggml_compute_forward_concat() for arbitrary tensor type (ggml/1118)
* ggml_compute_forward_concat() for arbitrary tensor type

* Check that tensors' type match

* ggml-cpu.c: check type of source tensors

* ggml-cpu.c: move tensor type check to ggml_compute_forward_concat()

* ggml.c: check concatenated tensor type

* Remove tensor type check from ggml_compute_forward_concat() in ggml-cpu.c

..., as it was moved to ggml.c.
2025-03-07 14:49:44 +02:00
Rémy O
68d0027f3d
ggml-cpu: faster AVX2 variant for IQ1_M () b4851 2025-03-07 13:54:22 +02:00
Georgi Gerganov
ea002810a2
ci : fix save-load test invocations () 2025-03-07 12:19:31 +02:00
Sigbjørn Skjæret
8fad3c7a7c
server : Log original chat template parsing error () b4849 2025-03-07 11:15:33 +01:00
Olivier Chafik
7cf64f6bee
sync: minja - support QwQ-32B ()
8a76f7815e
b4848
2025-03-07 09:33:37 +00:00
BB-fat
5e2d57b2b2
metal : simplify kernel arguments using a struct () ()
* metal : refactor im2col parameters into a struct

* metal: Change im2col offset types from int32_t to uint64_t to support larger memory offsets

* metal : refactor sum_rows parameters into a struct

* metal : refactor soft_max parameters into a struct

* metal : refactor diag_mask_inf parameters into a struct

* metal : refactor ssm_conv parameters into a struct

* metal : refactor ssm_scan parameters into a struct

* metal : refactor get_rows parameters into a struct

* metal : refactor group_norm parameters into a struct

* metal : refactor conv_transpose_1d parameters into a struct

* metal : refactor upscale parameters into a struct

* metal : refactor pad parameters into a struct

* metal : refactor pad_reflect_1d parameters into a struct

* metal : refactor arange parameters into a struct

* metal : refactor timestep_embedding parameters into a struct

* metal : refactor argsort parameters into a struct

* metal : refactor leaky_relu parameters into a struct

* metal : refactor pool_2d parameters into a struct

* metal : fix trailing whitespace

---------

Co-authored-by: alexju <alexju@tencent.com>
b4847
2025-03-07 08:35:57 +01:00
David Huang
f1648e91cf
HIP: fix rocWMMA build flags under Windows () b4846 2025-03-07 08:06:08 +01:00
Daniel Bevenius
d6c95b0740
metal : fix default.metallib build ()
This commit updates the custom command to build the default.metallib
file to use the correct path to ../ggml-common.h by using the variable
METALLIB_COMMON.

The motivation for this change is that currently when building and
specifying GGML_METAL_EMBED_LIBRARY=OFF the following error is
generated:
```console
[ 11%] Linking CXX shared library ../../bin/libggml.dylib
[ 11%] Built target ggml
make[2]: *** No rule to make target `ggml/src/ggml-metal/ggml-common.h', needed by `bin/default.metallib'.  Stop.
make[1]: *** [ggml/src/ggml-metal/CMakeFiles/ggml-metal-lib.dir/all] Error 2
```

With the above change the build could progress, but there was a follow-on
error about not being able to find the ggml-common.h file in
ggml-metal.metal, where it was included as a relative path:
```console
[ 11%] Compiling Metal kernels
/Users/danbev/work/llama.cpp/build/bin/ggml-metal.metal:6:10: error: '../ggml-common.h' file not found, did you mean 'ggml-common.h'?
         ^~~~~~~~~~~~~~~~~~
         "ggml-common.h"
1 error generated.
```
Removing the relative path then allowed the build to complete
successfully.
2025-03-07 06:23:16 +01:00
lhez
d76a86d967
opencl: Noncontiguous norm, rms_norm, disable fp16 for some ops ()
* opencl: support noncontiguous `norm`

* opencl: support noncontiguous `rms_norm`

* opencl: disable fp16 for `ADD`, `MUL`, `SCALE`, `RELU`, `GELU`, `SILU`, `CLAMP`
2025-03-07 00:20:35 +00:00
xiaofei
776f9e59cc
cmake : fix undefined reference errors for std::filesystem in ggml () ()
Signed-off-by: Ray Lee <hburaylee@gmail.com>
Co-authored-by: Ray Lee <hburaylee@gmail.com>
2025-03-06 22:58:25 +00:00
Lucas Moura Belo
3d652bfddf
readme : update bindings () 2025-03-06 21:15:13 +02:00
Johannes Gäßler
5220a16d18
CUDA: fix FA logic for PTX 7.0 and CC >= 7.5 () 2025-03-06 18:45:09 +01:00
David Huang
3ffbbd5ce1
HIP: rocWMMA documentation and enabling in workflow builds ()
* Enable rocWMMA for Windows CI build

* Enable for Ubuntu

* GGML_HIP_ROCWMMA_FATTN documentation work
2025-03-06 14:14:11 +01:00
Olivier Chafik
42994048a3
update function-calling.md w/ template override for functionary-small-v3.2 () 2025-03-06 09:03:31 +00:00
Aaron Teo
e9b2f84f14
llava: add big-endian conversion for image encoder ()
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-03-06 09:33:21 +01:00
uvos
e721c05c93
HIP/CUDA: set the parameter value in maintain_cuda_graph instead of replacing it. ()
This avoids conflicts with the internal CUDA/HIP runtimes' memory management behavior.
b4837
2025-03-06 08:20:52 +01:00
Han Yin
57b6abf85a
android : fix KV cache log message condition () b4836 2025-03-06 08:22:49 +02:00
Henry Linjamäki
94bb63e4f0
opencl : fix buffer alignment ()
Fix the following error:

```
ggml-alloc.c:99: not enough space in the buffer
ggml_tallocr_alloc: not enough space in the buffer to allocate blk.17.ffn_down.weight (needed 27525120, available 27521024)
```

which occurs when `ggml_backend_opencl_context::alignment` is larger
than `cl_ptr_base` (hard-coded to `0x1000`).

Also fix: `ggml_backend_opencl_context::alignment` was set from
`CL_DEVICE_MEM_BASE_ADDR_ALIGN`, which was treated as bytes even though
the value is reported in bits.
b4835
2025-03-06 02:33:40 +01:00
Henry Linjamäki
f79243992c
opencl : fix ulong kernel args were set from int variables ()
... which left garbage bits in the upper half of the kernel args. This
caused segmentation faults when running PoCL.
b4834
2025-03-06 02:31:14 +01:00
simon886212
ed4ce0dda2
opencl : fix profile-related errors ()
Co-authored-by: ubuntu <ubuntu@localhost.localdomain>
b4833
2025-03-06 02:30:05 +01:00
Rémy O
07d1572347
ggml-cpu: Faster IQ1 mul_mat_vec on AVX2 using BMI2 instructions ()
* ggml-cpu: Faster IQ1 mul_mat_vec on AVX2 using BMI2 instructions

* cmake: Add GGML_BMI2 build option

* ggml: enable BMI2 on relevant CPU variants

* ggml-cpu: include BMI2 in backend score

* ggml-cpu: register BMI2 in ggml_backend_cpu_get_features

* ggml-cpu: add __BMI2__ define when using MSVC
b4832
2025-03-06 02:26:10 +01:00
Akarshan Biswas
5e43f104cc
SYCL: Disable f16 Unary OPs as not supported by the kernels () b4831 2025-03-05 16:58:23 +01:00
Plamen Minev
16e4b22c5e
ggml : fix GGMLMetalClass ODR ()
-- it might happen if ggml is loaded from 2 separate libraries, since each one of them will expose the class. This is more of a guard, since we want to use Metal only as an embedded library and don't care about the other case.
b4830
2025-03-05 17:16:01 +02:00
Daniel Bevenius
074c4fd39d
ci : add fetch-depth to xcframework upload ()
This commit adds the fetch-depth: 0 option to the checkout action in the
build.yml workflow file (0 meaning that it fetches the complete
history). The default value is 1 when not specified, which only fetches
the latest commit.

This is necessary to ensure that `git rev-list --count HEAD` counts the
total number of commits in the history. Currently, because the default is
used, the xcframework artifact is always named
llama-b1-xcframework.
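The effect can be demonstrated with a throwaway local repository (paths and committer identity below are illustrative):

```shell
set -e
tmp=$(mktemp -d)
git init -q "$tmp/full"
for msg in one two three; do
    git -C "$tmp/full" -c user.email=ci@example.com -c user.name=ci \
        commit -q --allow-empty -m "$msg"
done

# full history: counts every commit
git -C "$tmp/full" rev-list --count HEAD            # prints 3

# shallow clone, i.e. what fetch-depth: 1 gives the workflow
git clone -q --depth 1 "file://$tmp/full" "$tmp/shallow"
git -C "$tmp/shallow" rev-list --count HEAD         # prints 1
```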
b4829
2025-03-05 14:16:40 +01:00
Olivier Chafik
669912d9a5
tool-call: fix Qwen 2.5 Coder support, add micro benchmarks, support trigger patterns for lazy grammars ()
* sampler: turn lazy grammar trigger words to regexes

* add scripts/tool_bench.sh & .py

* constrain llama json output regardless of function name if matches at beginning

* update relaxed newline space rule in grammar tests

* support add_generation_prompt query parameter (useful for /apply_template)

* Update src/llama-grammar.cpp

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-03-05 13:05:13 +00:00
Daniel Bevenius
fa31c438e0
ci : fix xcframework artifact tag ()
This commit adds the name parameter to the upload-artifact action to
ensure that the artifact is uploaded with the correct name.

The motivation for this is that currently the uploaded xcframework
is named llama-b1-xcframework.zip. With this change the name of this
artifact should contain the build number, like the other artifacts.
b4827
2025-03-05 10:22:29 +01:00
Daniel Bevenius
3ccbfe5a71
ci : remove xcframework upload ()
* ci : remove xcframework upload

This commit removes the upload of the xcframework zip file as an
artifact.

The motivation for this change is that the xcframework zip file is
currently being uploaded as part of a matrix strategy and will
therefore be uploaded multiple times, failing the build.

The upload should be moved elsewhere in the build to avoid this.

* ci : add xcframework upload to macos-latest job
b4826
2025-03-05 08:34:02 +01:00
Clauszy
06a92a193a
server : fix cache reuse logic ()
The first kv shift offsets the positions of all tokens after head_c.
Calling llama_kv_cache_seq_rm next with head_c then removes valid tokens, because their positions have already been offset.
2025-03-05 09:25:45 +02:00
Daniel Bevenius
a057897ad4
llama : add xcframework build script ()
* llama : add xcframework build script

This commit adds a script to build an XCFramework for the Apple
iOS, macOS, visionOS, and tvOS platforms.

The generated XCFramework can then be added to a project and used in
the same way as a regular framework. The llama.swiftui example project
has been updated to use the XCFramework and can be started using the
following command:
```console
$ open examples/llama.swiftui/llama.swiftui.xcodeproj/
```

Refs: https://github.com/ggml-org/llama.cpp/issues/10747

* examples : remove llama.cpp (source dir ref) from project.pbxproj

This commit removes the reference to llama.cpp from the project.pbxproj
file since Package.swift has been removed.

* ci : updated build.yml to use build-xcframework.sh

* ci : add xcframework build to github releases

This commit adds the ability to create a GitHub release with the
xcframework build artifact.

* scripts : add apple app validation scripts

This commit adds scripts that can validate the iOS, macOS, tvOS, and
visionOS applications. The scripts create a simple test app project,
copy the llama.xcframework to the test project, build and archive the
app, create an IPA from the archive, and validate the IPA using altool.

The motivation for this is to provide some basic validation and
hopefully avoid having to manually validate apps in Xcode.

* llama : remove Package.swift

This commit removes the Package.swift file, as we are now building an
XCFramework for the project.

* llama : remove Sources and spm-headers directories

* llama : use TargetConditionals.h for visionOS/tvOS
b4824
2025-03-05 06:30:31 +01:00
mgroeber9110
5bbe6a9fe9
ggml : portability fixes for VS 2017 ()
* Add include files for std::min/max and std::toupper/tolower

* win32: move _USE_MATH_DEFINES before includes to ensure M_PI is defined

* Use GGML_RESTRICT instead of "restrict" keyword everywhere, and use "__restrict" in MSVC plain C mode

* win32: only use __restrict in MSVC if C11/C17 support is not enabled

---------

Co-authored-by: Marcus Groeber <Marcus.Groeber@cerence.com>
b4823
2025-03-04 18:53:26 +02:00
Georgi Gerganov
20a9b8f5e1
readme : fix roadmap link () 2025-03-04 18:42:44 +02:00
Sigbjørn Skjæret
56d7a9f812
main: allow preloading conversation with -p and add -st / --single-turn ()
* Add chat template formatting to -no-cnv

* only enable prompt formatting if explicitly enabled

* add -st / --single-turn

* add --single-turn and -p in conversation mode

* fix -sys + -p

* reword warning

* small readability change and fix (long) outdated example usage

* only activate single turn in conversation mode
b4821
2025-03-04 12:19:39 -04:00
Olivier Chafik
1a24c4621f
server: fix deadly typo in response_format.json_schema.schema handling () b4820 2025-03-04 08:24:07 +02:00
David Huang
becade5de7
HIP: implement FlashAttention via rocWMMA for CDNA and RDNA3+ ()
Adds GGML_HIP_ROCWMMA_FATTN and rocwmma header check
Adds rocWMMA support to fattn-wmma-f16

---

Signed-off-by: Carl Klemm <carl@uvos.xyz>
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
Co-authored-by: Ben Jackson <ben@ben.com>
b4819
2025-03-03 22:10:54 +01:00
Georgi Gerganov
dfd6b2c0be sync : ggml
ggml-ci
b4818
2025-03-03 18:18:11 +02:00
cmdr2
b64d7cc272 cuda: unary ops as float + de-duplicate (ggml/1130) 2025-03-03 18:18:11 +02:00
Georgi Gerganov
3d1cf3cf33 sync : ggml
ggml-ci
2025-03-03 18:18:11 +02:00