llama.cpp

mirror of https://github.com/ggerganov/llama.cpp.git synced 2025-04-20 13:36:08 +00:00

Author	SHA1	Message	Date
Daniel Bevenius	2679c3b55d	ci : set GITHUB_ACTION env var for server tests (#12162 ) This commit tries to address/improve an issue with the server tests, which are failing with a timeout. Looking at the logs, they appear to time out after 12 seconds: ``` FAILED unit/test_chat_completion.py::test_completion_with_json_schema[False-json_schema0-6-"42"] - TimeoutError: Server did not start within 12 seconds ``` This is somewhat strange, as utils.py has the following values: ```python DEFAULT_HTTP_TIMEOUT = 12 if "LLAMA_SANITIZE" in os.environ or "GITHUB_ACTION" in os.environ: DEFAULT_HTTP_TIMEOUT = 30 def start(self, timeout_seconds: int | None = DEFAULT_HTTP_TIMEOUT) -> None: ``` A test running in a GitHub action should therefore have a timeout of 30 seconds. However, this does not seem to be the case. Inspecting the logs from the CI job, we can see the following environment variables: ```console Run cd examples/server/tests cd examples/server/tests ./tests.sh shell: /usr/bin/bash -e {0} env: LLAMA_LOG_COLORS: 1 LLAMA_LOG_PREFIX: 1 LLAMA_LOG_TIMESTAMPS: 1 LLAMA_LOG_VERBOSITY: 10 pythonLocation: /opt/hostedtoolcache/Python/3.11.11/x64 ``` This probably does not address the underlying issue, namely that the servers hosting the models to be downloaded occasionally take longer to respond, but it might improve the situation in some cases.	2025-03-03 16:17:36 +01:00
Georgi Gerganov	f3e64859ed	ci : fix arm upload artifacts (#12024 ) * ci : fix arm upload artifacts * cont : fix archive name to use matrix	2025-02-22 15:03:00 +02:00
Rohanjames1997	335eb04a91	ci : Build on Github-hosted arm64 runners (#12009 )	2025-02-22 11:48:57 +01:00
Eve	f7b1116af1	update release requirements (#11897 )	2025-02-17 12:20:23 +01:00
Xuan-Son Nguyen	818a340ea8	ci : fix (again) arm64 build fails (#11895 ) * docker : attempt fixing arm64 build on ci * qemu v7.0.0-28	2025-02-16 10:36:39 +01:00
Georgi Gerganov	68ff663a04	repo : update links to new url (#11886 ) * repo : update links to new url ggml-ci * cont : more urls ggml-ci	2025-02-15 16:40:57 +02:00
Rémy O	fc1b0d0936	vulkan: initial support for IQ1_S and IQ1_M quantizations (#11528 ) * vulkan: initial support for IQ1_S and IQ1_M quantizations * vulkan: define MMV kernels for IQ1 quantizations * devops: increase timeout of Vulkan tests again * vulkan: simplify ifdef for init_iq_shmem	2025-02-15 09:01:40 +01:00
Eve	a4f011e8d0	vulkan: linux builds + small subgroup size fixes (#11767 ) * mm subgroup size * upload vulkan x86 builds	2025-02-14 02:59:40 +00:00
R0CKSTAR	bd6e55bfd3	musa: bump MUSA SDK version to rc3.1.1 (#11822 ) * musa: Update MUSA SDK version to rc3.1.1 Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> * musa: Remove workaround in PR #10042 Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> --------- Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>	2025-02-13 13:28:18 +01:00
Xuan-Son Nguyen	2fb3c32a16	server : (webui) migrate project to ReactJS with typescript (#11688 ) * init version * fix auto scroll * bring back copy btn * bring back thought process * add lint and format check on CI * remove lang from html tag * allow multiple generations at the same time * lint and format combined * fix unused var * improve MarkdownDisplay * fix more latex * fix code block cannot be selected while generating	2025-02-06 17:32:29 +01:00
Jeff Bolz	2c6c8df56d	vulkan: optimize coopmat2 iq2/iq3 callbacks (#11521 ) * vulkan: optimize coopmat2 iq2/iq3 callbacks * build: trigger CI on GLSL compute shader changes	2025-02-06 07:15:30 +01:00
Georgi Gerganov	b34aedd558	ci : do not stale-close roadmap issues	2025-02-04 09:31:01 +02:00
Michał Moskal	ff227703d6	sampling : support for llguidance grammars (#10224 ) * initial porting of previous LLG patch * update for new APIs * build: integrate llguidance as an external project * use '%llguidance' as marker to enable llg lark syntax * add some docs * clarify docs * code style fixes * remove llguidance.h from .gitignore * fix tests when llg is enabled * pass vocab not model to llama_sampler_init_llg() * copy test-grammar-integration.cpp to test-llguidance.cpp * clang fmt * fix ref-count bug * build and run test * gbnf -> lark syntax * conditionally include llguidance test based on LLAMA_LLGUIDANCE flag * rename llguidance test file to test-grammar-llguidance.cpp * add gh action for llg test * align tests with LLG grammar syntax and JSON Schema spec * llama_tokenizer() in fact requires valid utf8 * update llg * format file * add $LLGUIDANCE_LOG_LEVEL support * fix whitespace * fix warning * include <cmath> for INFINITY * add final newline * fail llama_sampler_init_llg() at runtime * Link gbnf_to_lark.py script; fix links; refer to llg docs for lexemes * simplify #includes * improve doc string for LLAMA_LLGUIDANCE * typo in merge * bump llguidance to 0.6.12	2025-02-02 09:55:32 +02:00
Olivier Chafik	53debe6f3c	ci: use sccache on windows HIP jobs (#11553 )	2025-02-01 18:22:38 +00:00
Olivier Chafik	5bbc7362cb	ci: simplify cmake build commands (#11548 )	2025-02-01 00:01:20 +00:00
Olivier Chafik	aa6fb13213	`ci`: use sccache on windows instead of ccache (#11545 ) * Use sccache on ci for windows * Detect sccache in cmake	2025-01-31 17:12:40 +00:00
Olivier Chafik	553f1e46e9	`ci`: ccache for all github workflows (#11516 )	2025-01-30 22:01:06 +00:00
Olivier Chafik	8b576b6c55	Tool call support (generic + native for Llama, Functionary, Hermes, Mistral, Firefunction, DeepSeek) w/ lazy grammars (#9639 ) --------- Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> Co-authored-by: Xuan Son Nguyen <son@huggingface.co>	2025-01-30 19:13:58 +00:00
Rémy Oudompheng	66ee4f297c	vulkan: implement initial support for IQ2 and IQ3 quantizations (#11360 ) * vulkan: initial support for IQ3_S * vulkan: initial support for IQ3_XXS * vulkan: initial support for IQ2_XXS * vulkan: initial support for IQ2_XS * vulkan: optimize Q3_K by removing branches * vulkan: implement dequantize variants for coopmat2 * vulkan: initial support for IQ2_S * vulkan: vertically realign code * port failing dequant callbacks from mul_mm * Fix array length mismatches * vulkan: avoid using workgroup size before it is referenced * tests: increase timeout for Vulkan llvmpipe backend --------- Co-authored-by: Jeff Bolz <jbolz@nvidia.com>	2025-01-29 18:29:39 +01:00
Xuan-Son Nguyen	d0c08040b6	ci : fix build CPU arm64 (#11472 ) * ci : fix build CPU arm64 * failed, trying ubuntu 22 * vulkan: ubuntu 24 * vulkan : jammy --> noble	2025-01-29 00:02:56 +01:00
Xuan Son Nguyen	caf773f249	docker : fix ARM build and Vulkan build (#11434 ) * ci : do not fail-fast for docker * build arm64/amd64 separately * fix pip * no fast fail * vulkan: try jammy	2025-01-26 22:45:32 +01:00
bandoti	19f65187cb	cmake: add ggml find package (#11369 ) * Add initial ggml cmake package * Add build numbers to ggml find-package * Expand variables with GGML_ prefix * Guard against adding to cache variable twice * Add git to msys2 workflow * Handle ggml-cpu-* variants * Link ggml/ggml-base libraries to their targets * Replace main-cmake-pkg with simple-cmake-pkg * Interface features require c_std_90 * Fix typo * Removed unnecessary bracket from status message * Update examples/simple-cmake-pkg/README.md Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Update examples/simple-cmake-pkg/README.md Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-01-26 12:07:48 -04:00
Georgi Gerganov	00c24acb2a	ci : fix line breaks on windows builds (#11409 ) * ci : fix line breaks on windows builds * cont : another try * ci : fix powershell line breaks	2025-01-25 13:36:48 +02:00
jiahao su	466ea66f33	CANN: Add Ascend CANN build ci (#10217 ) * CANN: Add Ascend CANN build ci * Update build.yml * Modify cann image version * Update build.yml * Change to run on x86 system * Update build.yml * Update build.yml * Modify format error * Update build.yml * Add 'Ascend NPU' label restrictions * Exclude non PR event Co-authored-by: Yuanhao Ji <jiyuanhao@apache.org> * Update build.yml --------- Co-authored-by: Yuanhao Ji <jiyuanhao@apache.org>	2025-01-25 00:26:01 +01:00
Georgi Gerganov	9755129c27	release : pack /lib in the packages (#11392 ) * release : pack /lib and /include in the packages * cmake : put libs in /bin * TMP : push artifacts * Revert "TMP : push artifacts" This reverts commit 4decf2c4dfc5cdf5d96ea44c03c8f9801ab41262. * ci : fix HIP cmake compiler options to be on first line * ci : restore the original HIP commands * ci : change ubuntu build from latest to 20.04 * ci : try to fix macos build rpaths * ci : remove obsolete MacOS build * TMP : push artifacts * ci : change back to ubuntu latest * ci : macos set build rpath to "@loader_path" * ci : fix typo * ci : change ubuntu package to 22.04 * Revert "TMP : push artifacts" This reverts commit 537b09e70ffc604c414ee78acf3acb4c940ec597.	2025-01-24 18:41:30 +02:00
Georgi Gerganov	92bc493917	tests : increase timeout when sanitizers are enabled (#11300 ) * tests : increase timeout when sanitizers are enabled * tests : add DEFAULT_HTTP_TIMEOUT	2025-01-19 20:22:30 +02:00
Eric Curtin	a1649cc13f	Adding linenoise.cpp to llama-run (#11252 ) This is a fork of linenoise that is C++17 compatible. I intend on adding it to llama-run so we can do things like traverse prompt history via the up and down arrows: https://github.com/ericcurtin/linenoise.cpp Signed-off-by: Eric Curtin <ecurtin@redhat.com>	2025-01-18 14:42:31 +00:00
Georgi Gerganov	4dd34ff831	cmake : add sanitizer flags for llama.cpp (#11279 ) * cmake : add sanitizer flags for llama.cpp ggml-ci * tests : fix compile warnings ggml-ci * cmake : move sanitizer flags to llama_add_compile_flags ggml-ci * cmake : move llama.cpp compile flags to top level lists ggml-ci * cmake : apply only sanitizer flags at top level ggml-ci * tests : fix gguf context use in same_tensor_data * gguf-test: tensor data comparison * dummy : trigger ggml-ci * unicode : silence gcc warnings ggml-ci * ci : use sanitizer builds only in Debug mode ggml-ci * cmake : add status messages [no ci] --------- Co-authored-by: Johannes Gäßler <johannesg@5d6.de>	2025-01-18 16:18:15 +02:00
Xuan Son Nguyen	f7cd13301c	ci : use actions from ggml-org (#11140 )	2025-01-08 16:09:20 +01:00
Xuan Son Nguyen	80ccf5d725	ci : pin dependency to specific version (#11137 ) * ci : pin dependency to specific version * will this fix ec?	2025-01-08 12:07:20 +01:00
Georgi Gerganov	0d52a69e4b	ci : fix cmake option (#11125 )	2025-01-08 11:29:34 +02:00
Xuan Son Nguyen	96be8c3264	github : add cmd line field to bug report (#11090 ) * github : cmd line to bug report * codeowners : (@ngxson) only watch dockerfile * Apply suggestions from code review [no ci] Co-authored-by: Johannes Gäßler <johannesg@5d6.de> * rm cmd in log output [no ci] * rm 2 [no ci] * no need backticks [no ci] --------- Co-authored-by: Johannes Gäßler <johannesg@5d6.de>	2025-01-06 16:34:49 +01:00
Georgi Gerganov	f66f582927	llama : refactor `src/llama.cpp` (#10902 ) * llama : scatter llama.cpp into multiple modules (wip) * llama : control-vector -> adapter * llama : arch * llama : mmap ggml-ci * ci : remove BUILD_SHARED_LIBS=OFF ggml-ci * llama : arch (cont) ggml-ci * llama : chat ggml-ci * llama : model ggml-ci * llama : hparams ggml-ci * llama : adapter ggml-ci * examples : fix ggml-ci * rebase ggml-ci * minor * llama : kv cache ggml-ci * llama : impl ggml-ci * llama : batch ggml-ci * cont ggml-ci * llama : context ggml-ci * minor * llama : context (cont) ggml-ci * llama : model loader ggml-ci * common : update lora ggml-ci * llama : quant ggml-ci * llama : quant (cont) ggml-ci * minor [no ci]	2025-01-03 10:18:53 +02:00
Rudi Servo	7c0e285858	devops : add docker-multi-stage builds (#10832 )	2024-12-22 23:22:58 +01:00
Eve	7b1ec53f56	vulkan: bugfixes for small subgroup size systems + llvmpipe test (#10809 ) * ensure mul mat shaders work on systems with subgroup size less than 32 more fixes add test * only s_warptile_mmq needs to be run with 32 threads or more	2024-12-17 06:52:55 +01:00
lhez	a76c56fa1a	Introducing experimental OpenCL backend with support for Qualcomm Adreno GPUs (#10693 ) * [cl][adreno] Add Adreno GPU support Add new OpenCL backend to support Adreno GPUs --------- Co-authored-by: Skyler Szot <quic_sszot@quicinc.com> Co-authored-by: Shangqing Gu <quic_shawngu@quicinc.com> Co-authored-by: Alexander Angus <quic_aangus@quicinc.com> Co-authored-by: Hongqiang Wang <quic_wangh@quicinc.com> Co-authored-by: Max Krasnyansky <quic_maxk@quicinc.com> * [cl][ci] Add workflow for CL * [cl][adreno] Fix memory leak for non SMALL_ALLOC path * opencl: integrate backend dyn.load interface and fix compiler and format warnings * opencl: remove small-alloc support and fix build errors for non-opencl platforms * opencl: fixed merge conflict (MUSA added twice in cmake) * opencl-ci: use RUNNER_TEMP instead of github.workspace * opencl: fix embed tool invocation with python3 * opencl: CI workflow fixes * opencl: Clean up small-alloc in CMake files * opencl: cleanup ggml-opencl2 header file * opencl: use ulong for offsets and strides in ADD kernel * opencl: use cl_ulong for all offsets * opencl: use cl_ulong for sizes and strides * opencl: use `GGML_LOG_xxx` instead of `fprintf(stderr, ...)` * opencl: rename backend `opencl2` -> `opencl` * opencl: rename kernel files `ggml-opencl2` -> `ggml-opencl` * opencl: make OpenCL required, remove redundant lib and inc directories * `ggml-base`, `..` and `.` are added by `ggml_add_backend_library` * opencl: rename backend - funcs, structs, etc `opencl2` -> `opencl` * opencl: remove copyright marker since main license already covers * opencl: replace some more OPENCL2 leftovers * opencl: remove limits on `tensor_extra` * opencl: use pools for `tensor_extra` * opencl: fix compiler warnings with GCC and Clang Still getting the warning about clCreateCmdQueue being obsolete. Will fix that separately. * opencl: fail gracefully if opencl devices are not available Also for unsupported GPUs. * opencl: fix MSVC builds (string length error) * opencl: check for various requirements, allow deprecated API * opencl: update log message for unsupported GPUs --------- Co-authored-by: Skyler Szot <quic_sszot@quicinc.com> Co-authored-by: Shangqing Gu <quic_shawngu@quicinc.com> Co-authored-by: Alexander Angus <quic_aangus@quicinc.com> Co-authored-by: Hongqiang Wang <quic_wangh@quicinc.com> Co-authored-by: Max Krasnyansky <quic_maxk@quicinc.com>	2024-12-13 12:23:52 -08:00
Xuan Son Nguyen	92f77a640f	ci : pin nodejs to 22.11.0 (#10779 )	2024-12-11 14:59:41 +01:00
Diego Devesa	43ed389a3f	llama : use cmake for swift build (#10525 ) * llama : use cmake for swift build * swift : <> -> "" * ci : remove make * ci : disable ios build * Revert "swift : <> -> """ This reverts commit d39ffd9556482b77d4ea5b118b453fc1c097a31d. * ci : try fix ios build * ci : cont * ci : cont --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-12-08 13:14:54 +02:00
Xuan Son Nguyen	91c36c269b	server : (web ui) Various improvements, now use vite as bundler (#10599 ) * hide buttons in dropdown menu * use npm as deps manager and vite as bundler * fix build * fix build (2) * fix responsive on mobile * fix more problems on mobile * sync build * (test) add CI step for verifying build * fix ci * force rebuild .hpp files * cmake: clean up generated files pre build	2024-12-03 19:38:44 +01:00
Georgi Gerganov	515d4e5372	github : minify link [no ci] (revert) this doesn't work as expected	2024-12-03 11:21:43 +02:00
Georgi Gerganov	844e2e1fee	github : minify link [no ci]	2024-12-03 11:20:35 +02:00
Georgi Gerganov	8648c52101	make : deprecate (#10514 ) * make : deprecate ggml-ci * ci : disable Makefile builds ggml-ci * docs : remove make references [no ci] * ci : disable swift build ggml-ci * docs : remove obsolete make references, scripts, examples ggml-ci * basic fix for compare-commits.sh * update build.md * more build.md updates * more build.md updates * more build.md updates * Update Makefile Co-authored-by: Diego Devesa <slarengh@gmail.com> --------- Co-authored-by: slaren <slarengh@gmail.com>	2024-12-02 21:22:53 +02:00
Georgi Gerganov	4cb003dd8d	contrib : refresh (#10593 ) * contrib : refresh * contrib : expand [no ci] * contrib : expand test-backend-ops instructions * contrib : add CODEOWNERS * prs : update template to not have checkbox [no ci]	2024-12-02 08:53:27 +02:00
Diego Devesa	7cc2d2c889	ggml : move AMX to the CPU backend (#10570 ) * ggml : move AMX to the CPU backend --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-11-29 21:54:58 +01:00
Diego Devesa	e90688edd0	ci : fix tag name in cuda and hip releases (#10566 )	2024-11-28 15:58:54 +01:00
Diego Devesa	46c69e0e75	ci : faster CUDA toolkit installation method and use ccache (#10537 ) * ci : faster CUDA toolkit installation method and use ccache * remove fetch-depth * only pack CUDA runtime on master	2024-11-27 11:03:25 +01:00
Diego Devesa	c9b00a70b0	ci : fix cuda releases (#10532 )	2024-11-26 22:12:10 +01:00
Diego Devesa	5a349f2809	ci : remove nix workflows (#10526 )	2024-11-26 21:13:54 +01:00
Xuan Son Nguyen	45abe0f74e	server : replace behave with pytest (#10416 ) * server : replace behave with pytest * fix test on windows * misc * add more tests * more tests * styling * log less, fix embd test * added all sequential tests * fix coding style * fix save slot test * add parallel completion test * fix parallel test * remove feature files * update test docs * no cache_prompt for some tests * add test_cache_vs_nocache_prompt	2024-11-26 16:20:18 +01:00
Neo Zhang Jianyu	0bbd2262a3	restore the condition to build & update package when merging (#10507 ) Co-authored-by: arthw <14088817+arthw@users.noreply.github.com>	2024-11-26 21:43:47 +08:00


281 Commits
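
The timeout selection quoted in the message of commit 2679c3b55d can be illustrated with a minimal sketch. The two timeout values and the two environment variable names come from that commit message; the helper `pick_http_timeout` and the constant `EXTENDED_HTTP_TIMEOUT` are hypothetical names introduced here for illustration and are not claimed to exist in utils.py.

```python
import os

# Values quoted in the commit message (seconds).
DEFAULT_HTTP_TIMEOUT = 12
# Hypothetical name for the longer timeout used under sanitizers or CI.
EXTENDED_HTTP_TIMEOUT = 30


def pick_http_timeout(env=None):
    """Hypothetical helper: return the server start timeout for an environment.

    The longer timeout applies when either variable is present, regardless
    of its value, mirroring the `in os.environ` checks quoted in the commit.
    """
    env = os.environ if env is None else env
    if "LLAMA_SANITIZE" in env or "GITHUB_ACTION" in env:
        return EXTENDED_HTTP_TIMEOUT
    return DEFAULT_HTTP_TIMEOUT


# With neither variable set, the short timeout applies, which would match
# the "did not start within 12 seconds" failure described in the commit.
print(pick_http_timeout({}))
# Setting GITHUB_ACTION (to any value) selects the longer timeout.
print(pick_http_timeout({"GITHUB_ACTION": "1"}))
```

Passing the environment as a plain mapping keeps the sketch testable without modifying the real process environment.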