llama.cpp

mirror of https://github.com/ggerganov/llama.cpp.git synced 2025-04-25 00:56:04 +00:00

Author	SHA1	Message	Date
Akarshan Biswas	510676475f	SYCL: Add ROPE vision kernel (#12887 ) * SYCL: Add ROPE vision kernel * Add comment about rope mode b5138	2025-04-15 10:37:42 +02:00
Juk Armstrong	daa422881a	llama : DeepSeek V2/V3 MLA implementation (#12801 ) * Merged using squash to remove all noise commit messages * Force flash attention off for `LLM_ARCH_DEEPSEEK2` - embedding too large * Removed 3 conts (2x RoPE and 1x RMS-norm) * Changed to use `<cmath>` instead of `<math.h>` * Reverted removal of the 3 conts * Used `reshape` in `llm_graph_context::build_attn_mha()` * Use `k_pe = ggml_reshape` * Removed the 3 conts again * Removed the 3D views of `wk_b` and `wv_b`, and just save and 3D in GGUF * Removed MQA optimisation from `build_attn_mha()` as no gains now * Simplified `is_mla` branch in `llm_build_deepseek2()` * Removed `build_attn_mla` and added `nullptr` to all `build_atnn` calls * Fixed call to `build_attn` in `llm_build_t5_enc` b5137	2025-04-15 09:49:57 +03:00
Srihari-mcw	eccc7a1602	ggml : Add AVX512 implementation of GEMM - Q4_Kx8 (#12829 ) * Add AVX512 implementation of GEMM - q4kx8 * Update changes to remove unnecessary whitespaces b5136	2025-04-15 09:22:36 +03:00
Chenguang Li	0019279bb5	CANN: Opt ROPE optimization (#12865 ) * [CANN]Opt ROPE optimization * [CANN]Codestyle adjustment * [CANN]Fix the ROPE precision issue * [CANN]codestyle fix * [CANN]add rope unsupport case Signed-off-by: noemotiovon <noemotiovon@gmail.com> b5135	2025-04-15 10:09:35 +08:00
Xinpeng Dou	b0c75ac9f9	CANN: Optimize CANN buffer pool memory management (#12875 ) Multiple optional memory pools are provided for CANN, including VMM, priority queue-based, and traditional memory pools. 1.When the memory pool is available and GGML_CANN_DISABLE_VMM_POOL is not defined, the VMM pool is selected by default. 2.Otherwise, if GGML_CANN_ENABLE_BUF_PRIO_POOL is defined, the priority queue-based memory pool is used. 3.If neither condition is met, the default memory pool is used. b5134	2025-04-15 10:04:24 +08:00
Russyyds	d6d2c2ab8c	Add performance print for gemma3 in example (#12929 ) b5133	2025-04-14 19:18:20 +02:00
Akarshan Biswas	75afa0ae31	SYCL: Fix im2col (#12910 ) * SYCL: Fix im2col * restore local workgroup size adjustments for large inputs * restore format b5132	2025-04-14 14:23:53 +02:00
Radoslav Gerganov	c772d54926	rpc : use ggml_context_ptr (#12938 ) b5131	2025-04-14 13:59:34 +03:00
Neo Zhang Jianyu	81c7e64fc2	disable curl lib check, this action is missed by commit bd3f59f81289b920bcc597a208c14f55e39ed37e (#12761 ) (#12937 )	2025-04-14 18:19:07 +08:00
Georgi Gerganov	526739b879	sync : ggml ggml-ci b5129	2025-04-14 09:26:15 +03:00
cmdr2	a25355e264	cpu: fix cpu backend's supports-op for GET_ROWS_BACK. fixes a fatal when running test-backend-ops with only the CPU backend (ggml/1190)	2025-04-14 09:26:15 +03:00
SXX	e959d32b1c	ggml: use _mm[512/256]_dpbusd[_avx]_epi32 to directly accumulate into the result register (#12773 ) * ggml: use _mm[512/256]_dpbusd[_avx]_epi32 to directly accumulate into the result register * simplifies the codebase by removing redundant functions b5127	2025-04-14 08:47:55 +03:00
Alan Gray	307bfa253d	ggml: disable CUDA graphs for unsupported DUP and CONT node types (#12891 ) Fixes #12798 b5126	2025-04-13 23:12:21 +02:00
Ed Addario	71e90e8813	quantize: Handle user-defined quantization levels for additional tensors (#12511 ) * Add llama_model_quantize_params parameters * Add new quantize parameters parsing and validation * Update usage * Add new parameters defaults * Add new quantization parameters logic * Add llama_model_quantize_params parameters * Add new quantize parameters parsing and validation * Update usage * Add new parameters defaults * Add new quantization parameters logic * Minor refactoring as per the contributors' coding guidelines * Update descriptions to match existing style * Add llama_model_quantize_params parameters * Add new quantize parameters parsing and validation * Update usage * Add new parameters defaults * Add new quantization parameters logic * Minor refactoring as per the contributors' guidelines * Implement general --tensor-type instead of tensor-specific command option * Fix implied type bug * Restore missing #includes * Add regex capability for tensor selection * Refactor function name and update ALLOWED_TENSOR_TYPE * Add missing #include * Handle edge case when tensor name is cls.output * Minor logging improvement b5125	2025-04-13 21:29:28 +03:00
Prajwal B Mehendarkar	bc091a4dc5	common : Define cache directory on AIX (#12915 ) b5124	2025-04-12 17:33:39 +02:00
Jeff Bolz	a4837577aa	vulkan: use aligned loads for flash attention mask (#12853 ) Rewrite the stride logic for the mask tensor in the FA shader to force the stride to be aligned, to allow using more efficient loads. b5123	2025-04-12 10:44:48 +02:00
Matt Clayton	e59ea539b8	llava: Fix cpu-only clip image encoding segfault (#12907 ) * llava: Fix cpu-only clip image encoding * clip : no smart ptr for ggml_backend_t * Fix for backend_ptr push_back --------- Co-authored-by: Xuan Son Nguyen <son@huggingface.co> b5122	2025-04-12 07:29:03 +02:00
Georgi Gerganov	c94085df28	server : add VSCode's Github Copilot Chat support (#12896 ) * server : add VSCode's Github Copilot Chat support * cont : update handler name b5121	2025-04-11 23:37:41 +03:00
yuri@FreeBSD	e8a62631b3	rpc : Set cache directory in rpc-server.cpp on FreeBSD (#12903 ) b5120	2025-04-11 22:04:14 +02:00
Olivier Chafik	b6930ebc42	`tool-call`: fix non-tool-calling grammar crashes w/ Qwen / Hermes 2 templates (#12900 ) * `tool-call`: don't call common_chat_params_init_hermes_2_pro when there aren't tools (or when there's a schema) * test all chat formats w/o tools b5119	2025-04-11 21:47:52 +02:00
yuri@FreeBSD	68b08f36d0	common : Define cache directory on FreeBSD (#12892 ) b5118	2025-04-11 21:45:44 +02:00
Ewan Crawford	578754b315	sycl: Support sycl_ext_oneapi_limited_graph (#12873 ) The current usage of the SYCL-Graph extension checks for the `sycl_ext_oneapi_graph` device aspect. However, it is also possible to support `sycl_ext_oneapi_limited_graph` devices that don't support update b5117	2025-04-11 15:32:14 +02:00
tastelikefeet	b2034c2b55	contrib: support modelscope community (#12664 ) * support download from modelscope * support login * remove comments * add arguments * fix code * fix win32 * test passed * fix readme * revert readme * change to MODEL_ENDPOINT * revert tail line * fix readme * refactor model endpoint * remove blank line * fix header * fix as comments * update comment * update readme --------- Co-authored-by: tastelikefeet <yuze.zyz@alibaba-inc/com> b5116	2025-04-11 14:01:56 +02:00
Yuxuan Zhang	06bb53ad9b	llama-model : add Glm4Model implementation for GLM-4-0414 (#12867 ) * GLM-4-0414 * use original one * Using with tensor map * fix bug * change order * change order * format with flask8 b5115	2025-04-11 12:10:10 +02:00
Xuan-Son Nguyen	0c50923944	clip : use smart pointer (⚠️ breaking change) (#12869 ) * clip : use smart pointers * fix warmup * add forward declaration * misisng include * fix include (2) * composite * simplify batch ptr * fix conflict b5114	2025-04-11 12:09:39 +02:00
Akarshan Biswas	fccf9cae83	SYCL: Add fp16 type support to unary op kernels (#12788 ) * SYCL: Add fp16 support to some elementwise OP kernels * remove comment ggml-ci * Use static_cast directly * remove not needed cast from tanh * Use static cast and remove unneeded castings * Adjust device_support_op for unary OPs * Use cast_data and typed_data struct to deduplicate casting code b5113	2025-04-11 16:03:50 +08:00
Daniel Han	ec6c09d0fa	convert : Llama4 RoPE fix (#12889 )	2025-04-11 09:49:09 +02:00
R0CKSTAR	8ac9f5d765	ci : Replace freediskspace to free_disk_space in docker.yml (#12861 ) Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>	2025-04-11 09:26:17 +02:00
Daniel Bevenius	12e9158f25	xcf : add check for visionos build version (#12854 ) This commit adds a check for the visionos build version used with vtool in build-xcframework.sh. The script now checks the Xcode version and determines whether to use "xros" or "visionos" for the build version. This commit also uses xcrun for the vtool so that the version of vtool in xcode command line tools is used instead of the one in the system path. Refs: https://github.com/ggml-org/whisper.cpp/pull/2994#issuecomment-2773292223	2025-04-11 09:24:34 +02:00
Xuan-Son Nguyen	5b1f13cb64	convert : proper tensor name mapping for llama4 (#12870 ) * Llama-4 mapping * remove hacky renaming --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2025-04-11 09:23:37 +02:00
Xuan-Son Nguyen	8b91d5355a	llama : correct rms norm for llama 4 (#12882 ) b5108	2025-04-11 08:49:50 +02:00
Aaron Teo	0fed24c347	ggml: fix compilation error s390x (#12848 ) * ggml: fixes #12846 compilation error Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> Co-authored-by: Aleksei Nikiforov <aleksei.nikiforov@ibm.com> * ggml: add documentation for code change Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> Co-authored-by: Aleksei Nikiforov <aleksei.nikiforov@ibm.com> * ggml: refactor to type-cast and update documentation Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> Co-authored-by: Aleksei Nikiforov <aleksei.nikiforov@ibm.com> * ggml: update documentation to provide full issue link Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> Co-authored-by: Aleksei Nikiforov <aleksei.nikiforov@ibm.com> --------- Co-authored-by: Aleksei Nikiforov <aleksei.nikiforov@ibm.com> b5107	2025-04-11 08:20:07 +03:00
Georgi Gerganov	47ba87d0a4	sync : ggml b5106	2025-04-11 00:17:47 +03:00
Georgi Gerganov	1d2b613445	tests : fix init order (#0 ) ggml-ci	2025-04-11 00:17:47 +03:00
Georgi Gerganov	eb420e1148	sync : ggml ggml-ci	2025-04-11 00:17:47 +03:00
cmdr2	cb79c2e7fa	ggml: don't include arm_neon.h when using CUDA 12 with ARM Neon (ggml/1187) fix #1186	2025-04-11 00:17:47 +03:00
Diego Devesa	fe92821ea9	ggml : add bilinear upscale support (ggml/1185)	2025-04-11 00:17:47 +03:00
Diego Devesa	459895c326	ggml : add more generic custom op, remove deprecated custom ops (ggml/1183) * ggml : add more generic ggml_custom op * ggml : remove deprecated custom ops	2025-04-11 00:17:47 +03:00
Georgi Gerganov	e4bf72d631	scripts : fix sync-ggml-am.sh	2025-04-11 00:17:47 +03:00
Xuan-Son Nguyen	8b9cc7cdd8	llava : introduce libmtmd (#12849 ) * wip llava2 * migrated gemma3 to llava2 * add timings * correct pre/postfix * fix missing include * fix compilation unused var warn * update llava2_tokenize * change name llava2 --> mtmd * improve api * refine helpers * Update examples/llava/mtmd.cpp Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> b5099	2025-04-10 22:57:16 +02:00
Xuan-Son Nguyen	64eda5deb9	convert : ability to lazy-load safetensors remotely without downloading to disk (#12820 ) * gguf util : add SafetensorRemote * fix style * convert: add --remote option * convert : allow using lazy remote tensors It's a bit slow for now since everything is blocking and single-threaded. * correct metadata.name * small style fix * support HF_TOKEN * convert : use writeable buffer for remote lazy tensors * convert : fix flake8 lint regarding lamdba assigment * multithreaded download * multithread: print debug * fix style * Revert "multithreaded download" This reverts commit 42fc895ace385edc972ad819c76c704aeea61791. * bring back _get_request_headers --------- Co-authored-by: Francis Couture-Harpin <git@compilade.net>	2025-04-10 17:24:44 +02:00
Chenguang Li	fe5b78c896	CANN: Support more ops (#12841 ) * [CANN]Support Opt LOG && MEAN && PAD_REFLECT_1D * [CANN]Support COUNT_EQUAL && STEP && SGN * [CANN]codestyle adjustment * [CANN]codestyle adjustment --------- Signed-off-by: noemotiovon <noemotiovon@gmail.com> b5097	2025-04-10 08:51:52 +08:00
Prajwal B Mehendarkar	11d07e1e69	Fixes #12823 (#12830 ) * Including limits file on AIX * Fixes #12823 b5096	2025-04-10 01:18:01 +02:00
Rudi Servo	b0091ecc1e	docker : added all CPU to GPU images (#12749 )	2025-04-10 01:17:12 +02:00
Piotr Kubaj	31f7803bc4	ggml-cpu-impl.h: do not redefine bool on POWER9 (#12856 ) error: unknown type name '_Bool' b5094	2025-04-10 01:00:34 +02:00
Piotr Kubaj	2391506ace	ggml-impl.h: fix build on POWER9 (#12855 ) error: ISO C++17 does not allow 'register' storage class specifier b5093	2025-04-10 01:00:25 +02:00
Bo Zheng	d3bd7193ba	llama : Support Qwen3 and Qwen3MoE (#12828 ) * add qwen3 & qwen3moe support. * fix --------- Co-authored-by: bozheng-hit <dsoul0621@gmail.com> b5092	2025-04-09 11:47:36 +02:00
R0CKSTAR	d9a63b2f2e	musa: enable freediskspace for docker image build (#12839 ) Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>	2025-04-09 11:22:30 +02:00
Romain Biessy	8ed71242f4	sycl: update documentation to use -no-cnv (#12845 )	2025-04-09 11:22:04 +02:00
Plamen Minev	381603a775	ci: detach common from the library (#12827 ) * fix: detach common from the library * fix: building chat test template b5089	2025-04-09 10:11:11 +02:00
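
The pool selection order described in the message of commit b0c75ac9f9 (#12875) can be read as a three-way fallback. The following is a minimal sketch of that order only, not the CANN backend's code: `PoolKind`, `select_pool`, and `is_set` are hypothetical names, and treating "defined" as "the environment variable is set" is an assumption that the commit message does not confirm.

```cpp
#include <cstdlib>

// Hypothetical labels for the three pools named in the commit message.
enum class PoolKind { Vmm, BufPrio, Default };

// Assumption: "defined" means the environment variable is present.
inline bool is_set(const char * name) {
    return std::getenv(name) != nullptr;
}

// Mirrors the order in the commit message:
//   1. VMM pool if it is available and not disabled,
//   2. otherwise the priority-queue pool if it is enabled,
//   3. otherwise the default pool.
inline PoolKind select_pool(bool vmm_available, bool disable_vmm, bool enable_prio) {
    if (vmm_available && !disable_vmm) {
        return PoolKind::Vmm;
    }
    if (enable_prio) {
        return PoolKind::BufPrio;
    }
    return PoolKind::Default;
}

// Convenience wrapper that reads the two switches named in the commit message.
inline PoolKind select_pool_from_env(bool vmm_available) {
    return select_pool(vmm_available,
                       is_set("GGML_CANN_DISABLE_VMM_POOL"),
                       is_set("GGML_CANN_ENABLE_BUF_PRIO_POOL"));
}
```

Keeping `select_pool` free of environment access makes the fallback order easy to check in isolation; note that the priority-queue switch only takes effect when the VMM pool is unavailable or disabled.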


5138 Commits