Commit Graph

  • 81c7e64fc2
    dsiable curl lib check, this action is missed by commit bd3f59f81289b920bcc597a208c14f55e39ed37e (#12761) (#12937) master Neo Zhang Jianyu 2025-04-14 18:19:07 +08:00
  • 455691c52f
    cont : use MTLHeapTypePlacement gg/metal-heap Georgi Gerganov 2025-04-11 13:32:36 +03:00
  • 91d5dc5a2b
    cont : heap allocation now works [no ci] Georgi Gerganov 2025-04-11 12:27:15 +03:00
  • cbb617edc6
    cont : not working .. [no ci] Georgi Gerganov 2025-04-10 18:42:36 +03:00
  • c2c0f0f7d8
    cont : fix alignment [no ci] Georgi Gerganov 2025-04-10 16:55:05 +03:00
  • c77ccf0bf5
    wip Georgi Gerganov 2025-04-10 16:27:12 +03:00
  • e1dc4df76f
    cont : fix free Georgi Gerganov 2025-04-10 15:27:39 +03:00
  • 2804db7812
    cont : heap for each cmd buffer [no ci] Georgi Gerganov 2025-04-10 14:56:47 +03:00
  • 9433c504c0
    cont : refactor heap [no ci] Georgi Gerganov 2025-04-10 14:49:49 +03:00
  • 37450314b5
    cont : resize heap [no ci] Georgi Gerganov 2025-04-09 16:55:01 +03:00
  • 2341e7c688
    cont : free buffers from the heap Georgi Gerganov 2025-04-09 16:02:59 +03:00
  • c254b21307
    metal : add memory pool for temp allocs (wip) [no ci] Georgi Gerganov 2025-04-09 14:50:41 +03:00
  • 3938c25ae9
    metal : add exp FA kernels for DeepSeek models gg/mla Georgi Gerganov 2025-04-13 20:32:01 +03:00
  • 0100feb33e
    graph : make mla compatible with FA Georgi Gerganov 2025-04-13 19:49:36 +03:00
  • 23c0090fa4
    common : Define cache directory on AIX (#12915) Prajwal B Mehendarkar 2025-04-12 21:03:39 +05:30
  • 526739b879 sync : ggml b5129 Georgi Gerganov 2025-04-14 08:52:10 +03:00
  • a25355e264 cpu: fix cpu backend's supports-op for GET_ROWS_BACK. fixes a fatal when running test-backend-ops with only the CPU backend (ggml/1190) cmdr2 2025-04-11 12:14:19 +05:30
  • e959d32b1c
    ggml: use _mm[512/256]_dpbusd[_avx]_epi32 to directly accumulate into the result register (#12773) b5127 SXX 2025-04-14 13:47:55 +08:00
  • 307bfa253d
    ggml: disable CUDA graphs for unsupported DUP and CONT node types (#12891) b5126 Alan Gray 2025-04-13 22:12:21 +01:00
  • 71e90e8813
    quantize: Handle user-defined quantization levels for additional tensors (#12511) b5125 Ed Addario 2025-04-13 19:29:28 +01:00
  • a5742780b2 Fixed call to build_attn in llm_build_t5_enc juk 2025-04-13 13:25:41 +01:00
  • 36ce2353c3 Removed build_attn_mla and added nullptr to all build_atnn calls juk 2025-04-13 13:15:28 +01:00
  • 925af997e8 Simplified is_mla branch in llm_build_deepseek2() juk 2025-04-13 12:41:33 +01:00
  • a5df71ec9c Removed MQA optimisation from build_attn_mha() as no gains now juk 2025-04-13 12:40:31 +01:00
  • 638b092d7a Removed the 3D views of wk_b and wv_b, and just save and 3D in GGUF juk 2025-04-12 20:26:24 +01:00
  • 5d037ae935 Removed the 3 conts again juk 2025-04-12 20:19:46 +01:00
  • 77ad5e4522 Merge branch 'mla--ready-for-review' of https://github.com/jukofyork/llama.cpp into mla--ready-for-review juk 2025-04-12 19:50:14 +01:00
  • 57788614a0 Use k_pe = ggml_reshape juk 2025-04-12 19:35:43 +01:00
  • 815f4f9ecf Used reshape in llm_graph_context::build_attn_mha() juk 2025-04-12 19:32:19 +01:00
  • e2153236ce Reverted removal of the 3 conts juk 2025-04-12 19:28:13 +01:00
  • 77fe59b402 Changed to use <cmath> instead of <math.h> juk 2025-04-12 18:54:40 +01:00
  • 2a4e1b25b0 Removed 3 conts (2x RoPE and 1x RMS-norm) juk 2025-04-12 18:52:05 +01:00
  • bc091a4dc5
    common : Define cache directory on AIX (#12915) b5124 Prajwal B Mehendarkar 2025-04-12 21:03:39 +05:30
  • c44948824f
    Merge branch 'ggml-org:master' into mla--ready-for-review Juk Armstrong 2025-04-12 12:03:36 +01:00
  • a4837577aa
    vulkan: use aligned loads for flash attention mask (#12853) b5123 Jeff Bolz 2025-04-12 03:44:48 -05:00
  • a6f3aca617
    restore local workgroup size adjustments for large inputs fix_im2col Akarshan Biswas 2025-04-12 11:18:50 +05:30
  • ae4bc15a32
    SYCL: Fix im2col Akarshan Biswas 2025-04-12 10:47:49 +05:30
  • e59ea539b8
    llava: Fix cpu-only clip image encoding sefault (#12907) b5122 Matt Clayton 2025-04-12 01:29:03 -04:00
  • 3fe362fe49 gguf-py : use ThreadPoolExecutor when writing tensors compilade/parallel-convert Francis Couture-Harpin 2025-04-12 00:00:51 -04:00
  • fd058988e1
    SYCL: Add ROPE vision kernel rope_vision Akarshan Biswas 2025-04-11 10:38:50 +05:30
  • c94085df28
    server : add VSCode's Github Copilot Chat support (#12896) b5121 Georgi Gerganov 2025-04-11 23:37:41 +03:00
  • fed6600f9c
    Merge branch 'ggml-org:master' into mla--ready-for-review Juk Armstrong 2025-04-11 21:23:35 +01:00
  • 51e6e0079c threading: support for GGML_SCHED_PRIO_LOW, update thread info on Windows to avoid throttling maxk/sched-prio-updates Max Krasnyansky 2025-04-11 13:20:06 -07:00
  • e8a62631b3
    rpc : Set cache directory in rpc-server.cpp on FreeBSD (#12903) b5120 yuri@FreeBSD 2025-04-11 13:04:14 -07:00
  • b6930ebc42
    tool-call: fix non-tool-calling grammar crashes w/ Qwen / Hermes 2 templates (#12900) b5119 Olivier Chafik 2025-04-11 12:47:52 -07:00
  • 68b08f36d0
    common : Define cache directory on FreeBSD (#12892) b5118 yuri@FreeBSD 2025-04-11 12:45:44 -07:00
  • d7db1593ee Merge branch 'master' into compilade/parallel-convert Francis Couture-Harpin 2025-04-11 15:18:33 -04:00
  • 578754b315
    sycl: Support sycl_ext_oneapi_limited_graph (#12873) b5117 Ewan Crawford 2025-04-11 15:32:14 +02:00
  • b2034c2b55
    contrib: support modelscope community (#12664) b5116 tastelikefeet 2025-04-11 20:01:56 +08:00
  • 06bb53ad9b
    llama-model : add Glm4Model implementation for GLM-4-0414 (#12867) b5115 Yuxuan Zhang 2025-04-11 18:10:10 +08:00
  • 0c50923944
    clip : use smart pointer (⚠️ breaking change) (#12869) b5114 Xuan-Son Nguyen 2025-04-11 12:09:39 +02:00
  • fccf9cae83
    SYCL: Add fp16 type support to unary op kernels (#12788) b5113 Akarshan Biswas 2025-04-11 13:33:50 +05:30
  • ec6c09d0fa
    convert : Llama4 RoPE fix (#12889) Daniel Han 2025-04-11 00:49:09 -07:00
  • 8ac9f5d765
    ci : Replace freediskspace to free_disk_space in docker.yml (#12861) R0CKSTAR 2025-04-11 15:26:17 +08:00
  • 12e9158f25
    xcf : add check for visionos build version (#12854) Daniel Bevenius 2025-04-11 09:24:34 +02:00
  • 5b1f13cb64
    convert : proper tensor name mapping for llama4 (#12870) Xuan-Son Nguyen 2025-04-11 09:23:37 +02:00
  • 8b91d5355a
    llama : correct rms norm for llama 4 (#12882) b5108 Xuan-Son Nguyen 2025-04-11 08:49:50 +02:00
  • 0fed24c347
    ggml: fix compilation error s390x (#12848) b5107 Aaron Teo 2025-04-11 13:20:07 +08:00
  • 47ba87d0a4 sync : ggml b5106 Georgi Gerganov 2025-04-11 00:08:23 +03:00
  • 1d2b613445 tests : fix init order (#0) Georgi Gerganov 2025-04-11 00:04:25 +03:00
  • eb420e1148 sync : ggml Georgi Gerganov 2025-04-10 23:59:16 +03:00
  • cb79c2e7fa ggml: don't include arm_neon.h when using CUDA 12 with ARM Neon (ggml/1187) cmdr2 2025-04-10 17:53:08 +05:30
  • fe92821ea9 ggml : add bilinear upscale support (ggml/1185) Diego Devesa 2025-04-09 12:32:13 +02:00
  • 459895c326 ggml : add more generic custom op, remove deprecated custom ops (ggml/1183) Diego Devesa 2025-04-09 12:31:34 +02:00
  • e4bf72d631 scripts : fix sync-ggml-am.sh Georgi Gerganov 2025-04-10 23:59:01 +03:00
  • 8b9cc7cdd8
    llava : introduce libmtmd (#12849) b5099 Xuan-Son Nguyen 2025-04-10 22:57:16 +02:00
  • 625a7a7853 ggml : add x64 base ABI variant sl/no-avx-variant slaren 2025-04-10 20:31:24 +02:00
  • 64eda5deb9
    convert : ability to lazy-load safetensors remotely without downloading to disk (#12820) Xuan-Son Nguyen 2025-04-10 17:24:44 +02:00
  • 3c36f96a67 ggml : add SSE 4.2 variant for CPUs without AVX slaren 2025-04-10 11:58:55 +02:00
  • 098f0e5eea
    test gg/test-fp16 Georgi Gerganov 2025-04-10 12:35:16 +03:00
  • fe5b78c896
    CANN: Support more ops (#12841) b5097 Chenguang Li 2025-04-10 08:51:52 +08:00
  • 11d07e1e69
    Fixes #12823 (#12830) b5096 Prajwal B Mehendarkar 2025-04-10 04:48:01 +05:30
  • b0091ecc1e
    docker : added all CPU to GPU images (#12749) Rudi Servo 2025-04-09 23:17:12 +00:00
  • 31f7803bc4
    ggml-cpu-impl.h: do not redefine bool on POWER9 (#12856) b5094 Piotr Kubaj 2025-04-09 23:00:34 +00:00
  • 2391506ace
    ggml-impl.h: fix build on POWER9 (#12855) b5093 Piotr Kubaj 2025-04-09 23:00:25 +00:00
  • d3bd7193ba
    llama : Support Qwen3 and Qwen3MoE (#12828) b5092 Bo Zheng 2025-04-09 17:47:36 +08:00
  • d9a63b2f2e
    musa: enable freediskspace for docker image build (#12839) R0CKSTAR 2025-04-09 17:22:30 +08:00
  • 8ed71242f4
    sycl: update documentation to use -no-cnv (#12845) Romain Biessy 2025-04-09 11:22:04 +02:00
  • 381603a775
    ci: detach common from the library (#12827) b5089 Plamen Minev 2025-04-09 11:11:11 +03:00
  • 65a69e6e1b
    clip : do not print ftype (#12832) Xuan-Son Nguyen 2025-04-09 10:09:53 +02:00
  • 47277d6d1d
    readme : add rpc backend (#12842) Georgi Gerganov 2025-04-09 10:54:42 +03:00
  • 6e1c4cebdb
    CANN: Support Opt CONV_TRANSPOSE_1D and ELU (#12786) b5086 Chenguang Li 2025-04-09 14:04:14 +08:00
  • 0090950f67
    vulkan: In coopmat2 mmq, load q4_k/q5_k scales through shared memory (#12833) b5085 Jeff Bolz 2025-04-09 00:25:08 -05:00
  • 7ecd780b1a
    vulkan: Use fp16 for the flash attention P*V multiplication (#12783) b5084 Jeff Bolz 2025-04-09 00:12:57 -05:00
  • 7612566686
    Merge branch 'ggml-org:master' into mla--ready-for-review Juk Armstrong 2025-04-09 05:05:41 +01:00
  • d8bab9efa1 gguf-py : add more clarifying comments for multi-thread writes Francis Couture-Harpin 2025-04-08 21:55:15 -04:00
  • 7538246e7c
    cuda : add f32 to bf16 copy op (#12806) b5083 Sigbjørn Skjæret 2025-04-08 23:21:31 +02:00
  • 06e1d3119a convert : write tensors in parallel Francis Couture-Harpin 2025-04-08 16:31:45 -04:00
  • b32efad2bc
    llava: improve clip_ctx destructor to not memleak load_image_size (#12834) b5082 Matt Clayton 2025-04-08 16:01:58 -04:00
  • a19b5cef16
    llama : fix FA when KV cache is not used (i.e. embeddings) (#12825) b5081 Georgi Gerganov 2025-04-08 19:54:51 +03:00
  • 78a1ba0a4f
    server : fix thread.join() on exit (#12831) b5080 Xuan-Son Nguyen 2025-04-08 18:37:06 +02:00
  • 2dabf759e7
    llava: add more helper functions to check projector types in clip context (#12824) b5079 dm4 2025-04-08 21:49:13 +08:00
  • 1d343b4069
    arg : Including limits file on AIX (#12822) b5078 Prajwal B Mehendarkar 2025-04-08 18:00:59 +05:30
  • 8ca6e1c3a4
    server : webui : Improve Chat Input with Auto-Sizing Textarea (#12785) characharm 2025-04-08 14:14:59 +05:00
  • 656babd6c2
    Revert "sycl:remove redundant memcopy in function ggml_backend_sycl_buffer_set_tensor" (#12812) b5076 Neo Zhang Jianyu 2025-04-08 15:03:21 +08:00
  • a226bc7a9a
    gguf-py : support lazy tensor splitting (#12809) compilade 2025-04-08 03:03:07 -04:00
  • e9e1882d2d
    rm tail space revert-12734-fix_code_in_ggmlsycl Neo Zhang Jianyu 2025-04-08 13:43:11 +08:00
  • 76f2ed3d77
    Update ggml/src/ggml-sycl/ggml-sycl.cpp Neo Zhang Jianyu 2025-04-08 13:16:14 +08:00
  • d271172ab1
    Update ggml/src/ggml-sycl/ggml-sycl.cpp Neo Zhang Jianyu 2025-04-08 10:32:18 +08:00
  • 564a05daf2
    Revert "sycl: remove redundant memcopy in function ggml_backend_sycl_buffer_s…" Neo Zhang Jianyu 2025-04-08 10:29:41 +08:00