Xuan Son Nguyen 0da5d86026
server : allow using LoRA adapters per-request (#10994)
* slot.can_batch_with

* LoRA per request

* test: force disable cache prompt

* move can_batch_with check

* fix condition

* add slow test with Llama 8B

* update docs

* move lora change task to queue

* Apply suggestions from code review

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* lora_base

* remove redundant check

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-01-02 15:05:18 +01:00