Mirror of https://github.com/ggerganov/llama.cpp.git (synced 2025-04-14 18:46:08 +00:00)
musa: support new arch mp_31 and update doc (#12296)
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
Parent: 8acdacb3ea
Commit: 251364549f
Makefile (2 changed lines)
@@ -836,7 +836,7 @@ ifdef GGML_MUSA
 else
 MUSA_PATH ?= /opt/musa
 endif
-MUSA_ARCHITECTURES ?= 21;22
+MUSA_ARCHITECTURES ?= 21;22;31
 
 MK_CPPFLAGS += -DGGML_USE_MUSA -DGGML_USE_CUDA
 MK_LDFLAGS += -L$(MUSA_PATH)/lib -Wl,-rpath=$(MUSA_PATH)/lib
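In the Makefile, the `?=` operator applies `21;22;31` only as a default. GNU make's conditional assignment has no effect if the variable is already defined, for example from the environment or the `make` command line. The shell sketch below mimics that rule so the behaviour can be tried without the MUSA SDK. It is an illustration, not how llama.cpp resolves the value, and `resolve_musa_archs` is a hypothetical helper name.

```bash
# Sketch only: a shell analogue of the Makefile default
#   MUSA_ARCHITECTURES ?= 21;22;31
# resolve_musa_archs is a hypothetical helper, not part of llama.cpp.
unset MUSA_ARCHITECTURES

resolve_musa_archs() {
  # ${VAR-default} (no colon) substitutes the default only when VAR is unset.
  # Like make's ?=, a variable that is already defined keeps its value,
  # even if that value is the empty string.
  printf '%s\n' "${MUSA_ARCHITECTURES-21;22;31}"
}

resolve_musa_archs                          # nothing set: prints 21;22;31
MUSA_ARCHITECTURES="31" resolve_musa_archs  # caller's value wins: prints 31
```

The no-colon form `${VAR-default}` was chosen over `${VAR:-default}` because the colon form would also replace an empty value, which `?=` does not do. The CMake check in this commit, `if (NOT DEFINED MUSA_ARCHITECTURES)`, likewise keeps any value that is already set. However, it tests CMake variables rather than environment variables, so that path is overridden with `-DMUSA_ARCHITECTURES=...` at configure time.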
@@ -197,29 +197,53 @@ The following compilation options are also available to tweak performance:
 
 ## MUSA
 
-This provides GPU acceleration using the MUSA cores of your Moore Threads MTT GPU. Make sure to have the MUSA SDK installed. You can download it from here: [MUSA SDK](https://developer.mthreads.com/sdk/download/musa).
+This provides GPU acceleration using a Moore Threads GPU. Make sure to have the [MUSA SDK](https://developer.mthreads.com/musa/musa-sdk) installed.
 
-- Using `CMake`:
+#### Download directly from Moore Threads
 
-```bash
-cmake -B build -DGGML_MUSA=ON
-cmake --build build --config Release
+You may find the official downloads here: [Moore Threads developer site](https://developer.mthreads.com/sdk/download/musa).
+
+### Compilation
+
+```bash
+cmake -B build -DGGML_MUSA=ON
+cmake --build build --config Release
+```
+
+#### Override Compute Capability Specifications
+
+By default, all supported compute capabilities are enabled. To customize this behavior, you can specify the `MUSA_ARCHITECTURES` option in the CMake command:
+
+```bash
+cmake -B build -DGGML_MUSA=ON -DMUSA_ARCHITECTURES="21"
+```
+
+This configuration enables only compute capability `2.1` (MTT S80) during compilation, which can help reduce compilation time.
+
+#### Compilation options
+
+Most of the compilation options available for CUDA should also be available for MUSA, though they haven't been thoroughly tested yet.
+
+- For static builds, add `-DBUILD_SHARED_LIBS=OFF` and `-DCMAKE_POSITION_INDEPENDENT_CODE=ON`:
 ```
-
-For static build:
-
-```bash
 cmake -B build -DGGML_MUSA=ON \
 -DBUILD_SHARED_LIBS=OFF -DCMAKE_POSITION_INDEPENDENT_CODE=ON
 cmake --build build --config Release
 ```
 
-The environment variable [`MUSA_VISIBLE_DEVICES`](https://docs.mthreads.com/musa-sdk/musa-sdk-doc-online/programming_guide/Z%E9%99%84%E5%BD%95/) can be used to specify which GPU(s) will be used.
+### Runtime MUSA environmental variables
 
+You may set the [musa environmental variables](https://docs.mthreads.com/musa-sdk/musa-sdk-doc-online/programming_guide/Z%E9%99%84%E5%BD%95/) at runtime.
+
+```bash
+# Use `MUSA_VISIBLE_DEVICES` to hide the first compute device.
+MUSA_VISIBLE_DEVICES="-0" ./build/bin/llama-server --model /srv/models/llama.gguf
+```
+
+### Unified Memory
+
 The environment variable `GGML_CUDA_ENABLE_UNIFIED_MEMORY=1` can be used to enable unified memory in Linux. This allows swapping to system RAM instead of crashing when the GPU VRAM is exhausted.
 
-Most of the compilation options available for CUDA should also be available for MUSA, though they haven't been thoroughly tested yet.
-
 ## HIP
 
 This provides GPU acceleration on HIP-supported AMD GPUs.
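The updated docs state that `-DMUSA_ARCHITECTURES="21"` builds only for compute capability `2.1` (MTT S80). The commit title pairs the newly added entry `31` with arch `mp_31`. The sketch below expands a semicolon-separated list into target names under two assumptions that this commit does not fully spell out. First, each entry `N` denotes arch `mp_N`. Second, its digits read as `<major>.<minor>`, extending the documented `21` → `2.1` example to `22` and `31`. `expand_musa_archs` is a hypothetical helper.

```bash
# Sketch only: expand a semicolon-separated MUSA_ARCHITECTURES value into
# mp_<N> target names. Assumes (beyond what this commit states) that entry N
# means arch mp_N and that its digits read as compute capability <major>.<minor>.
# expand_musa_archs is a hypothetical helper, not part of llama.cpp.
expand_musa_archs() {
  saved_ifs=$IFS
  IFS=';'                       # split on semicolons, the CMake list separator
  for arch in $1; do            # entries are plain digits, so no glob risk
    major=${arch%?}             # everything except the last digit
    minor=${arch#"$major"}      # the last digit
    printf 'mp_%s (compute capability %s.%s)\n' "$arch" "$major" "$minor"
  done
  IFS=$saved_ifs
}

expand_musa_archs "21;22;31"
```

Called with the new default list, the sketch prints three lines, the last being `mp_31 (compute capability 3.1)`. Passing a single entry such as `"21"` yields one line, which matches the docs' example of narrowing the build to one target to reduce compilation time.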
@@ -21,7 +21,7 @@ if (MUSAToolkit_FOUND)
 message(STATUS "MUSA Toolkit found")
 
 if (NOT DEFINED MUSA_ARCHITECTURES)
-set(MUSA_ARCHITECTURES "21;22")
+set(MUSA_ARCHITECTURES "21;22;31")
 endif()
 message(STATUS "Using MUSA architectures: ${MUSA_ARCHITECTURES}")
 