References:
  - https://

This report outlines the deployment of the **Ollama LLM runtime** on **Arch Linux**, specifically tailored for the **AMD Ryzen AI Max+ 395 APU**. The primary focus is optimizing performance by leveraging the integrated **Radeon 8060S iGPU** through the **Vulkan** backend, and considering the potential of the **XDNA 2 NPU** for heterogeneous acceleration.
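A quick way to confirm that the Radeon 8060S iGPU is actually visible to the Vulkan stack before tuning anything; the package names below are Arch defaults I am assuming, not taken from this page:

<code bash>
# RADV Vulkan driver plus the vulkaninfo utility (assumed Arch package names)
sudo pacman -S --needed vulkan-radeon vulkan-tools

# The integrated Radeon 8060S should be listed as a Vulkan physical device
vulkaninfo --summary | grep -iE 'deviceName|driverName'
</code>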
```bash
# Install essential ROCm packages
yay -S rocm-core amdgpu_top rocminfo
yay -S rocm-hip-sdk rocm-opencl-runtime

# Allow the current user to access the GPU device nodes
sudo usermod -a -G render,video $USER

# XDNA NPU: out-of-tree driver and the XRT NPU runtime
yay -S amdxdna-driver-bin xrt-npu-git
```

====== lemonade-server ======

<code bash>
yay -S lemonade-server
</code>

oga-hybrid mode splits the work: the NPU handles the prefill (prompt processing), while the iGPU handles the decode (token generation).
<code bash>
lemonade-server run Qwen3-Coder-30B-A3B-Instruct-GGUF --recipe oga-hybrid --llamacpp rocm
</code>
| + | |||
| + | <code | download> | ||
| + | curl http:// | ||
| + | -H " | ||
| + | -d '{ | ||
| + | " | ||
| + | " | ||
| + | " | ||
| + | }' | ||
| + | curl http:// | ||
| + | </ | ||
====== Benchmark ======

<code bash>
# This command downloads the 4.92 GB model file directly.
wget https://
https://

# bench
./
./
</code>
| + | |||
| + | ollama | ||
| + | <code bash> | ||
| + | ollama run llama3.2 | ||
| + | </ | ||
| + | |||
| + | |||
| + | < | ||
| + | yay -S python-huggingface-hub | ||
| + | hf download Qwen/ | ||
| </ | </ | ||