References:
  - https://

This report outlines the deployment of the **Ollama LLM runtime** on **Arch Linux**, specifically tailored for the **AMD Ryzen AI Max+ 395 APU**. The primary focus is optimizing performance by leveraging the integrated **Radeon 8060S iGPU** through the **Vulkan** backend, and considering the potential of the **XDNA 2 NPU** for heterogeneous acceleration.
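A quick way to confirm that the Radeon 8060S iGPU is actually visible to the Vulkan stack before tuning anything; the package names below are Arch defaults I am assuming, not taken from this page:

<code bash>
# RADV Vulkan driver plus the vulkaninfo utility (assumed Arch package names)
sudo pacman -S --needed vulkan-radeon vulkan-tools

# The integrated Radeon 8060S should be listed as a Vulkan physical device
vulkaninfo --summary | grep -iE 'deviceName|driverName'
</code>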
```bash
# Install essential ROCm packages
yay -S rocm-core amdgpu_top rocminfo
yay -S rocm-hip-sdk rocm-opencl-runtime

# Allow the current user to access the GPU device nodes
sudo usermod -a -G render,video $USER

# XDNA NPU: out-of-tree driver and the XRT NPU runtime
yay -S amdxdna-driver-bin xrt-npu-git
```

====== lemonade-server ======

<code bash>
yay -S lemonade-server
</code>

oga-hybrid mode splits the work: the NPU handles the prefill (prompt processing), while the iGPU handles the decode (token generation).
<code bash>
lemonade-server run Qwen3-Coder-30B-A3B-Instruct-GGUF --recipe oga-hybrid --llamacpp rocm
</code>
| + | |||
| + | <code | download> | ||
| + | curl http:// | ||
| + | -H " | ||
| + | -d '{ | ||
| + | " | ||
| + | " | ||
| + | " | ||
| + | }' | ||
| + | curl http:// | ||
| + | </ | ||
====== Benchmark ======

<code bash>
# This command downloads the 4.92 GB model file directly.
wget https://
https://

# bench
./
./
</code>
| + | |||
| + | ollama | ||
| + | <code bash> | ||
| + | ollama run llama3.2 | ||
| + | </ | ||
| + | |||
| + | |||
| + | < | ||
| + | yay -S python-huggingface-hub | ||
| + | hf download Qwen/ | ||
| </ | </ | ||