References:
- https://community.frame.work/t/amd-strix-halo-llama-cpp-installation-guide-for-fedora-42/75856

This report outlines the deployment of the **Ollama LLM runtime** on **Arch Linux**, specifically tailored for the **AMD Ryzen AI Max+ 395 APU**. The primary focus is on optimizing performance by leveraging the integrated **Radeon 8060S iGPU** through the **Vulkan** backend, and on exploring the potential of the **XDNA 2 NPU** for heterogeneous acceleration.
  
### 3. ROCm (Optional but Recommended)
```bash
# Install essential ROCm packages
yay -S rocm-core amdgpu_top rocminfo rocm-gfx1151-bin
yay -S rocm-hip-sdk rocm-opencl-runtime
sudo usermod -a -G render,video $USER

# xdna
yay -S amdxdna-driver-bin xrt-npu-git
```

**IMPORTANT**: add /opt/rocm/bin to PATH.
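
A minimal sketch of doing that for a bash login shell (assuming the default /opt/rocm prefix; adjust the profile file for your shell):

```bash
# Make ROCm tools such as rocminfo available on PATH
echo 'export PATH="/opt/rocm/bin:$PATH"' >> ~/.bashrc
source ~/.bashrc

# Sanity check: the iGPU should show up as a gfx1151 agent
rocminfo | grep -i gfx
```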
  
### 4. Memory Configuration
  
  
====== lemonade-server ======

<code | download>
yay -S lemonade-server
</code>

oga-hybrid mode splits the work: the NPU handles prefill (prompt processing) and the iGPU handles token generation.
<code | download>
lemonade-server run Qwen3-Coder-30B-A3B-Instruct-GGUF --recipe oga-hybrid --llamacpp rocm
</code>

<code | download>
curl http://localhost:8000/api/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen3-Coder-30B-A3B-Instruct-GGUF",
    "messages": [{"role": "user", "content": "Who are you?"}],
    "stream": false
  }'
curl http://localhost:8000/api/v1/stats
</code>
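
The endpoint is OpenAI-compatible, so replies can also be streamed token by token; a minimal sketch (assuming the server honours the standard `stream` flag, which is not shown in the notes above):

<code bash>
# Stream the reply as it is generated instead of waiting for the full answer
curl -N http://localhost:8000/api/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen3-Coder-30B-A3B-Instruct-GGUF",
    "messages": [{"role": "user", "content": "Write a haiku about GPUs"}],
    "stream": true
  }'
</code>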
  
====== Benchmark ======

<code bash>
# Use CMake to enable the Vulkan backend
cmake -DGGML_VULKAN=ON -B build
cmake --build build --config Release -j15
  
# These commands download the model files directly (the 8B model is 4.92 GB).
wget https://huggingface.co/bartowski/Meta-Llama-3-8B-Instruct-GGUF/resolve/main/Meta-Llama-3-8B-Instruct-Q4_K_M.gguf
wget https://huggingface.co/bartowski/Llama-3.2-3B-Instruct-GGUF/resolve/main/Llama-3.2-3B-Instruct-Q4_K_M.gguf
  
# bench
./build/bin/llama-bench -m Meta-Llama-3-8B-Instruct-Q4_K_M.gguf
./build/bin/llama-bench -m Llama-3.2-3B-Instruct-Q4_K_M.gguf
</code>
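
Beyond raw llama-bench numbers, the same Vulkan build can serve a model over an OpenAI-compatible HTTP API. A minimal sketch (port and -ngl value are illustrative choices, not from the original notes):

<code bash>
# Serve the benchmarked model with all layers offloaded to the iGPU
./build/bin/llama-server -m Meta-Llama-3-8B-Instruct-Q4_K_M.gguf -ngl 99 --port 8080

# Quick smoke test against the built-in chat completions endpoint
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello"}]}'
</code>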

ollama
<code bash>
ollama run llama3.2 --verbose 'explain nuclear fusion'
</code>
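
The same model can also be driven through Ollama's local HTTP API (default port 11434), for example:

<code bash>
# Non-streaming generation request against the local Ollama daemon
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "explain nuclear fusion",
  "stream": false
}'
</code>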

<code>
yay -S python-huggingface-hub
hf download Qwen/Qwen3-VL-8B-Instruct
</code>
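
By default the files end up in the Hugging Face cache (~/.cache/huggingface/hub). A sketch of downloading into an explicit directory instead (the target path is just an example; --local-dir support depends on your huggingface_hub version):

<code bash>
# Download the model into a predictable local directory
hf download Qwen/Qwen3-VL-8B-Instruct --local-dir ~/models/Qwen3-VL-8B-Instruct
</code>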
  