References:
- https://community.frame.work/t/amd-strix-halo-llama-cpp-installation-guide-for-fedora-42/75856

This report outlines the deployment of the **Ollama LLM runtime** on **Arch Linux**, specifically tailored for the **AMD Ryzen AI Max+ 395 APU**. The primary focus is on optimizing performance by leveraging the integrated **Radeon 8060S iGPU** through the **Vulkan** backend, and on exploring the potential of the **XDNA 2 NPU** for heterogeneous acceleration.
  
### 3. ROCm (Optional but Recommended)
```bash
# Install essential ROCm packages
yay -S rocm-core amdgpu_top rocminfo rocm-gfx1151-bin
yay -S rocm-hip-sdk rocm-opencl-runtime
sudo usermod -a -G render,video $USER

# xdna
yay -S amdxdna-driver-bin xrt-npu-git
```

**IMPORTANT**: add /opt/rocm/bin to PATH.
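
A minimal sketch of doing that for a bash login shell (assuming the default /opt/rocm prefix; adjust the profile file for your shell):

```bash
# Make ROCm tools such as rocminfo available on PATH
echo 'export PATH="/opt/rocm/bin:$PATH"' >> ~/.bashrc
source ~/.bashrc

# Sanity check: the iGPU should show up as a gfx1151 agent
rocminfo | grep -i gfx
```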
  
### 4. Memory Configuration
  
  
====== lemonade-server ======

<code | download>
yay -S lemonade-server
</code>

oga-hybrid mode splits the work: the NPU handles prefill (prompt processing) and the iGPU handles token generation.
<code | download>
lemonade-server run Qwen3-Coder-30B-A3B-Instruct-GGUF --recipe oga-hybrid --llamacpp rocm
</code>

<code | download>
curl http://localhost:8000/api/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen3-Coder-30B-A3B-Instruct-GGUF",
    "messages": [{"role": "user", "content": "Who are you?"}],
    "stream": false
  }'
curl http://localhost:8000/api/v1/stats
</code>
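
The endpoint is OpenAI-compatible, so replies can also be streamed token by token; a minimal sketch (assuming the server honours the standard `stream` flag, which is not shown in the notes above):

<code bash>
# Stream the reply as it is generated instead of waiting for the full answer
curl -N http://localhost:8000/api/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen3-Coder-30B-A3B-Instruct-GGUF",
    "messages": [{"role": "user", "content": "Write a haiku about GPUs"}],
    "stream": true
  }'
</code>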
  
====== Benchmark ======

<code bash>
# Use CMake to enable the Vulkan backend
cmake -DGGML_VULKAN=ON -B build
cmake --build build --config Release -j15
  
# These commands download the model files directly (the 8B model is 4.92 GB).
wget https://huggingface.co/bartowski/Meta-Llama-3-8B-Instruct-GGUF/resolve/main/Meta-Llama-3-8B-Instruct-Q4_K_M.gguf
wget https://huggingface.co/bartowski/Llama-3.2-3B-Instruct-GGUF/resolve/main/Llama-3.2-3B-Instruct-Q4_K_M.gguf
  
# bench
./build/bin/llama-bench -m Meta-Llama-3-8B-Instruct-Q4_K_M.gguf
./build/bin/llama-bench -m Llama-3.2-3B-Instruct-Q4_K_M.gguf
</code>
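
Beyond raw llama-bench numbers, the same Vulkan build can serve a model over an OpenAI-compatible HTTP API. A minimal sketch (port and -ngl value are illustrative choices, not from the original notes):

<code bash>
# Serve the benchmarked model with all layers offloaded to the iGPU
./build/bin/llama-server -m Meta-Llama-3-8B-Instruct-Q4_K_M.gguf -ngl 99 --port 8080

# Quick smoke test against the built-in chat completions endpoint
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello"}]}'
</code>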

ollama
<code bash>
ollama run llama3.2 --verbose 'explain nuclear fusion'
</code>
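
The same model can also be driven through Ollama's local HTTP API (default port 11434), for example:

<code bash>
# Non-streaming generation request against the local Ollama daemon
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "explain nuclear fusion",
  "stream": false
}'
</code>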

<code>
yay -S python-huggingface-hub
hf download Qwen/Qwen3-VL-8B-Instruct
</code>
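
By default the files end up in the Hugging Face cache (~/.cache/huggingface/hub). A sketch of downloading into an explicit directory instead (the target path is just an example; --local-dir support depends on your huggingface_hub version):

<code bash>
# Download the model into a predictable local directory
hf download Qwen/Qwen3-VL-8B-Instruct --local-dir ~/models/Qwen3-VL-8B-Instruct
</code>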
  