tips:rocm · last modified 2026/02/15 08:33 by sscipioni
References:
- https://community.frame.work/t/amd-strix-halo-llama-cpp-installation-guide-for-fedora-42/75856
- https://strix-halo-toolboxes.com/
- https://github.com/kyuz0/amd-strix-halo-toolboxes

This report outlines the deployment of the **Ollama LLM runtime** on **Arch Linux**, tailored for the **AMD Ryzen AI Max+ 395 APU**. The primary focus is optimizing performance by leveraging the integrated **Radeon 8060S iGPU** through the **Vulkan** backend, and considering the potential of the **XDNA 2 NPU** for heterogeneous acceleration.
  
Before deploying Ollama, the base Arch Linux installation must have the correct drivers and utilities to fully expose the APU's capabilities, especially for Vulkan and unified memory management.

### 1. Kernel and Firmware

In the BIOS, set the UMA buffer size to 8 GB or lower, so that most of the system RAM stays available to the driver for dynamic GTT allocation.

Ensure the system is running a recent kernel (6.10 or later) for optimal Zen 5 and RDNA 3.5 support.
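A quick way to confirm the kernel requirement before proceeding (a minimal sketch; `sort -V` does the version comparison):

```shell
# Check that the running kernel meets the 6.10 minimum.
# sort -V compares versions numerically, so 6.9 < 6.10 < 6.17.
required=6.10
current=$(uname -r | cut -d- -f1)
if [ "$(printf '%s\n%s\n' "$required" "$current" | sort -V | head -n1)" = "$required" ]; then
    echo "kernel $current: OK"
else
    echo "kernel $current: older than $required"
fi
```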
sudo systemctl mask sleep.target suspend.target hibernate.target hybrid-sleep.target
```

Add the following kernel parameters:
```
iommu=pt amdgpu.gttsize=126976 ttm.pages_limit=32505856
```

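As a sanity check on those values: `amdgpu.gttsize` is specified in MiB and `ttm.pages_limit` in 4 KiB pages, so both settings correspond to 124 GiB of GPU-accessible memory (sized for a 128 GiB machine, which is an assumption here; scale them down for smaller RAM):

```shell
# amdgpu.gttsize is given in MiB: 126976 MiB = 124 GiB of GTT
echo "$((126976 / 1024)) GiB"

# ttm.pages_limit is given in 4 KiB pages: 32505856 pages = 124 GiB
echo "$((32505856 * 4096 / 1024 / 1024 / 1024)) GiB"
```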
### 2. Graphics and Compute Drivers (Vulkan)
  
sudo pacman -S mesa vulkan-radeon lib32-vulkan-radeon vulkan-headers
```
### 3. ROCm (Optional but Recommended)
  
```bash
# Install essential ROCm packages
yay -S rocm-core amdgpu_top rocminfo rocm-device-libs
yay -S rocm-hip-sdk rocm-opencl-runtime
sudo usermod -a -G render,video $USER

# XDNA NPU driver and runtime
yay -S amdxdna-driver-bin xrt-npu-git
```
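The `usermod` group change only takes effect in new login sessions; a small sketch to confirm that the current session actually sees the `render` and `video` groups:

```shell
# Group membership is read at login, so changes from usermod
# are invisible until you log out and back in. Verify:
for g in render video; do
    if id -nG | tr ' ' '\n' | grep -qx "$g"; then
        echo "$g: OK"
    else
        echo "$g: missing (log out and back in)"
    fi
done
```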
  
  
  
====== lemonade-server ======

<code | download>
yay -S lemonade-server
</code>

oga-hybrid mode splits the work: the NPU handles the prefill (prompt processing) and the iGPU handles the generation.
<code | download>
lemonade-server run Qwen3-Coder-30B-A3B-Instruct-GGUF --recipe oga-hybrid --llamacpp rocm
</code>

<code | download>
curl http://localhost:8000/api/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen3-Coder-30B-A3B-Instruct-GGUF",
    "messages": [{"role": "user", "content": "Who are you?"}],
    "stream": false
  }'
curl http://localhost:8000/api/v1/stats
</code>
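The chat endpoint appears OpenAI-compatible (an assumption based on the `/api/v1/chat/completions` path); if so, the assistant's reply can be pulled out of the JSON response directly:

```shell
# Extract just the assistant reply from the response
# (assumes the OpenAI-style "choices" response shape)
curl -s http://localhost:8000/api/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "Qwen3-Coder-30B-A3B-Instruct-GGUF",
         "messages": [{"role": "user", "content": "Who are you?"}],
         "stream": false}' \
  | python3 -c 'import sys, json; print(json.load(sys.stdin)["choices"][0]["message"]["content"])'
```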
====== Comfyui ======

PyPI packages

Download triton, torch, torchvision and torchaudio from https://repo.radeon.com/rocm/manylinux/rocm-rel-7.2/ and install them:
<code | download>
cd wheels
wget https://repo.radeon.com/rocm/manylinux/rocm-rel-7.2/torch-2.10.0.dev20260111%2Brocm7.2.0.lw.gitdea53f5b-cp312-cp312-linux_x86_64.whl
wget https://repo.radeon.com/rocm/manylinux/rocm-rel-7.2/torchaudio-2.10.0%2Brocm7.2.0.git3b0e7a6f-cp312-cp312-linux_x86_64.whl
wget https://repo.radeon.com/rocm/manylinux/rocm-rel-7.2/torchvision-0.25.0%2Brocm7.2.0.gitaa35ca19-cp312-cp312-linux_x86_64.whl
wget https://repo.radeon.com/rocm/manylinux/rocm-rel-7.2/triton-3.5.1%2Brocm7.2.0.gita272dfa8-cp312-cp312-linux_x86_64.whl

# install (the torch* glob also matches the torchaudio and torchvision wheels)
pip install triton*
pip install torch*
</code>
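The wheels above are tagged `cp312`, i.e. built for CPython 3.12 only; a quick check that the active interpreter matches before installing (the commented smoke test assumes the ROCm torch build, which exposes `torch.version.hip`):

```shell
# Print the local interpreter's wheel tag; it must be cp312
# to match the repo.radeon.com wheels above.
python3 -c 'import sys; print("cp%d%d" % sys.version_info[:2])'

# After installation, a quick smoke test (needs the GPU + ROCm torch build):
# python3 -c 'import torch; print(torch.version.hip, torch.cuda.is_available())'
```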

Install the ComfyUI Python requirements:
<code | download>
pip install -r requirements.txt
</code>
Native packages, **to be checked** once they are rebuilt against ROCm 7.2:
<code bash>
yay -S python-pytorch-rocm python-torchvision-rocm python-torchaudio-rocm
#uv pip install --pre torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm6.3
</code>
====== Benchmark ======
  
# This command downloads the 4.92 GB model file directly.
wget https://huggingface.co/bartowski/Meta-Llama-3-8B-Instruct-GGUF/resolve/main/Meta-Llama-3-8B-Instruct-Q4_K_M.gguf
wget https://huggingface.co/bartowski/Llama-3.2-3B-Instruct-GGUF/resolve/main/Llama-3.2-3B-Instruct-Q4_K_M.gguf

# bench
./build/bin/llama-bench -m Meta-Llama-3-8B-Instruct-Q4_K_M.gguf
./build/bin/llama-bench -m Llama-3.2-3B-Instruct-Q4_K_M.gguf
</code>
  