Differences

This shows you the differences between two versions of the page.

--- tips:llm [2026/02/14 10:30] – sscipioni
+++ tips:llm [2026/04/15 11:25] (current) – sscipioni
@@ Line 31: / Line 31: @@
 ^ model                  ^ capabilities             ^ size     ^ context  ^ quantization                                                                      ^ eval rate [token/s]  ^ prompt eval rate [token/s]  ^
 | llama3.2  | completion tools | "3.2B"   | 131072    | "Q4_K_M" | 52.78 | 1957.30 |
+| qwen-strixhalo  | completion tools | "30.5B"   | 262144    | "Q4_K_M" | 53.54 | 1056.37 |
 | qwen3-coder  | completion tools | "30.5B"   | 262144    | "Q4_K_M" | 52.10 | 776.55 |
 | qwen3:30b-a3b  | completion tools thinking | "30.5B"   | 262144    | "Q4_K_M" | 50.19 | 803.06 |
@@ Line 38: / Line 39: @@
 | qwen3-coder-next  | completion tools | "79.7B"   | 262144    | "Q4_K_M" | 33.06 | 380.21 |
 | qwen2.5-coder:14b-instruct-q4_K_M  | completion tools insert | "14.8B"   | 32768    | "Q4_K_M" | 17.25 | 527.74 |
+| gemma4:latest  | completion vision audio tools thinking | "8.0B"   | 131072    | "Q4_K_M" | 50.15 | 1704.89 |
+| gemma4:e2b  | completion vision audio tools thinking | "5.1B"   | 131072    | "Q4_K_M" | 83.07 | 2799.72 |
+NVIDIA GeForce RTX 3060
+^ model                  ^ capabilities             ^ size     ^ context  ^ quantization                                                                      ^ eval rate [token/s]  ^ prompt eval rate [token/s]  ^
+| gemma4:e2b  | completion vision audio tools thinking | "5.1B"   | 131072    | "Q4_K_M" | 102.44 | 4202.89 |
@@ Line 47: / Line 55: @@
+Ollama Modelfile
+<code>
+FROM qwen3-coder
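+# STRIX HALO AGENTIC TUNING
+# num_ctx: context window size in tokens
+PARAMETER num_ctx 128000
+# num_batch: batch size used when processing the prompt
+PARAMETER num_batch 1024
+# num_predict: maximum number of tokens to generate per response
+PARAMETER num_predict 4096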
+SYSTEM """
+You are a Strix Halo Optimized Coding Agent.
+Always use asynchronous patterns and favor memory-efficient algorithms.
+"""
+</code>
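The page does not say how the "eval rate" and "prompt eval rate" columns were obtained. Here is a minimal sketch of one way to reproduce such figures, assuming a local Ollama server at its default address `http://localhost:11434`. The `/api/generate` response reports `eval_count`, `eval_duration`, `prompt_eval_count` and `prompt_eval_duration` (durations in nanoseconds), so each rate is the token count divided by the duration in seconds. The helper names `token_rates` and `measure` are hypothetical and not part of the original page.

```python
import json
import urllib.request

NS_PER_S = 1e9  # Ollama reports durations in nanoseconds


def token_rates(response: dict) -> dict:
    """Compute token/s rates from an Ollama /api/generate response dict."""

    def rate(count_key: str, duration_key: str):
        count = response.get(count_key, 0)
        duration_ns = response.get(duration_key, 0)
        if not duration_ns:
            # Field missing or zero: no rate can be computed.
            return None
        return count / (duration_ns / NS_PER_S)

    return {
        "eval_rate": rate("eval_count", "eval_duration"),
        "prompt_eval_rate": rate("prompt_eval_count", "prompt_eval_duration"),
    }


def measure(model: str, prompt: str, host: str = "http://localhost:11434") -> dict:
    """Send one non-streaming generate request and return its token rates.

    Assumes an Ollama server is reachable at `host` and `model` is available.
    """
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return token_rates(json.load(resp))
```

For example, `measure("qwen-strixhalo", "Write a quicksort in Python.")` would return a dict with both rates, provided a model of that name exists locally. A single run is noisy, so averaging several runs gives more comparable numbers.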
