tips:llm

Differences

This shows you the differences between two versions of the page.

tips:llm [2026/02/14 08:16] sscipioni
tips:llm [2026/02/15 08:29] (current) sscipioni
Line 30: Line 30:
 ROCM
 ^ model                  ^ capabilities             ^ size     ^ context  ^ quantization                                                                      ^ eval rate [token/s]  ^ prompt eval rate [token/s]  ^
 +| llama3.2  | completion tools | "3.2B"   | 131072    | "Q4_K_M" | 52.78 | 1957.30 |
 +| qwen-strixhalo  | completion tools | "30.5B"   | 262144    | "Q4_K_M" | 53.54 | 1056.37 |
 | qwen3-coder  | completion tools | "30.5B"   | 262144    | "Q4_K_M" | 52.10 | 776.55 |
-| llama3.2  | completion tools | "3.2B"   | 131072    | "Q4_K_M" | 52.78 | 1957.30 |
 +| qwen3:30b-a3b  | completion tools thinking | "30.5B"   | 262144    | "Q4_K_M" | 50.19 | 803.06 |
 +| gpt-oss:20b  | completion tools thinking | "20.9B"   | 131072    | "MXFP4" | 45.37 | 519.90 |
 +| glm-4.7-flash  | completion tools thinking | "29.9B"   | 202752    | "Q4_K_M" | 41.54 | 470.09 |
 +| qwen3:8b  | completion tools thinking | "8.2B"   | 40960    | "Q4_K_M" | 32.68 | 890.98 | 
 +| qwen3-coder-next  | completion tools | "79.7B"   | 262144    | "Q4_K_M" | 33.06 | 380.21 | 
 +| qwen2.5-coder:14b-instruct-q4_K_M  | completion tools insert | "14.8B"   | 32768    | "Q4_K_M" | 17.25 | 527.74 | 
  
 VULKAN
Line 37: Line 45:
 | qwen3-coder  | completion tools | "30.5B"   | 262144    | "Q4_K_M" | 54.03 | 805.43 |
 | llama3.2  | completion tools | "3.2B"   | 131072    | "Q4_K_M" | 52.54 | 1838.82 |
 +| gpt-oss:20b  | completion tools thinking | "20.9B"   | 131072    | "MXFP4" | 43.36 | 475.60 |
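
To compare rows across the ROCM and VULKAN tables, the DokuWiki rows can be parsed and ranked by eval rate. The following is a minimal Python sketch, assuming every row has the seven columns of the table header; `parse_row`, `ROWS`, and the dictionary keys are illustrative names and not part of the wiki page. The three rows are copied from the VULKAN table above.

```python
# Sketch: parse DokuWiki table rows and rank models by eval rate.
# Assumes seven "|"-separated cells per row, matching the table header.

ROWS = [
    '| qwen3-coder  | completion tools | "30.5B"   | 262144    | "Q4_K_M" | 54.03 | 805.43 |',
    '| llama3.2  | completion tools | "3.2B"   | 131072    | "Q4_K_M" | 52.54 | 1838.82 |',
    '| gpt-oss:20b  | completion tools thinking | "20.9B"   | 131072    | "MXFP4" | 43.36 | 475.60 |',
]


def parse_row(line: str) -> dict:
    """Split one table row into named fields (key names are illustrative)."""
    # Drop the outer pipes, split on the inner ones, trim spaces and quotes.
    cells = [c.strip().strip('"') for c in line.strip().strip("|").split("|")]
    model, capabilities, size, context, quant, eval_rate, prompt_rate = cells
    return {
        "model": model,
        "capabilities": capabilities.split(),
        "size": size,
        "context": int(context),
        "quantization": quant,
        "eval_rate": float(eval_rate),
        "prompt_eval_rate": float(prompt_rate),
    }


# Highest generation speed first.
ranked = sorted((parse_row(r) for r in ROWS), key=lambda r: r["eval_rate"], reverse=True)

for row in ranked:
    print(f'{row["model"]:<14} {row["eval_rate"]:>7.2f} tok/s  '
          f'(prompt: {row["prompt_eval_rate"]:.2f} tok/s)')
```

Sorting on `prompt_eval_rate` instead gives a different order, which matters more for long prompts than for long answers.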
  
  
 +ollama model
 +<code>
 +FROM qwen3-coder
  
 +# STRIX HALO AGENTIC TUNING
 +PARAMETER num_ctx 128000
 +PARAMETER num_batch 1024
 +PARAMETER num_predict 4096
  
 +SYSTEM """
 +You are a Strix Halo Optimized Coding Agent.
 +Always use asynchronous patterns and favor memory-efficient algorithms
 +"""
 +</code>
-| llama3.2  | completion tools | "3.2B"   | 131072    | "Q4_K_M" | 88.14 | 715.43 |
-| ministral-3:14b  | completion vision tools | "13.9B"   | 262144    | "Q4_K_M" | 23.78 | 302.07 |
-| qwen3-coder:30b  | completion tools | "30.5B"   | 262144    | "Q4_K_M" | 73.75 | 72.41 |
-| llama3:70b  | completion | "70.6B"   | 8192    | "Q4" | 5.55 | 9.72 |
-| llava  | completion vision | "7B"   | 32768    | "Q4" | 49.92 | 207.27 |
-| deepseek-coder-v2:16b  | completion insert | "15.7B"   | 163840    | "Q4" | 84.44 | 111.71 |
-| bjoernb/qwen3-coder-30b-1m:latest  | completion tools | "30.5B"   | 1048576    | "Q4_K_M" | 74.23 | 94.84 |
-| freehuntx/qwen3-coder:8b  | completion tools | "8.2B"   | 40960    | "Q4_K_M" | 37.97 | 565.68 |
-| networkjohnny/deepseek-coder-v2-lite-base-q4_k_m-gguf:latest  | completion tools | "3.2B"   | 131072    | "Q4_K_M" | 86.02 | 1124.53 |
-| phi4-mini  | completion tools | "3.8B"   | 131072    | "Q4_K_M" | 72.24 | 31.37 |
-| qwen2.5:7b  | completion tools | "7.6B"   | 32768    | "Q4_K_M" | 42.98 | 153.34 |
-| llama3.3:70b-instruct-q4_K_M  | completion tools | "70.6B"   | 131072    | "Q4_K_M" | 5.06 | 15.50 |
-| functiongemma  | completion tools | "268.10M" | 32768 | "Q8" | 364.21 | 240.50 |
-| danielsheep/Qwen3-Coder-30B-A3B-Instruct-1M-Unsloth  | completion tools | "30.5B"   | 1048576    | "Q4_K_M" | 71.60 | 33.14 |
-| gpt-oss:20b  | completion tools thinking | "20.9B"   | 131072    | "MXFP4" | 47.32 | 402.47 |
-| qwen3-coder-next  | completion tools | "79.7B"   | 262144    | "Q4_K_M" | 33.75 | 14.03 |
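
As a rough reading of the two rate columns together with the Modelfile's `num_predict`, the time for one request can be approximated as prompt tokens divided by the prompt eval rate, plus generated tokens divided by the eval rate. The Python sketch below assumes both rates stay constant regardless of prompt length, which may not hold for long contexts. The 8000-token prompt is a made-up example and `estimate_seconds` is a hypothetical helper, not something from the page. The rates are taken from the `qwen-strixhalo` row of the ROCM table.

```python
# Sketch: back-of-the-envelope request time from the benchmark columns.
# Assumption: both rates are constant, independent of prompt length.


def estimate_seconds(prompt_tokens: int, output_tokens: int,
                     prompt_eval_rate: float, eval_rate: float) -> float:
    """Prompt processing time plus generation time, in seconds."""
    if prompt_eval_rate <= 0 or eval_rate <= 0:
        raise ValueError("rates must be positive")
    return prompt_tokens / prompt_eval_rate + output_tokens / eval_rate


# Rates from the qwen-strixhalo row of the ROCM table (token/s).
PROMPT_EVAL_RATE = 1056.37
EVAL_RATE = 53.54

# num_predict from the Modelfile caps the generated tokens at 4096.
NUM_PREDICT = 4096

# Hypothetical prompt size, chosen only for illustration.
PROMPT_TOKENS = 8000

total = estimate_seconds(PROMPT_TOKENS, NUM_PREDICT, PROMPT_EVAL_RATE, EVAL_RATE)
print(f"estimated worst-case request time: {total:.1f} s")
```

With these numbers generation dominates the total, so the eval rate column is the one to watch when `num_predict` is large relative to the prompt.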
  
tips/llm.1771053373.txt.gz · Last modified: by sscipioni