tips:llm

Differences

This shows you the differences between two versions of the page.

tips:llm [2026/02/14 09:20] sscipioni
tips:llm [2026/02/15 08:29] (current) sscipioni
Line 31: Line 31:
 ^ model ^ capabilities ^ size ^ context ^ quantization ^ eval rate [token/s] ^ prompt eval rate [token/s] ^
 | llama3.2 | completion tools | "3.2B" | 131072 | "Q4_K_M" | 52.78 | 1957.30 |
 +| qwen-strixhalo | completion tools | "30.5B" | 262144 | "Q4_K_M" | 53.54 | 1056.37 |
 +| qwen3-coder | completion tools | "30.5B" | 262144 | "Q4_K_M" | 52.10 | 776.55 |
 | qwen3:30b-a3b | completion tools thinking | "30.5B" | 262144 | "Q4_K_M" | 50.19 | 803.06 |
 | gpt-oss:20b | completion tools thinking | "20.9B" | 131072 | "MXFP4" | 45.37 | 519.90 |
-| qwen3-coder | completion tools | "30.5B" | 262144 | "Q4_K_M" | 52.10 | 776.55 |
+| glm-4.7-flash | completion tools thinking | "29.9B" | 202752 | "Q4_K_M" | 41.54 | 470.09 |
 | qwen3:8b | completion tools thinking | "8.2B" | 40960 | "Q4_K_M" | 32.68 | 890.98 |
 | qwen3-coder-next | completion tools | "79.7B" | 262144 | "Q4_K_M" | 33.06 | 380.21 |
-| glm-4.7-flash | completion tools thinking | "29.9B" | 202752 | "Q4_K_M" | 38.46 | 485.27 |
+| qwen2.5-coder:14b-instruct-q4_K_M | completion tools insert | "14.8B" | 32768 | "Q4_K_M" | 17.25 | 527.74 |
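
The page does not say how the two rate columns were measured. The column names match the statistics Ollama reports, so one plausible reading is that each rate is a token count divided by the matching duration. The sketch below shows that arithmetic. It assumes a response dictionary shaped like a non-streaming reply from Ollama's `/api/generate` endpoint, where `eval_count` and `prompt_eval_count` are token counts and `eval_duration` and `prompt_eval_duration` are in nanoseconds. The helper names are hypothetical.

```python
def tokens_per_second(token_count: int, duration_ns: int) -> float:
    """Convert a token count and a duration in nanoseconds to tokens/s."""
    if duration_ns <= 0:
        raise ValueError("duration_ns must be positive")
    return token_count / (duration_ns / 1e9)


def rates_from_response(resp: dict) -> dict:
    """Derive both table columns from one response dictionary.

    Assumption: resp carries the timing fields of a non-streaming
    /api/generate reply (counts in tokens, durations in nanoseconds).
    """
    return {
        "eval_rate": tokens_per_second(
            resp["eval_count"], resp["eval_duration"]
        ),
        "prompt_eval_rate": tokens_per_second(
            resp["prompt_eval_count"], resp["prompt_eval_duration"]
        ),
    }
```

As an illustration with made-up numbers, 5278 generated tokens over 100 seconds (100_000_000_000 ns) gives an eval rate of 52.78 token/s.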
  
  
Line 46: Line 48:
  
  
 +ollama model
 +<code>
 +FROM qwen3-coder
 +
 +# STRIX HALO AGENTIC TUNING
 +PARAMETER num_ctx 128000
 +PARAMETER num_batch 1024
 +PARAMETER num_predict 4096
  
 +SYSTEM """
 +You are a Strix Halo Optimized Coding Agent. 
 +Always use asynchronous patterns and favor memory-efficient algorithms.
 +"""
 +</code>
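
The Modelfile above bakes its parameters into a derived model. As a hedged alternative for quick experiments, the same values can probably be sent per request through the `options` field of Ollama's `/api/generate` endpoint without creating a new model. The sketch below only builds the request payload and sends nothing. `modelfile_parameters` and `build_request` are hypothetical helper names, and the parser is deliberately simplified: it keeps one value per key, so repeated parameters such as `stop` would be overwritten.

```python
# Parameter lines copied from the Modelfile shown in the diff.
MODELFILE = """FROM qwen3-coder
# STRIX HALO AGENTIC TUNING
PARAMETER num_ctx 128000
PARAMETER num_batch 1024
PARAMETER num_predict 4096
"""


def modelfile_parameters(text: str) -> dict:
    """Collect PARAMETER lines into a dict, converting numeric values."""
    options = {}
    for raw in text.splitlines():
        parts = raw.strip().split(None, 2)
        if len(parts) != 3 or parts[0].upper() != "PARAMETER":
            continue  # skip FROM, SYSTEM, comments and blank lines
        key, value = parts[1], parts[2]
        try:
            options[key] = int(value)
        except ValueError:
            try:
                options[key] = float(value)
            except ValueError:
                options[key] = value  # keep non-numeric values as text
    return options


def build_request(model: str, prompt: str, system: str, modelfile: str) -> dict:
    """Build a JSON-serialisable payload for a non-streaming generate call."""
    return {
        "model": model,
        "prompt": prompt,
        "system": system,
        "stream": False,
        "options": modelfile_parameters(modelfile),
    }
```

Calling `build_request("qwen3-coder", "...", "...", MODELFILE)` yields a dictionary whose `options` entry holds `num_ctx`, `num_batch` and `num_predict` as integers. Whether per-request options perform the same as a derived model on this hardware is not something the page states.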
  
tips/llm.1771057244.txt.gz · Last modified: by sscipioni