smol-IQ2_KS also passes the official K2 Vendor Verifier test!
After about 5 days of runtime, I have the results from K2-Vendor-Verifier (https://github.com/MoonshotAI/K2-Vendor-Verifier)!
smol-IQ2_KS scores 100% on schema_accuracy and 76% on tool_call_f1. MoonshotAI believes that a 'score above 73% is acceptable and can be used as a reference.'
Processed results:
{
  "tool_call_trigger_similarity": {
    "TP": 798,
    "FP": 311,
    "FN": 187,
    "TN": 704,
    "tool_call_precision": 0.7195671776375113,
    "tool_call_recall": 0.8101522842639594,
    "tool_call_f1": 0.7621776504297995
  },
  "tool_call_schema_accuracy": {
    "count_finish_reason_tool_calls": 1109,
    "count_successful_tool_call": 1109,
    "schema_accuracy": 1.0
  }
}
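For anyone who wants to sanity-check the numbers, here is a minimal sketch that recomputes precision, recall and F1 from the TP/FP/FN counts above using the standard definitions. I am assuming the verifier uses those same definitions (they do reproduce the reported values); the function name and the threshold constant are just for illustration, with 0.73 taken from the MoonshotAI guidance quoted earlier.

```python
# Counts copied from the processed results above.
TP, FP, FN, TN = 798, 311, 187, 704

# Threshold from the quoted MoonshotAI guidance ("score above 73%").
F1_THRESHOLD = 0.73


def precision_recall_f1(tp, fp, fn):
    """Standard definitions; assumed (not confirmed) to match the verifier."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


precision, recall, f1 = precision_recall_f1(TP, FP, FN)
schema_accuracy = 1109 / 1109  # successful tool calls / tool-call finishes

print(f"precision={precision:.4f} recall={recall:.4f} f1={f1:.4f}")
print(f"schema_accuracy={schema_accuracy:.2f}")
print("above threshold:", f1 > F1_THRESHOLD)
```

Note that TP + FP = 1109, which matches count_finish_reason_tool_calls in the schema accuracy block.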
Inference parameters:
-t 23
-m /home/ai/models/ubergarm/Kimi-K2-Thinking-GGUF/Kimi-K2-Thinking-smol-IQ2_KS.gguf
--alias Kimi-K2-Thinking
--jinja
--host 0.0.0.0
--chat-template-file /home/ai/ik_llama.cpp/models/templates/Kimi-K2-Thinking.jinja
-c 150000 --no-mmap -ngl 999
-ot "blk.(0|1|2|3|4|5|6|7).ffn.=CUDA0"
-ot "blk.(11|12|13|14|15|16|17).ffn.=CUDA1"
-ot "blk.(21|22).ffn.=CUDA2"
-ot "blk.(31|32).ffn.=CUDA3"
-ot exps=CPU
-mg 0 -ub 4096 -b 4096 -mla 3 -amb 1024
--temp 1.0
--min-p 0.01
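In case the -ot lines look cryptic, here is a rough sketch of how I understand them to work. It assumes each pattern is applied as a regex search against the tensor name and that the first matching override wins, so the per-GPU rules take effect before the catch-all exps=CPU. I have not checked this against the ik_llama.cpp source, and the tensor names below are only illustrative examples in the usual GGUF naming style, not a dump from this model.

```python
import re

# Overrides in the order they appear in the inference parameters above.
OVERRIDES = [
    (r"blk.(0|1|2|3|4|5|6|7).ffn.", "CUDA0"),
    (r"blk.(11|12|13|14|15|16|17).ffn.", "CUDA1"),
    (r"blk.(21|22).ffn.", "CUDA2"),
    (r"blk.(31|32).ffn.", "CUDA3"),
    (r"exps", "CPU"),
]


def placement(tensor_name):
    """Return the device of the first matching override, or None.

    Assumption: first match wins. None means no override applies and
    the tensor follows the default placement (here, -ngl 999).
    """
    for pattern, device in OVERRIDES:
        if re.search(pattern, tensor_name):
            return device
    return None


# Hypothetical tensor names, for illustration only.
for name in [
    "blk.3.ffn_up_exps.weight",
    "blk.21.ffn_gate_exps.weight",
    "blk.40.ffn_up_exps.weight",
    "blk.40.attn_q.weight",
]:
    print(name, "->", placement(name))
```

Under those assumptions, the ffn tensors of the listed layers stay on the GPUs, the routed experts of every other layer go to system RAM, and everything else is offloaded as usual.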
fyi - I previously tested Kimi-K2 IQ2_KS and it passed: https://huggingface.co/ubergarm/Kimi-K2-Instruct-GGUF/discussions/8
Wow thanks for spending 5 days on your rig verifying these quants! Amazing that even the smol-IQ2_KS is considered within "reference" quality! Score a big win for local LLMs!!!