smol-IQ2_KS also passes the official K2 Vendor Verifier test!

#15
by whoisjeremylam - opened

After about 5 days of running, I have the results from K2-Vendor-Verifier (https://github.com/MoonshotAI/K2-Vendor-Verifier)!

For schema_accuracy, smol-IQ2_KS gets 100% and for tool_call_f1 it scored 76%. MoonshotAI believes that a 'score above 73% is acceptable and can be used as a reference.'

Processed results:

{
  "tool_call_trigger_similarity": {
    "TP": 798,
    "FP": 311,
    "FN": 187,
    "TN": 704,
    "tool_call_precision": 0.7195671776375113,
    "tool_call_recall": 0.8101522842639594,
    "tool_call_f1": 0.7621776504297995
  },
  "tool_call_schema_accuracy": {
    "count_finish_reason_tool_calls": 1109,
    "count_successful_tool_call": 1109,
    "schema_accuracy": 1.0
  }
}
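
For anyone who wants to sanity-check the numbers, here is a minimal sketch that recomputes the three trigger metrics from the raw counts above. It assumes the standard precision/recall/F1 definitions; I have not checked that the verifier's own script computes them exactly this way, but these formulas do reproduce the reported values. The function name `tool_call_metrics` is just for illustration.

```python
def tool_call_metrics(tp, fp, fn):
    """Standard precision, recall and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)  # share of emitted tool calls that were expected
    recall = tp / (tp + fn)     # share of expected tool calls that were emitted
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1


# Counts copied from the processed results above.
precision, recall, f1 = tool_call_metrics(tp=798, fp=311, fn=187)

# Schema accuracy: successful tool calls / responses that finished with tool_calls.
schema_accuracy = 1109 / 1109

print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
print(f"schema_accuracy={schema_accuracy:.1f}")
```

Note that TP + FP = 798 + 311 = 1109, which matches count_finish_reason_tool_calls, so the two result blocks are consistent with each other.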

Inference parameters:

      -t 23
      -m /home/ai/models/ubergarm/Kimi-K2-Thinking-GGUF/Kimi-K2-Thinking-smol-IQ2_KS.gguf
      --alias Kimi-K2-Thinking
      --jinja
      --host 0.0.0.0
      --chat-template-file /home/ai/ik_llama.cpp/models/templates/Kimi-K2-Thinking.jinja
      -c 150000 --no-mmap -ngl 999
      -ot "blk.(0|1|2|3|4|5|6|7).ffn.=CUDA0"
      -ot "blk.(11|12|13|14|15|16|17).ffn.=CUDA1"
      -ot "blk.(21|22).ffn.=CUDA2"
      -ot "blk.(31|32).ffn.=CUDA3"
      -ot exps=CPU
      -mg 0 -ub 4096 -b 4096 -mla 3 -amb 1024
      --temp 1.0
      --min-p 0.01
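
In case the -ot lines look cryptic, below is a rough sketch of how I read them. It assumes each pattern is applied as an unanchored regex search against the tensor name and that the first matching override wins; that is my understanding of the behaviour, not something verified against the ik_llama.cpp source here. The tensor names are illustrative ones in the usual GGUF style, and `placement` is a hypothetical helper, not part of any tool.

```python
import re

# Patterns copied from the -ot flags above, in command-line order.
OVERRIDES = [
    (r"blk.(0|1|2|3|4|5|6|7).ffn.", "CUDA0"),
    (r"blk.(11|12|13|14|15|16|17).ffn.", "CUDA1"),
    (r"blk.(21|22).ffn.", "CUDA2"),
    (r"blk.(31|32).ffn.", "CUDA3"),
    (r"exps", "CPU"),
]


def placement(tensor_name):
    """Return the device of the first matching override, or None if none match."""
    for pattern, device in OVERRIDES:
        if re.search(pattern, tensor_name):  # assumption: unanchored search
            return device
    return None  # no override matched; the default -ngl placement would apply


# Illustrative tensor names (assumed, not read from the actual GGUF file).
for name in [
    "blk.3.ffn_down_exps.weight",
    "blk.12.ffn_up_exps.weight",
    "blk.40.ffn_gate_exps.weight",
    "blk.40.attn_q.weight",
]:
    print(name, "->", placement(name))
```

Under those assumptions, the FFN tensors of the listed layers stay on the four GPUs, and the routed expert tensors of every other layer fall through to the final exps=CPU rule.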

fyi - I previously tested Kimi-K2 IQ2_KS and it passed: https://huggingface.co/ubergarm/Kimi-K2-Instruct-GGUF/discussions/8

@whoisjeremylam

For schema_accuracy, smol-IQ2_KS gets 100% and for tool_call_f1 it scored 76%. MoonshotAI believes that a 'score above 73% is acceptable and can be used as a reference.'

Wow thanks for spending 5 days on your rig verifying these quants! Amazing that even the smol-IQ2_KS is considered within "reference" quality! Score a big win for local LLMs!!!
