UD-Q4_K_XL matches bf16 with 60.9% vs 61.8% on Aider Polyglot benchmark

#8
by Fernanda24 - opened

UD-Q4_K_XL:

```
  test_cases: 225
  model: openai/unsloth/Qwen3-Coder-480B-A35B-Instruct-GGUF
  edit_format: diff
  commit_hash: f38200c
  pass_rate_1: 29.8
  pass_rate_2: 60.9    # <-- this is the final score: 60.9%
  pass_num_1: 67
  pass_num_2: 137
  percent_cases_well_formed: 94.7
  error_outputs: 12
  num_malformed_responses: 12
  num_with_malformed_responses: 12
  user_asks: 106
  lazy_comments: 0
  syntax_errors: 0
  indentation_errors: 0
  exhausted_context_windows: 0
  prompt_tokens: 2896195
  completion_tokens: 456367
  test_timeouts: 1
  total_tests: 225
  command: aider --model openai/unsloth/Qwen3-Coder-480B-A35B-Instruct-GGUF
  date: 2025-07-23
  versions: 0.85.3.dev
  seconds_per_case: 120.3
```
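For anyone who wants to sanity-check the percentages, here is a small sketch that recomputes them from the raw counts in the report. It assumes each pass rate is simply the pass count divided by `test_cases`, and that `percent_cases_well_formed` is the share of cases without a malformed response; both are my reading of the numbers, not something confirmed from the aider source. The helper name `pct` is made up for illustration, and the bf16 figure of 61.8 is taken from the thread title only.

```python
def pct(count: int, total: int) -> float:
    """Return count/total as a percentage rounded to one decimal place."""
    return round(100.0 * count / total, 1)

# Raw counts copied from the report above.
test_cases = 225
pass_num_1 = 67
pass_num_2 = 137
num_with_malformed_responses = 12

pass_rate_1 = pct(pass_num_1, test_cases)
pass_rate_2 = pct(pass_num_2, test_cases)
# Assumption: "well formed" means the case had no malformed response.
well_formed = pct(test_cases - num_with_malformed_responses, test_cases)

# bf16 score quoted in the thread title (not measured in this run).
bf16_pass_rate_2 = 61.8
gap_points = round(bf16_pass_rate_2 - pass_rate_2, 1)

print(f"pass_rate_1: {pass_rate_1}")
print(f"pass_rate_2: {pass_rate_2}")
print(f"percent_cases_well_formed: {well_formed}")
print(f"gap to bf16 (percentage points): {gap_points}")
```

With 225 test cases, one case is worth about 0.44 percentage points, so a 0.9 point gap corresponds to roughly two test cases.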
Unsloth AI org

Thanks for sharing your results! Pretty damn cool!
