UD-Q4_K_XL matches bf16 with 60.9% vs 61.8% on Aider Polyglot benchmark
#8, opened by Fernanda24
UD-Q4_K_XL:
```
test_cases: 225
model: openai/unsloth/Qwen3-Coder-480B-A35B-Instruct-GGUF
edit_format: diff
commit_hash: f38200c
pass_rate_1: 29.8
pass_rate_2: 60.9  # this is the final score: 60.9%
pass_num_1: 67
pass_num_2: 137
percent_cases_well_formed: 94.7
error_outputs: 12
num_malformed_responses: 12
num_with_malformed_responses: 12
user_asks: 106
lazy_comments: 0
syntax_errors: 0
indentation_errors: 0
exhausted_context_windows: 0
prompt_tokens: 2896195
completion_tokens: 456367
test_timeouts: 1
total_tests: 225
command: aider --model openai/unsloth/Qwen3-Coder-480B-A35B-Instruct-GGUF
date: 2025-07-23
versions: 0.85.3.dev
seconds_per_case: 120.3
```
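For anyone who wants to sanity-check the numbers, here is a rough Python sketch that recomputes the reported rates from the raw counts in the run above. The `pct` helper is just a name made up for this example, and the well-formed formula (cases without a malformed response divided by total cases) is an assumption that happens to reproduce the reported 94.7. The 61.8 figure is the bf16 score quoted in the title, not something measured in this run.

```python
# Values copied from the UD-Q4_K_XL run above.
test_cases = 225
pass_num_1 = 67
pass_num_2 = 137
num_with_malformed_responses = 12


def pct(count: int, total: int) -> float:
    """Return count/total as a percentage rounded to one decimal place."""
    return round(100 * count / total, 1)


pass_rate_1 = pct(pass_num_1, test_cases)
pass_rate_2 = pct(pass_num_2, test_cases)

# Assumption: well-formed percentage = cases without a malformed response.
well_formed = pct(test_cases - num_with_malformed_responses, test_cases)

# Gap to the bf16 score quoted in the title, in percentage points and in
# approximate number of test cases.
bf16_pass_rate_2 = 61.8
gap_points = round(bf16_pass_rate_2 - pass_rate_2, 1)
gap_cases = round(gap_points / 100 * test_cases)

print(pass_rate_1, pass_rate_2, well_formed, gap_points, gap_cases)
# prints: 29.8 60.9 94.7 0.9 2
```

So the 0.9 point gap works out to roughly 2 of the 225 test cases.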
Thanks for sharing your results! Pretty damn cool!