This model beats Qwen Max!

#33
by MrDevolver - opened

Seriously, try this on both Qwen Max and QwQ 32B with reasoning:

Create Mandelbrot fractal simulator with Earth like colors and rendered on a 3D sphere. Use html, css, javascript, single file. Allow movements using mouse dragging and infinite zoom using mouse wheel.

You released a gem! ❤💎

Wait till they release a QwQ version of Qwen Max.

QwQ-32B is better than Qwen2.5-Max on LiveBench!

image.png

Yep. One could argue that Qwen Max is not a thinking model (those usually perform much better). On the other hand, QwQ 32B is a model you can run offline on your own computer, with no internet connection needed to reach those big, fast online models...

This model is impressive 😍

Also, I don't understand how this model is ranked lower on the LMSYS Chatbot Arena than Gemma 3 and several other models that can't solve tricky questions, like this one, for example:

You’re given a black box algorithm that takes three integers (x, y, z) and outputs a single integer. Testing reveals: (1, 2, 3) → 5, (4, 5, 6) → 14, (0, 0, 0) → 0, (2, -1, 3) → 4. Hypothesize the algorithm’s rule, explain your reasoning, and predict the output for (7, 3, -2). If multiple rules fit, discuss alternatives.

Correct answer: 8
The algorithm outputs the sum of the three integers minus 1 if all inputs are positive (greater than zero). Otherwise, it simply outputs the sum of the three integers.

And believe me, not many models can solve this :) Only thinking models, for the most part.
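
For anyone who wants to check the rule by hand, here is a quick sketch in Python. The function name `black_box` and the `observations` dictionary are made up for illustration; the rule is just the hypothesis stated above. Passing the checks only shows the rule is consistent with the four samples. Other rules could fit them as well.

```python
def black_box(x, y, z):
    """Hypothesized rule: sum of the inputs, minus 1 only if all are > 0."""
    total = x + y + z
    # min(...) > 0 is true exactly when every input is strictly positive.
    return total - 1 if min(x, y, z) > 0 else total


# The four input/output pairs given in the puzzle.
observations = {
    (1, 2, 3): 5,
    (4, 5, 6): 14,
    (0, 0, 0): 0,
    (2, -1, 3): 4,
}

# Confirm the hypothesis reproduces every observed output.
for inputs, expected in observations.items():
    assert black_box(*inputs) == expected, (inputs, expected)

# Prediction for the unseen input: -2 is not positive, so nothing is subtracted.
print(black_box(7, 3, -2))  # 8
```

The plain sum alone fails on (1, 2, 3) and (4, 5, 6), which is what forces the "minus 1 when all inputs are positive" adjustment.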

I suspect that Chatbot Arena compiles its rankings unfairly. A thinking model vs. the non-thinking Gemma 3 27B, like w**?

You are right.

The rankings might be biased somehow.
