GGUF
qwen2
unsloth
imatrix
conversational

Please read "Running QwQ effectively" for guidance on sampling issues with QwQ-based models.

Or, in short, use the settings below:

./llama.cpp/llama-cli -hf unsloth/INTELLECT-2-GGUF:Q4_K_XL -ngl 99 \
    --temp 0.6 \
    --repeat-penalty 1.1 \
    --dry-multiplier 0.5 \
    --min-p 0.00 \
    --top-k 40 \
    --top-p 0.95 \
    --samplers "top_k;top_p;min_p;temperature;dry;typ_p;xtc"

INTELLECT-2

INTELLECT-2 is a 32-billion-parameter language model trained through a reinforcement learning run that leverages globally distributed, permissionless GPU resources contributed by the community.

The model was trained with prime-rl, a framework designed for distributed asynchronous RL, using GRPO over verifiable rewards along with modifications for improved training stability. For detailed information on our infrastructure and training recipe, see our technical report.

Model Information

Usage

INTELLECT-2 is based on the qwen2 architecture, making it compatible with popular libraries and inference engines such as vLLM or SGLang.

Because INTELLECT-2 was trained with a length control budget, you will achieve the best results by appending the sentence "Think for 10000 tokens before giving a response." to your instruction. As reported in our technical report, the model did not train for long enough to fully learn the length control objective, so results won't differ strongly if you specify a length other than 10,000. If you wish to do so, you can expect the best results with 2000, 4000, 6000 and 8000, as these were the other target lengths present during training.
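The length-control instruction described above can be attached programmatically. The following is a minimal sketch, not an official utility: the helper name `with_thinking_budget`, the single-space separator, and the choice to reject budgets outside the trained set are all assumptions made for illustration.

```python
# Target lengths that the model card reports as present during training.
TRAINED_BUDGETS = (2000, 4000, 6000, 8000, 10000)


def with_thinking_budget(instruction: str, budget: int = 10000) -> str:
    """Append the length-control sentence to an instruction.

    Hypothetical helper: the name and the strict validation are
    illustrative assumptions, not part of any library.
    """
    if budget not in TRAINED_BUDGETS:
        raise ValueError(
            f"budget {budget} was not a training target; "
            f"expected one of {TRAINED_BUDGETS}"
        )
    # Assumption: the sentence is separated from the instruction by one space.
    return f"{instruction.rstrip()} Think for {budget} tokens before giving a response."


prompt = with_thinking_budget("Prove that the square root of 2 is irrational.")
print(prompt)
```

Rejecting other budgets is a deliberately strict choice; since the model did not fully learn the length control objective, a caller might reasonably relax this to a warning instead.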

Performance

During training, INTELLECT-2 improved upon QwQ in its mathematical and coding abilities. Performance on IFEval decreased slightly, which can likely be attributed to the lack of diverse training data and the exclusive focus on mathematics and coding.

Model                 AIME24  AIME25  LiveCodeBench (v5)  GPQA-Diamond  IFEval
INTELLECT-2           78.8    64.9    67.8                66.8          81.5
QwQ-32B               76.6    64.8    66.1                66.3          83.4
Qwen-R1-Distill-32B   69.9    58.4    55.1                65.2          72.0
Deepseek-R1           78.6    65.1    64.1                71.6          82.7
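
The comparison with QwQ-32B can be made explicit by subtracting the two rows of the table. This is a small illustrative sketch: the scores are copied from the table above, while the `SCORES` dictionary and the `score_deltas` helper are names introduced here for illustration.

```python
# Scores copied from the benchmark table (INTELLECT-2 and QwQ-32B rows).
SCORES = {
    "INTELLECT-2": {
        "AIME24": 78.8,
        "AIME25": 64.9,
        "LiveCodeBench (v5)": 67.8,
        "GPQA-Diamond": 66.8,
        "IFEval": 81.5,
    },
    "QwQ-32B": {
        "AIME24": 76.6,
        "AIME25": 64.8,
        "LiveCodeBench (v5)": 66.1,
        "GPQA-Diamond": 66.3,
        "IFEval": 83.4,
    },
}


def score_deltas(model: dict, baseline: dict) -> dict:
    """Return per-benchmark differences (model minus baseline), rounded to 0.1."""
    return {bench: round(model[bench] - baseline[bench], 1) for bench in model}


deltas = score_deltas(SCORES["INTELLECT-2"], SCORES["QwQ-32B"])
for bench, delta in deltas.items():
    print(f"{bench}: {delta:+.1f}")
```

The differences are positive on every benchmark except IFEval, which matches the description above.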