Overview

newmindai/QwQ-32B-r1 is a LoRA adapter fine-tuned via reinforcement learning (RL) on top of the base model Qwen/QwQ-32B. Training incorporates:

  • ORMs (Open Reward Modules)
  • DAPO (Decoupled Clip and Dynamic Sampling Policy Optimization)
  • SimpleScaling (Multi-objective loss balancing)

This is an adapter, not a fully merged model. To use it, you must load it on top of the base model (Qwen/QwQ-32B) using the peft library.


Training Setup

Base Model

  • Architecture: QwQ-32B (Qwen-style transformer)
  • Libraries: transformers, trl, deepspeed, accelerate, vllm
  • Tokenizer: Custom-trained (compatible with Hugging Face format)

Reward Modules (ORMs)

Reward Function   Description
math              Evaluates symbolic math correctness (MathORM)
accuracy          Targets numeric accuracy (MathAccuracy)
format            Enforces strict formatting constraints
cosine            Measures similarity to gold responses
repetition        Penalizes repeated or degenerate outputs
soft_overlong     Soft penalty for overly long generations

These were combined and scaled during training with adaptive weighting.
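
The training code for the adapter is not published, so the exact aggregation is not reproduced here. The sketch below illustrates one plausible way to combine several reward signals with adaptive, normalized weights; all names (combine_rewards, adapt_weights, RewardFn) and the weighting rule itself are illustrative assumptions, not the released training code.

# Illustrative sketch only: combining multiple ORM signals with adaptive weights.
from typing import Callable, Dict, List

RewardFn = Callable[[str, str], float]  # (completion, reference) -> score

def combine_rewards(completion: str, reference: str,
                    reward_fns: Dict[str, RewardFn],
                    weights: Dict[str, float]) -> float:
    # Weighted sum of the individual reward signals (math, accuracy, format, ...).
    return sum(weights.get(name, 1.0) * fn(completion, reference)
               for name, fn in reward_fns.items())

def adapt_weights(weights: Dict[str, float],
                  recent_scores: Dict[str, List[float]]) -> Dict[str, float]:
    # Toy adaptive step: down-weight objectives whose recent scores have
    # saturated, so the remaining objectives keep driving the policy update.
    adapted = {}
    for name, w in weights.items():
        scores = recent_scores.get(name, [])
        mean = sum(scores) / len(scores) if scores else 0.0
        adapted[name] = w * (1.0 - min(mean, 0.9))
    norm = sum(adapted.values()) or 1.0  # renormalize so the weights sum to 1
    return {name: w / norm for name, w in adapted.items()}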

Scaling Techniques

  • DAPO: Stabilizes the RL objective through decoupled clipping ranges, dynamic sampling of prompts, and length-aware reward shaping; a sketch of the related soft overlong penalty follows this list.
  • SimpleScaling (newmindai/simplescaling): Controls optimizer behavior and reward balance across multiple objectives.
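
Of the rewards above, soft_overlong has the most mechanical shape, and DAPO describes a concrete form for it: no penalty while the generation stays under a length buffer, a linearly growing penalty inside the buffer, and a full penalty past the cap. The sketch below follows that shape; the max_len and buffer values are illustrative assumptions, not the actual training configuration.

# Sketch of a DAPO-style soft overlong penalty (the soft_overlong reward above).
# The length cap and buffer size are illustrative, not the training values.
def soft_overlong_penalty(length: int, max_len: int = 4096, buffer: int = 512) -> float:
    if length <= max_len - buffer:
        return 0.0                                      # well under the cap: no penalty
    if length <= max_len:
        return -(length - (max_len - buffer)) / buffer  # linear penalty inside the buffer
    return -1.0                                         # past the cap: full penalty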

Training Regime

  • Stage 1 (Wait #1): Model explores reward landscape; initial rewards unstable.
  • Stage 2 (Wait #2): Convergence improves as ORM signals align.
  • Aha Moment: Clear gains in math and formatting scores around 2K steps after warm-up.

Evaluation

🐍 Mezura-SnakeBench Benchmarking
Final performance was benchmarked using the Mezura SnakeBench framework — a standardized evaluation suite developed by NewmindAI for structured Turkish NLP tasks.


Usage Example (LoRA Adapter)

This adapter must be loaded on top of the base model Qwen/QwQ-32B using the peft library:

from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

base_model_id = "Qwen/QwQ-32B"
adapter_id = "newmindai/QwQ-32B-r1"

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(base_model_id)

# Load base model
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    torch_dtype=torch.float16,
    device_map="auto"
)

# Load LoRA adapter
model = PeftModel.from_pretrained(base_model, adapter_id)

# Inference
prompt = "Türkiye'nin en yüksek dağı nedir?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
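
If a standalone checkpoint is needed (for example to serve the model with vllm, which appears in the training stack above), the LoRA weights can be merged into the base model with peft's merge_and_unload(). The output directory below is an illustrative placeholder.

# Optional: merge the adapter into the base weights and save a standalone model.
merged_model = model.merge_and_unload()
merged_model.save_pretrained("QwQ-32B-r1-merged")  # illustrative output path
tokenizer.save_pretrained("QwQ-32B-r1-merged")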