newmindai/QwQ-32B-r1 is a LoRA adapter fine-tuned via Reinforcement Learning (RL) on top of the base model QwQ-32B. It incorporates multiple reward functions, listed in the table below.

This is an adapter, not a fully merged model. To use it, you must load it on top of the base model (Qwen/QwQ-32B) using the `peft` library.
- Base model: QwQ-32B (Qwen-style transformer)
- Frameworks: transformers, trl, deepspeed, accelerate, vllm
| Reward Function | Description |
|---|---|
| math | Evaluates symbolic math correctness (MathORM) |
| accuracy | Targets numeric accuracy (MathAccuracy) |
| format | Enforces strict formatting constraints |
| cosine | Measures similarity to gold responses |
| repetition | Penalizes repeated or degenerate outputs |
| soft_overlong | Soft penalty for overly long generations |
These were combined and scaled during training with adaptive weighting.
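The exact weighting scheme lives in the training configuration. The sketch below is a hypothetical illustration of how several per-completion reward signals can be combined under adaptive weights; the stub reward functions and weight values are illustrative only, not the actual training setup.

```python
from typing import Callable, Dict

# Hypothetical reward stubs; the real functions (math, accuracy, format, ...)
# score model completions against gold references during RL training.
def format_reward(completion: str) -> float:
    # Reward completions that follow the expected <think>...</think> layout.
    return 1.0 if "</think>" in completion else 0.0

def repetition_penalty(completion: str) -> float:
    # Penalize degenerate outputs with very low lexical diversity.
    tokens = completion.split()
    if tokens and len(set(tokens)) / len(tokens) < 0.3:
        return -1.0
    return 0.0

# Illustrative adaptive weights; in practice these were scaled during training.
REWARD_FUNCS: Dict[str, Callable[[str], float]] = {
    "format": format_reward,
    "repetition": repetition_penalty,
}
WEIGHTS: Dict[str, float] = {"format": 0.5, "repetition": 0.5}

def combined_reward(completion: str) -> float:
    """Weighted sum of the individual reward signals for one completion."""
    return sum(WEIGHTS[name] * fn(completion) for name, fn in REWARD_FUNCS.items())

if __name__ == "__main__":
    sample = "<think>2 + 2 = 4</think> The answer is 4."
    print(combined_reward(sample))
```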
The newmindai/simplescaling configuration controls optimizer behavior and reward balance across multiple objectives.

🐍 Mezura-SnakeBench Benchmarking
Final performance was benchmarked using the Mezura SnakeBench framework — a standardized evaluation suite developed by NewmindAI for structured Turkish NLP tasks.
This adapter must be loaded on top of the base model Qwen/QwQ-32B using the `peft` library:
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

base_model_id = "Qwen/QwQ-32B"
adapter_id = "newmindai/QwQ-32B-r1"

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(base_model_id)

# Load base model
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    torch_dtype=torch.float16,
    device_map="auto"
)

# Load LoRA adapter
model = PeftModel.from_pretrained(base_model, adapter_id)

# Inference with a Turkish example prompt ("What is the highest mountain in Turkey?")
prompt = "Türkiye'nin en yüksek dağı nedir?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
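For deployment scenarios that expect a standalone checkpoint (for example serving with vllm), the LoRA weights can be merged into the base model with `peft`. A minimal sketch, assuming enough memory to hold the merged weights; the output path is illustrative:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/QwQ-32B",
    torch_dtype=torch.float16,
    device_map="auto"
)
model = PeftModel.from_pretrained(base_model, "newmindai/QwQ-32B-r1")

# Fold the LoRA deltas into the base weights and drop the adapter wrappers.
merged = model.merge_and_unload()

# Save a standalone checkpoint; "./qwq-32b-r1-merged" is an illustrative path.
merged.save_pretrained("./qwq-32b-r1-merged")
AutoTokenizer.from_pretrained("Qwen/QwQ-32B").save_pretrained("./qwq-32b-r1-merged")
```

The merged directory can then be loaded like any regular transformers checkpoint.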