
Deepthink-1.5B-Open-PRM

Deepthink-1.5B-Open-PRM is a process-supervised reasoning model fine-tuned from Qwen2.5 1.5B with Process Reward Model (PRM) supervision. It excels at step-by-step mathematical problem solving in both English and Simplified Chinese, offering interpretable, logically structured responses for education, STEM tutoring, and lightweight math agents.

Key Features

  1. Process Reward Model (PRM) Supervision
    Fine-tuned with a PRM that rewards high-quality intermediate reasoning steps, fostering step-by-step interpretability, accuracy, and educational transparency (see the sketch after this list).

  2. Compact Foundation (Qwen2.5 1.5B)
    Built on the efficient Qwen2.5 1.5B architecture and refined through distillation and reward-based alignment, balancing reasoning quality and deployment efficiency.

  3. Bilingual Math Capability
    Fluent in solving and explaining math problems in both English and Simplified Chinese, making it ideal for multilingual classrooms and tutoring platforms.

  4. Process-Supervised Math Reasoning
    Trained to reason like a teacher — showing each logical step before delivering an answer. Ideal for learners who need to understand the “how” and “why” behind each solution.

  5. Long-Context & Word Problem Reasoning
    Especially proficient with multi-step arithmetic, word problems, logic puzzles, and middle school to early college-level math.
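To make the PRM idea in feature 1 concrete, below is a minimal sketch of process-level scoring with best-of-n selection. Everything here is illustrative: score_step is a toy placeholder for what, in training, is a learned reward model that scores each intermediate step.

# Sketch: a PRM assigns a reward to each intermediate step, and candidate
# solutions are ranked by an aggregate of those step rewards (best-of-n).
# `score_step` is a toy placeholder, not this model's actual reward head.

def score_step(problem: str, prior_steps: list[str], step: str) -> float:
    """Hypothetical PRM head: reward in [0, 1] for a single reasoning step."""
    # Toy heuristic that penalizes the classic "just add the times" mistake;
    # a real PRM is a trained model conditioned on the problem and context.
    return 0.2 if "+" in step else 0.8

def solution_reward(problem: str, steps: list[str]) -> float:
    """Aggregate step rewards; taking the minimum is one common choice."""
    rewards = [score_step(problem, steps[:i], s) for i, s in enumerate(steps)]
    return min(rewards, default=0.0)

problem = "A tank fills in 6 hours and empties in 9 hours; both pipes are open."
candidates = [
    ["Fill rate: 1/6 tank/hour.", "Drain rate: 1/9 tank/hour.",
     "Net rate: 1/6 - 1/9 = 1/18 tank/hour.", "Time = 18 hours."],
    ["Add the times: 6 + 9 = 15 hours."],  # a flawed chain
]
best = max(candidates, key=lambda steps: solution_reward(problem, steps))
print("\n".join(best))  # prints the higher-reward (correct) chain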

Quickstart with Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "prithivMLmods/Deepthink-1.5B-Open-PRM"

# Load the weights and tokenizer; device_map="auto" places the model on a GPU
# when one is available, and torch_dtype="auto" keeps the checkpoint's dtype.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Solve: A tank can be filled by one pipe in 6 hours and emptied by another in 9 hours. How long will it take to fill the tank if both pipes are opened together?"

messages = [
    {"role": "system", "content": "You are a helpful math tutor who explains each step clearly."},
    {"role": "user", "content": prompt}
]

# Render the conversation with the model's chat template, then tokenize.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Generate a step-by-step solution, then strip the prompt tokens from the output.
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
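The same chat-template flow handles Simplified Chinese. A minimal follow-up sketch, reusing the model and tokenizer loaded above (the prompt text is illustrative):

# Bilingual usage: the chat template works unchanged for Simplified Chinese.
# Reuses `model` and `tokenizer` from the quickstart above.
messages = [
    # System: "You are a patient math teacher; explain each step of the solution."
    {"role": "system", "content": "你是一位耐心的数学老师，请逐步讲解解题过程。"},
    # User: the same tank problem, phrased in Chinese.
    {"role": "user", "content": "一根进水管6小时注满水池，一根排水管9小时排空。两管同时打开，多久注满？"},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(output_ids[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True))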

Intended Use

  • Math Education Agents: Tutors that explain problems step by step, helping users build understanding through reasoning.
  • Bilingual Learning Platforms: Apps that teach math in both Chinese and English.
  • STEM-Oriented Assistants: Supports early-stage problem solving in science and engineering contexts.
  • Lightweight LLM Deployments: Optimized for low-resource environments, from browsers to mobile devices (see the quantized-loading sketch below).
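For the low-resource deployments mentioned above, the checkpoint can be loaded in 4-bit precision through the transformers bitsandbytes integration. A minimal sketch, assuming the optional bitsandbytes package is installed:

# 4-bit loading sketch for constrained hardware (requires bitsandbytes).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)
model = AutoModelForCausalLM.from_pretrained(
    "prithivMLmods/Deepthink-1.5B-Open-PRM",
    quantization_config=quant_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("prithivMLmods/Deepthink-1.5B-Open-PRM")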

Limitations

  1. Domain Specificity
    Primarily tuned for math reasoning — performance may degrade on unrelated tasks like creative writing or open dialogue.

  2. Model Size Constraint
    While efficient, 1.5B parameters may struggle with highly abstract or very long multi-domain tasks.

  3. PRM Reward Bias
    PRM training can bias the model toward step formats that score well under the reward model; outputs should still be reviewed for correctness and completeness.

  4. Prompt Structure Sensitivity
    Well-structured queries yield more accurate and educationally useful outputs.
