
Deepthink-1.5B-Open-PRM

Deepthink-1.5B-Open-PRM is a process-supervised reasoning model fine-tuned from Qwen2.5 1.5B with Process Reward Model (PRM) supervision. It excels at step-by-step mathematical problem solving in both English and Simplified Chinese, offering interpretable, logically structured responses for education, STEM tutoring, and lightweight math agents.

Key Features

  1. Process Reward Model (PRM) Supervision
    Fine-tuned with a PRM that rewards high-quality intermediate reasoning steps, fostering step-by-step interpretability, accuracy, and educational transparency (see the sketch after this list).

  2. Compact Foundation (Qwen2.5 1.5B)
    Built on the efficient Qwen2.5 1.5B architecture and refined through distillation and reward-based alignment, balancing reasoning quality and deployment efficiency.

  3. Bilingual Math Capability
    Fluent in solving and explaining math problems in both English and Simplified Chinese, making it ideal for multilingual classrooms and tutoring platforms.

  4. Process-Supervised Math Reasoning
    Trained to reason like a teacher — showing each logical step before delivering an answer. Ideal for learners who need to understand the “how” and “why” behind each solution.

  5. Long-Context & Word Problem Reasoning
    Especially proficient with multi-step arithmetic, word problems, logic puzzles, and middle school to early college-level math.
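To make the PRM idea in feature 1 concrete, below is a minimal sketch of process-level scoring with best-of-n selection. Everything here is illustrative: score_step is a toy placeholder for what, in training, is a learned reward model that scores each intermediate step.

# Sketch: a PRM assigns a reward to each intermediate step, and candidate
# solutions are ranked by an aggregate of those step rewards (best-of-n).
# `score_step` is a toy placeholder, not this model's actual reward head.

def score_step(problem: str, prior_steps: list[str], step: str) -> float:
    """Hypothetical PRM head: reward in [0, 1] for a single reasoning step."""
    # Toy heuristic that penalizes the classic "just add the times" mistake;
    # a real PRM is a trained model conditioned on the problem and context.
    return 0.2 if "+" in step else 0.8

def solution_reward(problem: str, steps: list[str]) -> float:
    """Aggregate step rewards; taking the minimum is one common choice."""
    rewards = [score_step(problem, steps[:i], s) for i, s in enumerate(steps)]
    return min(rewards, default=0.0)

problem = "A tank fills in 6 hours and empties in 9 hours; both pipes are open."
candidates = [
    ["Fill rate: 1/6 tank/hour.", "Drain rate: 1/9 tank/hour.",
     "Net rate: 1/6 - 1/9 = 1/18 tank/hour.", "Time = 18 hours."],
    ["Add the times: 6 + 9 = 15 hours."],  # a flawed chain
]
best = max(candidates, key=lambda steps: solution_reward(problem, steps))
print("\n".join(best))  # prints the higher-reward (correct) chain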

Quickstart with Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "prithivMLmods/Deepthink-1.5B-Open-PRM"

# Load the weights and tokenizer; device_map="auto" places the model on a GPU
# when one is available, and torch_dtype="auto" keeps the checkpoint's dtype.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Solve: A tank can be filled by one pipe in 6 hours and emptied by another in 9 hours. How long will it take to fill the tank if both pipes are opened together?"

messages = [
    {"role": "system", "content": "You are a helpful math tutor who explains each step clearly."},
    {"role": "user", "content": prompt}
]

# Render the conversation with the model's chat template, then tokenize.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Generate a step-by-step solution, then strip the prompt tokens from the output.
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
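The same chat-template flow handles Simplified Chinese. A minimal follow-up sketch, reusing the model and tokenizer loaded above (the prompt text is illustrative):

# Bilingual usage: the chat template works unchanged for Simplified Chinese.
# Reuses `model` and `tokenizer` from the quickstart above.
messages = [
    # System: "You are a patient math teacher; explain each step of the solution."
    {"role": "system", "content": "你是一位耐心的数学老师，请逐步讲解解题过程。"},
    # User: the same tank problem, phrased in Chinese.
    {"role": "user", "content": "一根进水管6小时注满水池，一根排水管9小时排空。两管同时打开，多久注满？"},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(output_ids[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True))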

Intended Use

  • Math Education Agents: Tutors that explain problems step by step, helping users build understanding through reasoning.
  • Bilingual Learning Platforms: Apps that teach math in both Chinese and English.
  • STEM-Oriented Assistants: Supports early-stage problem solving in science and engineering contexts.
  • Lightweight LLM Deployments: Optimized for low-resource environments, from browsers to mobile devices (see the quantized-loading sketch below).
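For the low-resource deployments mentioned above, the checkpoint can be loaded in 4-bit precision through the transformers bitsandbytes integration. A minimal sketch, assuming the optional bitsandbytes package is installed:

# 4-bit loading sketch for constrained hardware (requires bitsandbytes).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)
model = AutoModelForCausalLM.from_pretrained(
    "prithivMLmods/Deepthink-1.5B-Open-PRM",
    quantization_config=quant_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("prithivMLmods/Deepthink-1.5B-Open-PRM")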

Limitations

  1. Domain Specificity
    Primarily tuned for math reasoning — performance may degrade on unrelated tasks like creative writing or open dialogue.

  2. Model Size Constraint
    While efficient, 1.5B parameters may struggle with highly abstract or very long multi-domain tasks.

  3. PRM Reward Bias
    PRM training can bias the model toward step formats that score well under the reward model; outputs should still be reviewed for correctness and completeness.

  4. Prompt Structure Sensitivity
    Well-structured queries yield more accurate and educationally useful outputs.
