Model Card: TinyLlama Math Reasoning Assistant

Model Description

This model is a fine-tuned version of TinyLlama/TinyLlama-1.1B-Chat-v1.0 specialized for solving mathematical problems with step-by-step reasoning. It uses a structured format with XML tags to separate reasoning steps from the final answer.

Model Details

  • Base Model: TinyLlama-1.1B-Chat-v1.0
  • Fine-tuning Method: LoRA (Low-Rank Adaptation)
  • Training Approach: Supervised Fine-Tuning followed by GRPO (Group Relative Policy Optimization)
  • Developed By: pierizvi
  • Model Type: Causal Language Model (decoder-only)
  • License: MIT
  • Repository: Pierizvi/infused-reasoning-tinyllama-math

Intended Use

This model is designed to:

  • Solve basic arithmetic problems
  • Handle word problems requiring multi-step reasoning
  • Show step-by-step calculation processes
  • Verify calculation correctness
  • Provide structured responses with reasoning and answers

The model is intended for educational purposes, homework assistance, and demonstrating mathematical reasoning processes.

Response Format

The model generates responses in a consistent format:

<reasoning>
Step-by-step solution process with calculations
</reasoning>
<answer>The final answer</answer>

This structured format makes it easy to distinguish between the reasoning process and the final answer.
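
Because the tags are fixed, downstream code can pull the two parts out with a small parser. The helper below is a minimal sketch (the function name and the empty-string fallback are illustrative, not part of the model):

import re

def parse_response(text):
    """Split a model response into its <reasoning> and <answer> parts.
    Returns empty strings for missing tags so malformed generations can be detected."""
    reasoning = re.search(r"<reasoning>(.*?)</reasoning>", text, re.DOTALL)
    answer = re.search(r"<answer>(.*?)</answer>", text, re.DOTALL)
    return {
        "reasoning": reasoning.group(1).strip() if reasoning else "",
        "answer": answer.group(1).strip() if answer else "",
    }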

Capabilities

The model can handle:

  • Basic arithmetic operations (addition, subtraction, multiplication, division)
  • Word problems involving rates, proportions, and percentages
  • Multi-step reasoning problems
  • Problems requiring formula application (e.g., distance = speed × time)

Training Methodology

The model was trained in multiple phases:

  1. Format Learning: Initial training focused on teaching the model to use the XML tag format
  2. Calculation Training: Subsequent training focused on accurate mathematical calculations
  3. Reasoning Enhancement: Final training emphasized natural reasoning approaches before calculations

Training used custom reward functions that evaluated the following (an illustrative sketch of one such reward appears after this list):

  • Format adherence
  • Reasoning quality
  • Calculation accuracy
  • Mathematical consistency
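
The exact reward functions are not reproduced here; the sketch below only illustrates the general shape of a format-adherence reward of the kind listed above (the function name, pattern, and partial-credit scores are hypothetical):

import re

def format_reward(completion):
    """Hypothetical format-adherence reward: 1.0 for a completion that is exactly one
    <reasoning> block followed by one <answer> block, partial credit otherwise."""
    pattern = r"^\s*<reasoning>.*?</reasoning>\s*<answer>.*?</answer>\s*$"
    if re.match(pattern, completion, re.DOTALL):
        return 1.0
    # Partial credit when both tag pairs are present but extra text surrounds them
    has_reasoning = "<reasoning>" in completion and "</reasoning>" in completion
    has_answer = "<answer>" in completion and "</answer>" in completion
    return 0.5 if has_reasoning and has_answer else 0.0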

Limitations

  • Small Base Model: As a 1.1B parameter model, it has limited knowledge compared to larger models
  • Calculation Complexity: May struggle with complex or multi-step calculations
  • Domain Specificity: Primarily focused on elementary and middle-school level mathematics
  • Memory Requirements: Requires careful memory optimization when running in CPU-only environments

Usage Examples

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

# Load base model
base_model = AutoModelForCausalLM.from_pretrained(
    "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    torch_dtype=torch.float16,
    device_map="auto"
)

# Load tokenizer and adapter
tokenizer = AutoTokenizer.from_pretrained("Pierizvi/infused-reasoning-tinyllama-math")
model = PeftModel.from_pretrained(base_model, "Pierizvi/infused-reasoning-tinyllama-math")

# Define function to solve problems
def solve_problem(question):
    system_prompt = """You MUST respond using ONLY this exact format:

<reasoning>
Think through the problem step by step.
Show your calculations clearly.
</reasoning>
<answer>Your final answer here</answer>"""
    
    prompt = f"{system_prompt}\n\nQuestion: {question}\n\n"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    
    with torch.no_grad():
        outputs = model.generate(
            **inputs,                          # pass input_ids and attention_mask
            max_new_tokens=256,
            do_sample=True,                    # sampling must be enabled for temperature to take effect
            temperature=0.5,
            pad_token_id=tokenizer.eos_token_id
        )
    
    response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
    return response

# Example usage
print(solve_problem("If a train travels at 60 miles per hour for 2.5 hours, how far will it travel?"))

Technical Specifications

  • LoRA Configuration:

    • Rank: 16
    • Alpha: 32
    • Target Modules: q_proj, k_proj, v_proj, o_proj
    • Dropout: 0.05
  • Memory Optimization:

    • Works with 4-bit and 8-bit quantization for CPU deployment
    • Can be run on Hugging Face Spaces instances with 16 GB of RAM with appropriate memory optimization (see the loading sketch below)
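
The adapter settings above correspond to a standard PEFT LoraConfig, and the quantized loading path can be expressed with a transformers BitsAndBytesConfig. The sketch below is illustrative: task_type, bias, and the 4-bit compute dtype are assumptions not stated in this card.

import torch
from peft import LoraConfig
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# LoRA adapter configuration matching the values listed above
# (bias and task_type are assumed defaults, not taken from the card)
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

# Optional 4-bit load of the base model for memory-constrained environments
# (requires the bitsandbytes package; the compute dtype is an assumption)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)
base_model = AutoModelForCausalLM.from_pretrained(
    "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    quantization_config=bnb_config,
    device_map="auto",
)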

Ethical Considerations

  • The model should not be used for high-stakes mathematical calculations where errors could have serious consequences
  • The reasoning steps should be verified by users and not taken as definitive mathematical proofs
  • The model is intended as a learning aid and not a replacement for proper mathematical education

Acknowledgments

  • Thanks to the TinyLlama team for the base model
  • Inspired by DeepSeek's approaches to reasoning model training
  • Developed using the Hugging Face ecosystem and PEFT library

Citation

If you use this model in research, please cite:

@misc{infused-reasoning-tinyllama-math,
  author = {pierizvi},
  title = {TinyLlama Math Reasoning Assistant},
  year = {2025},
  publisher = {Hugging Face},
  journal = {Hugging Face Model Hub},
  howpublished = {\url{https://huggingface.co/Pierizvi/infused-reasoning-tinyllama-math}}
}