Lota-Carinae-Open-GRPO
Lota-Carinae-Open-GRPO is a chain-of-thought reasoning model fine-tuned from Qwen-1.5B with Group Relative Policy Optimization (GRPO), a reinforcement learning strategy that scores each sampled response relative to the others in its group. It is designed for solving mathematical problems in both English and Chinese, combining stepwise reasoning with lightweight efficiency, and is well suited to educational tools, math tutoring systems, and logic-intensive assistants.
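For context on the training objective, the minimal sketch below illustrates the group-relative advantage computation that gives GRPO its name: rewards for a group of responses sampled for the same prompt are normalized by that group's own mean and standard deviation, replacing a learned value baseline. The reward values and group size here are hypothetical placeholders, not the actual recipe used to train this model.

import torch

def group_relative_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # GRPO samples a group of responses per prompt and, instead of a learned
    # value baseline, normalizes each response's reward by the group mean and std.
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# Hypothetical example: four sampled solutions to one problem, rewarded 1.0 when
# the final answer is correct and 0.0 otherwise.
rewards = torch.tensor([1.0, 0.0, 1.0, 0.0])
print(group_relative_advantages(rewards))  # correct solutions receive positive advantages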
Key Features
Chain-of-Thought Math Reasoning
Fine-tuned with GRPO to enhance intermediate step generation, Lota-Carinae-Open-GRPO enables high interpretability and logical transparency, essential for both learning and verification.
Bilingual Proficiency (English + Chinese)
Fluently understands and explains math problems in English and Simplified Chinese, serving diverse educational ecosystems and multilingual environments.
Compact yet Intelligent
Despite its 1.5B parameter size, it achieves strong performance in arithmetic, algebra, geometry, word problems, and logic puzzles, with optimized efficiency via GRPO.
Structured Step-by-Step Computation
Delivers coherent, human-readable step-by-step solutions, making complex problems easier to follow and learn from.
Quickstart with Transformers
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and tokenizer (update with the new repo name if applicable).
model_name = "prithivMLmods/Monoceros-QwenM-1.5B"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Build a chat-formatted prompt with a math-tutoring system message.
prompt = "Solve: A train travels 180 km in 3 hours. What is its average speed?"
messages = [
    {"role": "system", "content": "You are a helpful tutor skilled in solving math problems with step-by-step explanations."},
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Generate the step-by-step solution.
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)

# Strip the prompt tokens so only the newly generated answer is decoded.
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
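Because the card highlights bilingual proficiency, a short follow-up shows the same pipeline with a Simplified Chinese prompt. It reuses the model and tokenizer loaded above; the prompt itself is an illustrative example, not a benchmark item.

# Bilingual usage: the same chat template works for Simplified Chinese prompts.
# (Reuses `model` and `tokenizer` from the quickstart above.)
prompt_zh = "一列火车3小时行驶180公里,它的平均速度是多少?请给出详细步骤。"  # "A train travels 180 km in 3 hours; what is its average speed? Show the steps."
messages_zh = [
    {"role": "system", "content": "你是一位擅长分步讲解数学题的辅导老师。"},  # "You are a tutor skilled at step-by-step math explanations."
    {"role": "user", "content": prompt_zh}
]
text_zh = tokenizer.apply_chat_template(messages_zh, tokenize=False, add_generation_prompt=True)
inputs_zh = tokenizer([text_zh], return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs_zh, max_new_tokens=512)
print(tokenizer.decode(output_ids[0][inputs_zh.input_ids.shape[1]:], skip_special_tokens=True))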
Intended Use
- Math Tutoring Bots: Step-by-step assistants for learners from basic to intermediate levels.
- Bilingual Educational Apps: Math learning in English and Chinese, improving access and comprehension.
- STEM Reasoning Tools: Supports science, technology, engineering, and logical thinking tasks.
- RL-Enhanced Lightweight LLMs: Powered by GRPO, suitable for embedded or resource-constrained deployments (mobile, web, or on-device).
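As a minimal sketch of the resource-constrained deployment point above, the model could be loaded in 4-bit precision through the standard transformers + bitsandbytes path. This is a generic loading pattern under the assumption that bitsandbytes and a CUDA GPU are available; it is not a deployment recipe published with this model.

from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "prithivMLmods/Monoceros-QwenM-1.5B"  # update with the new repo name if applicable

# 4-bit NF4 quantization roughly quarters the weight memory of the 1.5B model,
# which helps on small GPUs; requires the bitsandbytes package.
quant_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4")

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=quant_config,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)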
Limitations
Domain Focused:
Primarily optimized for mathematical reasoning; general-purpose tasks may yield reduced quality.
Model Scale:
Smaller size means it may not match the depth of larger models for complex or abstract scenarios.
Inherited Biases:
As it builds upon Qwen-1.5B, it may retain pretraining biases; careful use is advised in sensitive contexts.
Prompt Sensitivity:
Structured, math-specific prompts deliver the most accurate results.