Model Card for Qwen3-0.6B-OpenMathReason

Model Description

This model is a fine-tuned version of Qwen/Qwen3-0.6B, trained with the Unsloth library and LoRA for parameter-efficient fine-tuning. It was trained on two datasets:

unsloth/OpenMathReason-mini: to enhance mathematical reasoning skills.
mlabonne/FineTome-100k: to improve general conversational abilities.

Model Details

Developed by: Rustam Shiriyev
Language(s) (NLP): English
License: MIT
Finetuned from model: unsloth/Qwen3-0.6B

Uses

Direct Use

This model can be used as a lightweight assistant capable of solving basic to intermediate math problems (OpenMathReason tasks).

Downstream Use

Can be integrated into educational chatbots for STEM learning.

Out-of-Scope Use

Not suitable for high-stakes decision-making.

Bias, Risks, and Limitations

Mathematical reasoning is limited to the scope of the OpenMathReason-mini dataset.
Conversational quality may degrade with complex or multi-turn inputs.

How to Get Started with the Model

from huggingface_hub import login
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

# Authenticate with the Hugging Face Hub; paste your own access token here.
login(token="")

# Load the tokenizer and the base model onto GPU 0.
tokenizer = AutoTokenizer.from_pretrained("unsloth/Qwen3-0.6B")
base_model = AutoModelForCausalLM.from_pretrained(
    "unsloth/Qwen3-0.6B",
    device_map={"": 0},
)

# Attach the LoRA adapter weights to the base model.
model = PeftModel.from_pretrained(base_model, "Rustamshry/Qwen3-0.6B-OpenMathReason")

question = "Solve (x + 2)^2 = 0"
messages = [{"role": "user", "content": question}]

# Render the chat template as a prompt string, with Qwen3's thinking mode enabled.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,
)

# Generate and stream the new tokens to stdout, skipping the prompt.
_ = model.generate(
    **tokenizer(text, return_tensors="pt").to(model.device),
    max_new_tokens=2048,
    do_sample=True,  # make sure temperature, top_p and top_k take effect
    temperature=0.6, top_p=0.95, top_k=20,
    streamer=TextStreamer(tokenizer, skip_prompt=True),
)
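
With enable_thinking = True, Qwen3 models normally emit their reasoning inside <think>...</think> tags before the final answer. If only the answer is needed, the decoded text can be split at the closing tag. The following is a minimal sketch under that assumption; split_thinking is a hypothetical helper written for illustration, not part of transformers or peft, and the sample string is made up rather than real model output.

```python
def split_thinking(decoded: str, close_tag: str = "</think>") -> tuple[str, str]:
    """Return (reasoning, answer) from decoded model output.

    Assumes the reasoning is wrapped in <think>...</think>, as Qwen3
    does in thinking mode. This helper is illustrative only.
    """
    head, sep, tail = decoded.rpartition(close_tag)
    if not sep:
        # No closing tag found: treat the whole text as the answer.
        return "", decoded.strip()
    # Drop the opening tag (if present) and surrounding whitespace.
    reasoning = head.replace("<think>", "", 1).strip()
    return reasoning, tail.strip()


# Made-up sample in the expected format, not actual model output.
sample = "<think>\n(x + 2)^2 = 0 means x + 2 = 0.\n</think>\n\nx = -2"
reasoning, answer = split_thinking(sample)
print(answer)
```

The snippet above streams tokens to stdout and does not keep the text. To use this helper, the return value of model.generate would have to be decoded with tokenizer.decode first.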

Training Details

Training Data

unsloth/OpenMathReason-mini: 10k+ instruction-following examples focused on math.
mlabonne/FineTome-100k: 100k examples of diverse, high-quality chat data.

Training Procedure

batch size: 8
gradient accumulation steps: 2
optimizer: adamw_torch
learning rate: 2e-5
warmup steps: 100
fp16: True
dataloader_num_workers: 16
num_train_epochs: 1
weight_decay: 0.01
lr_scheduler_type: linear
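
To make these settings concrete, the sketch below works out the effective batch size and the learning rate at a few steps of a linear schedule with warmup. It assumes the batch size of 8 is per device and that training ran on a single device. TOTAL_STEPS is a hypothetical value chosen for illustration, since the card does not report the real step count. linear_lr is a plain-Python approximation of a linear warmup and decay schedule, not the trainer's own code.

```python
PER_DEVICE_BATCH_SIZE = 8   # "batch size" from the card, assumed per device
GRAD_ACCUM_STEPS = 2
LEARNING_RATE = 2e-5
WARMUP_STEPS = 100

# Examples consumed per optimizer update on one device: 8 * 2 = 16.
effective_batch_size = PER_DEVICE_BATCH_SIZE * GRAD_ACCUM_STEPS


def linear_lr(step: int, total_steps: int) -> float:
    """Learning rate at `step`: linear warmup, then linear decay to zero."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / max(1, WARMUP_STEPS)
    remaining = max(0, total_steps - step)
    return LEARNING_RATE * remaining / max(1, total_steps - WARMUP_STEPS)


TOTAL_STEPS = 1000  # hypothetical; the real number of steps is not reported
print("effective batch size:", effective_batch_size)
for step in (0, 50, 100, 550, 1000):
    print(step, linear_lr(step, TOTAL_STEPS))
```

The learning rate peaks at 2e-5 when warmup ends at step 100 and then falls linearly to zero at the final step.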

Results

Loss value: 0.56

Framework versions

PEFT 0.14.0
