GASING Qwen3 1.7B - Curriculum Learning

This model was trained using curriculum learning on the GASING dataset for Indonesian mathematical reasoning.

Model Details

  • Base Model: unsloth/Qwen3-1.7B
  • Training Method: Curriculum Learning (6 epochs with progressive difficulty)
  • Dataset: GASING (Indonesian mathematical problems)
  • Fine-tuning: LoRA (r=8, alpha=32) → merged into full weights
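
The merge in the last bullet can be written out as follows. This is a sketch that assumes the standard LoRA scaling factor alpha/r; the card does not say whether a variant such as rank-stabilized scaling was used. Here W_0 is a frozen base weight matrix and B, A are the trained low-rank factors (symbols introduced for illustration).

```latex
W_{\text{merged}} = W_0 + \frac{\alpha}{r}\, B A,
\qquad B \in \mathbb{R}^{d \times r},\; A \in \mathbb{R}^{r \times k},
\qquad \frac{\alpha}{r} = \frac{32}{8} = 4
```

After merging, the adapter matrices are no longer needed at inference time, which is why the published checkpoint loads as an ordinary causal LM.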

Training Results

  • Best Training Loss: 0.0026 (Epoch 6)
  • Training Strategy: Progressive difficulty curriculum

Curriculum Schedule

Epoch   Easy   Medium   Hard
1       5%     0%       0%
2       30%    65%      5%
3       10%    80%      10%
4       5%     80%      15%
5       5%     75%      20%
6       5%     70%      25%
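
One way a schedule like this could drive data sampling is sketched below. This is not the authors' training code: `SCHEDULE`, `sample_epoch`, and the toy `pools` are hypothetical names. The sketch treats each row as relative weights, so the epoch 1 row, which does not sum to 100% in the table, draws only easy problems.

```python
import random

# Per-epoch (easy, medium, hard) weights copied from the schedule table.
SCHEDULE = {
    1: (0.05, 0.00, 0.00),
    2: (0.30, 0.65, 0.05),
    3: (0.10, 0.80, 0.10),
    4: (0.05, 0.80, 0.15),
    5: (0.05, 0.75, 0.20),
    6: (0.05, 0.70, 0.25),
}
LEVELS = ("easy", "medium", "hard")


def sample_epoch(pools, epoch, n, seed=0):
    """Draw n examples, picking each example's difficulty by the epoch's weights."""
    rng = random.Random(seed)
    # random.choices normalizes the weights, so rows need not sum to 1.
    picked_levels = rng.choices(LEVELS, weights=SCHEDULE[epoch], k=n)
    return [rng.choice(pools[level]) for level in picked_levels]


# Toy pools standing in for GASING problems bucketed by difficulty (assumption).
pools = {
    "easy": ["e1", "e2", "e3"],
    "medium": ["m1", "m2", "m3"],
    "hard": ["h1", "h2"],
}
epoch1 = sample_epoch(pools, epoch=1, n=8)
epoch6 = sample_epoch(pools, epoch=6, n=8)
```

Sampling with replacement keeps the sketch short; a real pipeline would more likely subsample each bucket without replacement and shuffle.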

Usage

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model = AutoModelForCausalLM.from_pretrained(
    "Cbgcbg/gasing-qwen3-1.7b-curriculum-v1",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True
)

tokenizer = AutoTokenizer.from_pretrained(
    "Cbgcbg/gasing-qwen3-1.7b-curriculum-v1",
    trust_remote_code=True
)

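# Example usage. The prompt is in Indonesian; in English: "How do you find the side
# length of a triangle given the angle alpha and a hypotenuse of 1?"
question = "Bagaimana cara mencari panjang sisi segitiga jika diketahui sudut alpha dan sisi miringnya 1?"
messages = [
    # System prompt, truncated in the source. It begins: "From now on you are an AI assistant named 'GASING'..."
    {"role": "system", "content": "Mulai sekarang anda adalah AI Asisten bernama 'GASING'..."},
    {"role": "user", "content": question}
]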

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=2048,
        temperature=0.7,
        do_sample=True
    )

response = tokenizer.decode(outputs[0][len(inputs.input_ids[0]):], skip_special_tokens=True)
print(response)
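
Qwen3 base models can emit a reasoning trace wrapped in `<think>...</think>` before the final answer. Whether this fine-tuned checkpoint does so is not stated in the card, so the helper below is only a sketch under that assumption. `split_thinking` is a hypothetical name, and it falls back to returning the whole text as the answer when no closing tag is found.

```python
def split_thinking(text: str, close_tag: str = "</think>") -> tuple[str, str]:
    """Split decoded output into (reasoning, answer).

    If the closing tag is absent, reasoning is empty and the full text is the answer.
    """
    head, sep, tail = text.partition(close_tag)
    if not sep:
        return "", text.strip()
    # Drop the opening tag if it survived decoding.
    reasoning = head.replace("<think>", "").strip()
    return reasoning, tail.strip()


# Toy string standing in for `response` from the snippet above.
demo = "<think>sin(alpha) = opposite / 1</think>\n\nThe opposite side is sin(alpha)."
reasoning, answer = split_thinking(demo)
```

With the usage snippet above, this would be called as `split_thinking(response)`.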

Training Configuration

  • Learning Rate: 0.0001
  • Batch Size: 16
  • Gradient Accumulation: 8
  • LoRA r: 8
  • LoRA alpha: 32
  • Max Sequence Length: 8192
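
The batch settings above combine into an effective batch size per optimizer step. The sketch below works through that arithmetic, assuming the listed batch size is per device and that training ran on a single device; the card states neither. `TrainConfig` is a hypothetical container, not part of any library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    # Values copied from the Training Configuration list.
    learning_rate: float = 1e-4
    batch_size: int = 16
    grad_accum_steps: int = 8
    lora_r: int = 8
    lora_alpha: int = 32
    max_seq_len: int = 8192

    def effective_batch_size(self, num_devices: int = 1) -> int:
        """Sequences contributing to one optimizer step."""
        return self.batch_size * self.grad_accum_steps * num_devices

    def max_tokens_per_step(self, num_devices: int = 1) -> int:
        """Upper bound on tokens per step, reached only if every sequence is full length."""
        return self.effective_batch_size(num_devices) * self.max_seq_len


cfg = TrainConfig()
print(cfg.effective_batch_size())  # 16 * 8 = 128
print(cfg.max_tokens_per_step())   # 128 * 8192 = 1048576
```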

Created by Institut Teknologi Del (IT Del)
