SmolLM2-360M-Instruct-TaiwanChat

This model is a fine-tuned version of unsloth/SmolLM2-360M-Instruct on the TaiwanChat dataset using Unsloth’s 4-bit quantization and LoRA adapters for efficient instruction-following in Traditional Chinese.

Installation

pip install -r requirements.txt

Requirements

  • Python: 3.8 or higher
  • CUDA: 11.0 or higher (for GPU support)
  • All other dependencies and exact versions are specified in requirements.txt.

Model description

  • Base: SmolLM2-360M-Instruct (360M parameters)
  • Quantization: 4-bit weight quantization (activations in full precision)
  • Adapters: LoRA with rank r=16, α=16, dropout 0.0, applied to the projection layers (q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj); see the sketch below
  • Dataset: TaiwanChat (yentinglin/TaiwanChat), 600k examples filtered to max length 512, streamed and deduplicated, then split 90% train / 10% validation
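
For reference, a minimal sketch of this quantization-plus-adapter setup, assuming the Unsloth API from the versions listed below (unsloth==2025.4.1). Argument names mirror the configuration above; the actual training script may differ in details.

from unsloth import FastLanguageModel

# Load the 4-bit quantized base model (activations stay in full precision).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/SmolLM2-360M-Instruct",
    max_seq_length=512,
    load_in_4bit=True,
    full_finetuning=False,
)

# Attach LoRA adapters to the projection layers listed above.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    lora_dropout=0.0,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    random_state=3407,  # assumed to match the training seed below
)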

Intended uses & limitations

Intended uses:

  • Conversational AI and chatbots handling Traditional Chinese queries (e.g., weather, FAQs).
  • Instruction-following in a dialogue format.

Limitations:

  • Limited capacity may cause occasional hallucinations or vague answers.
  • Performance measured on a 10% hold-out; real-world data discrepancies may impact quality.
  • Quantization and adapter-based tuning trade off some accuracy for efficiency.

Training procedure

  1. Data preparation

    • Streamed 600k examples from the Hugging Face dataset, filtered to max_len=512, cleaned assistant markers via regex, then shuffled and split with Dataset.train_test_split(test_size=0.1); see the data-pipeline sketch after this list
  2. Model & training setup

    • Loaded base with FastLanguageModel.from_pretrained(..., load_in_4bit=True, full_finetuning=False)
    • Applied LoRA adapters via FastLanguageModel.get_peft_model(...)
    • Used a LoggingSFTTrainer subclass to catch empty-label and NaN-loss cases during evaluation; a simplified trainer sketch follows this list
  3. Hyperparameters

    Parameter                       Value
    num_train_epochs                3
    per_device_train_batch_size    40
    gradient_accumulation_steps    1
    per_device_eval_batch_size     1
    learning_rate                   2e-4
    weight_decay                    0.01
    warmup_steps                    500
    max_seq_length                  512
    evaluation_strategy             steps (every 100)
    eval_steps                      100
    save_strategy                   steps (every 1000)
    logging_steps                   50
    optimizer                       adamw_8bit
    gradient_checkpointing          false
    seed                            3407
    EarlyStoppingCallback           patience 4 evals
  4. Training & push

    • Ran trainer.train(), merged the LoRA weights, then pushed the merged 16-bit model to Luigi/SmolLM2-360M-Instruct-TaiwanChat on Hugging Face via model.push_to_hub_merged()
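
The data pipeline in step 1 could look roughly like the sketch below. The field layout of TaiwanChat, the marker-cleanup regex, and the character-level length filter are assumptions for illustration, not the verbatim preprocessing code.

import re
from datasets import Dataset, load_dataset

stream = load_dataset("yentinglin/TaiwanChat", split="train", streaming=True)

rows, seen = [], set()
for example in stream:
    # Assumed field layout: chat turns stored under "messages" as {"role", "content"} dicts.
    text = "\n".join(turn["content"] for turn in example["messages"])
    # Assumed stand-in for the assistant-marker cleanup regex mentioned above.
    text = re.sub(r"(?:ASSISTANT|助理)[:：]\s*", "", text)
    # Dedupe and drop overlong examples (character count as a stand-in for max_len=512 tokens).
    if text in seen or len(text) > 512:
        continue
    seen.add(text)
    rows.append({"text": text})
    if len(rows) >= 600_000:
        break

dataset = Dataset.from_list(rows).shuffle(seed=3407)
splits = dataset.train_test_split(test_size=0.1)
train_ds, eval_ds = splits["train"], splits["test"]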
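
Steps 2 to 4 then translate into a trainer configuration along the following lines, using trl's plain SFTTrainer in place of the LoggingSFTTrainer subclass. The hyperparameters match the table above; output_dir, load_best_model_at_end, and metric_for_best_model are assumptions needed to make early stopping work with this stack.

from transformers import EarlyStoppingCallback
from trl import SFTConfig, SFTTrainer

training_args = SFTConfig(
    output_dir="outputs",              # assumed; not specified in the card
    num_train_epochs=3,
    per_device_train_batch_size=40,
    gradient_accumulation_steps=1,
    per_device_eval_batch_size=1,
    learning_rate=2e-4,
    weight_decay=0.01,
    warmup_steps=500,
    max_seq_length=512,
    dataset_text_field="text",
    eval_strategy="steps",
    eval_steps=100,
    save_strategy="steps",
    save_steps=1000,
    logging_steps=50,
    optim="adamw_8bit",
    gradient_checkpointing=False,
    seed=3407,
    load_best_model_at_end=True,       # assumed: required by EarlyStoppingCallback
    metric_for_best_model="eval_loss", # assumed: required by EarlyStoppingCallback
)

trainer = SFTTrainer(
    model=model,                       # 4-bit base with LoRA adapters attached (see above)
    processing_class=tokenizer,
    args=training_args,
    train_dataset=train_ds,
    eval_dataset=eval_ds,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=4)],
)

trainer.train()

# Merge the LoRA weights into the base and push the 16-bit model (Unsloth helper).
model.push_to_hub_merged(
    "Luigi/SmolLM2-360M-Instruct-TaiwanChat",
    tokenizer,
    save_method="merged_16bit",
)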

Example inference

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the merged model (LoRA weights are already merged into the base)
tokenizer = AutoTokenizer.from_pretrained("Luigi/SmolLM2-360M-Instruct-TaiwanChat")
model = AutoModelForCausalLM.from_pretrained(
    "Luigi/SmolLM2-360M-Instruct-TaiwanChat",
    torch_dtype=torch.float16,
).eval().to("cuda")

# Query
test_prompt = "請問台北今天的天氣如何?"  # "What's the weather like in Taipei today?"
inputs = tokenizer(test_prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=100,
    do_sample=True,
    temperature=0.8,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
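
Because the base checkpoint is an instruct model, formatting the query with the tokenizer's chat template usually works better than passing a raw string. A minimal variant of the query above:

# Format the same query as a chat turn and let the template add the generation prompt.
messages = [{"role": "user", "content": "請問台北今天的天氣如何?"}]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)
outputs = model.generate(input_ids, max_new_tokens=100, do_sample=True, temperature=0.8)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))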

Framework versions

bitsandbytes==0.45.5
datasets==3.2.0
hatchet==1.4.0
importlib_metadata==8.6.1
lit==18.1.8
matplotlib
numpy
packaging
pandas
psutil==6.1.1
pybind11==2.13.6
pytest==8.1.1
redis==6.0.0
scipy
setuptools==70.3.0
Sphinx
sphinx_gallery
sphinx_rtd_theme
tabulate==0.9.0
torch==2.7.0
transformers==4.47.1
trl==0.15.2
unsloth==2025.4.1
unsloth_zoo==2025.4.2
cut_cross_entropy
wandb
wheel==0.45.1