# Model Card: Qwen2.5-0.5B-schoolmath_v2

Author: Seongeon Kim | Date: 2025-05-27 | Hub link: https://huggingface.co/SeongeonKim/Qwen2.5-0.5B-schoolmath_v2
## 1. Model Overview

- Base model: unsloth/Qwen2.5-0.5B
- Fine-tuning: PEFT LoRA adapter trained with the TRL SFTTrainer
- Goal: improve accuracy on grade-school-level math word problems from the GSM8K (Grade School Math 8K) dataset
## 2. Fine-tuning Setup

### LoRA configuration

- Rank r = 16
- α (alpha) = 16
- Dropout = 0
- Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj (see the sketch below)
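In PEFT, the configuration above corresponds roughly to the following `LoraConfig` (a minimal sketch; `bias` and `task_type` are assumed defaults not stated in this card):

```python
from peft import LoraConfig

# LoRA settings from the card; bias and task_type are assumptions.
lora_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.0,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    bias="none",
    task_type="CAUSAL_LM",
)
```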
### Training framework

- Transformers 4.51.0.dev0
- TRL SFTTrainer v0.4.8
- Unsloth FastLanguageModel (4-bit quantized loading)
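Loading the base model through Unsloth typically looks like the sketch below (`max_seq_length` is an assumed value, not stated in this card):

```python
from unsloth import FastLanguageModel

# Load the base model in 4-bit via Unsloth; max_seq_length is assumed.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Qwen2.5-0.5B",
    max_seq_length=2048,
    load_in_4bit=True,
)
```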
### Training data

- GSM8K train split (7,473 problems)
- Prompt template:

```
Please solve the following math problem according to these instructions: 1. Whenever you perform a calculation, display it in the form <<expression=result>>. 2. Alternate between a brief explanatory sentence and the corresponding calculation, step by step. 3. On the very last line, output only the final numeric answer in the format #### {{answer}}. ### Problem: {} ### Solution: {}
```
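A sketch of how the template's two `{}` slots can be filled from GSM8K and materialized as a `text` column (the column name is an assumption chosen to match SFTTrainer's `dataset_text_field`; `question` and `answer` are GSM8K's actual fields):

```python
from datasets import load_dataset

# The template shown above, with {} slots for the problem and solution.
TEMPLATE = (
    "Please solve the following math problem according to these instructions: "
    "1. Whenever you perform a calculation, display it in the form <<expression=result>>. "
    "2. Alternate between a brief explanatory sentence and the corresponding calculation, step by step. "
    "3. On the very last line, output only the final numeric answer in the format #### {{answer}}. "
    "### Problem: {} ### Solution: {}"
)

dataset = load_dataset("gsm8k", "main", split="train")  # 7,473 examples
dataset = dataset.map(
    lambda ex: {"text": TEMPLATE.format(ex["question"], ex["answer"])}
)
```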
### Hyperparameters

- Epochs = 3
- Batch size = 2 (per_device_train_batch_size)
- gradient_accumulation_steps = 8
- Learning rate = 5e-5
- warmup_ratio = 0.05
- optimizer = adamw_8bit
- scheduler = linear
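Wired into TRL, these settings correspond roughly to the sketch below (`output_dir`, `dataset_text_field`, and `max_seq_length` are assumptions, and the API shown follows recent TRL versions rather than v0.4.8 exactly):

```python
from transformers import TrainingArguments
from trl import SFTTrainer

# Hyperparameters from the card; output_dir is an assumption.
training_args = TrainingArguments(
    num_train_epochs=3,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    learning_rate=5e-5,
    warmup_ratio=0.05,
    optim="adamw_8bit",
    lr_scheduler_type="linear",
    output_dir="outputs",
)
trainer = SFTTrainer(
    model=model,              # PEFT/Unsloth model from the snippets above
    tokenizer=tokenizer,
    train_dataset=dataset,    # GSM8K formatted with the prompt template
    dataset_text_field="text",
    max_seq_length=2048,
    args=training_args,
)
trainer.train()
```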
### Compute environment

- Prototyping: 1×Tesla T4 (Colab)
- Evaluation: 4×RTX 3090 (vLLM backend)
## 3. Benchmark Results

| Model | strict-match (5-shot) | flexible-extract (5-shot) |
|---|---|---|
| Fine-tuned: Qwen2.5-0.5B-schoolmath_v1 | 24.56 ± 1.19 % | 24.56 ± 1.19 % |
| Base: unsloth/Qwen2.5-0.5B | 34.34 ± 1.31 % | 35.18 ± 1.32 % |
### Notes

- strict-match: counted as correct only if the output exactly matches the `#### <answer>` pattern
- flexible-extract: extracts only the numbers (ignoring commas and dollar signs) and matches those
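An illustrative sketch of the two extraction modes (this is not lm-evaluation-harness's actual implementation, just the idea):

```python
import re

def strict_match(output: str):
    # Succeeds only when the canonical "#### <answer>" pattern appears.
    m = re.search(r"####\s*(-?[0-9.,$]+)", output)
    return m.group(1).strip() if m else None

def flexible_extract(output: str):
    # Takes the last number anywhere in the output, dropping "$" and ",".
    nums = re.findall(r"-?\d[\d,]*(?:\.\d+)?", output.replace("$", ""))
    return nums[-1].replace(",", "") if nums else None
```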
## 4. Usage Examples

### (1) vLLM CLI evaluation
```bash
pip install "lm_eval[vllm]" vllm transformers

lm_eval \
  --model vllm \
  --model_args pretrained=SeongeonKim/Qwen2.5-0.5B-schoolmath_v1 \
  --tasks gsm8k \
  --device cuda:0 \
  --batch_size auto
```
### (2) Transformers + PEFT loading

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

# Load the tokenizer, the base model, and the LoRA adapter
tokenizer = AutoTokenizer.from_pretrained("SeongeonKim/Qwen2.5-0.5B-schoolmath_v1")
base_model = AutoModelForCausalLM.from_pretrained(
    "unsloth/Qwen2.5-0.5B", torch_dtype=torch.float16
)  # or load via Unsloth's FastLanguageModel
model = PeftModel.from_pretrained(base_model, "SeongeonKim/Qwen2.5-0.5B-schoolmath_v1")
model.eval().to("cuda")

prompt = "Question: If you have 3 apples and buy 2 more, how many apples? Answer:"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
## 5. Intended Use & Limitations

Intended use: research and educational experiments on multi-step math word problems such as GSM8K.

### Limitations

- 5-shot strict-match accuracy is ≈ 24.6 %, far below human performance (≈ 90 %+)
- Risk of wrong answers and hallucinated calculations
- Not recommended for production uses such as automatic grading of student assignments
## 6. Ethics & License

License: Apache 2.0

### Ethical considerations

- Avoid use as an automatic grading tool
- Do not use for decision-making without verifying the model's outputs
## 7. Citation

```bibtex
@misc{SeongeonKim2025QwenSchoolMath,
  title        = {Qwen2.5-0.5B-schoolmath\_v1: LoRA-tuned Qwen2.5-0.5B on GSM8K},
  author       = {Seongeon Kim},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/SeongeonKim/Qwen2.5-0.5B-schoolmath_v1}},
}
```