llama3-8b-finetuned-ctu

A fine-tuned Llama 3 8B model for a Can Tho University (CTU) admission consulting chatbot.

Model Description

This is a LoRA adapter fine-tuned from Meta's Llama 3 8B on CTU admission data to answer questions about:

  • Admission requirements and procedures
  • Academic programs and majors
  • Tuition fees and scholarships
  • Campus facilities and student services
  • Student life and extracurricular activities

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

# Load base model
base_model_name = "meta-llama/Meta-Llama-3-8B"
model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    torch_dtype=torch.float16,
    device_map="auto"
)

# Load LoRA adapter
peft_model_id = "thuanhero1/llama3-8b-finetuned-ctu"
model = PeftModel.from_pretrained(model, peft_model_id)

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(peft_model_id)

# Format the prompt in the Llama 3 instruct style
# System: "You are a helpful AI assistant, trained to answer questions about Can Tho University."
# User:   "What are the admission requirements for the Information Technology major?"
prompt = '''<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Bạn là một trợ lý AI hữu ích, được huấn luyện để trả lời các câu hỏi về Đại học Cần Thơ.<|eot_id|><|start_header_id|>user<|end_header_id|>

Điều kiện xét tuyển vào ngành Công nghệ thông tin?<|eot_id|><|start_header_id|>assistant<|end_header_id|>

'''

# Generate a response (keep inputs on the same device as the model)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    temperature=0.7,
    do_sample=True,
    top_p=0.9,
    pad_token_id=tokenizer.eos_token_id  # Llama 3 has no pad token; reuse EOS
)
# Decode only the newly generated tokens, not the echoed prompt
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)
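
Since the repository ships a chat template (chat_template.jinja, listed under Files Included), the same prompt can also be built with the tokenizer instead of hand-writing the special tokens. A minimal sketch, assuming the bundled template follows the Llama 3 instruct format shown above:

# Build the prompt via the tokenizer's chat template
messages = [
    {"role": "system", "content": "Bạn là một trợ lý AI hữu ích, được huấn luyện để trả lời các câu hỏi về Đại học Cần Thơ."},
    {"role": "user", "content": "Điều kiện xét tuyển vào ngành Công nghệ thông tin?"},
]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,  # append the assistant header so the model starts answering
    return_tensors="pt"
).to(model.device)
outputs = model.generate(
    input_ids,
    max_new_tokens=256,
    temperature=0.7,
    do_sample=True,
    top_p=0.9,
    pad_token_id=tokenizer.eos_token_id
)
response = tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True)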

Alternative Usage (Auto-loading)

Recent versions of transformers (4.33+) integrate with peft, so the adapter repository can be loaded directly; the base model is fetched and the LoRA weights are applied automatically (peft must be installed):

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Loading the adapter repo directly downloads the base model
# and applies the LoRA adapter on top of it
model = AutoModelForCausalLM.from_pretrained(
    "thuanhero1/llama3-8b-finetuned-ctu",
    torch_dtype=torch.float16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("thuanhero1/llama3-8b-finetuned-ctu")

Training Details

  • Base model: meta-llama/Meta-Llama-3-8B
  • Training data: CTU admission FAQ dataset (Vietnamese)
  • Training method: LoRA fine-tuning
  • LoRA rank: 16 (based on adapter config)
  • LoRA alpha: 32
  • Target modules: q_proj, v_proj, k_proj, o_proj, gate_proj, up_proj, down_proj
  • Training hardware: NVIDIA GPU
  • Training duration: ~6 hours
  • Parameters: 8B total, ~335M trainable with LoRA
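
For reference, the hyperparameters above correspond to a peft configuration along these lines; this is a sketch of the training-side setup, not the exact script used (lora_dropout is an assumed value, not stated above):

from peft import LoraConfig

# LoRA configuration matching the values listed above
lora_config = LoraConfig(
    r=16,                     # LoRA rank
    lora_alpha=32,            # scaling factor
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    lora_dropout=0.05,        # assumed; not listed in the details above
    task_type="CAUSAL_LM",
)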

Performance

  • Generation speed: ~8-10 tokens/second on a T4 GPU
  • Expected throughput on an A5000: ~25-35 tokens/second
  • Model size: ~335MB (adapter only)
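
The throughput figures above can be reproduced roughly with a timing loop like the sketch below, reusing model, tokenizer, and prompt from the Usage section; numbers vary with prompt length, dtype, and batch size:

import time

# Rough tokens/second measurement for a single prompt
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
start = time.time()
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=False,  # greedy decoding for a stable measurement
    pad_token_id=tokenizer.eos_token_id
)
elapsed = time.time() - start
new_tokens = outputs.shape[-1] - inputs["input_ids"].shape[-1]
print(f"{new_tokens / elapsed:.1f} tokens/second")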

Files Included

  • adapter_config.json: LoRA configuration
  • adapter_model.safetensors: LoRA weights
  • tokenizer.json: Tokenizer vocabulary
  • tokenizer_config.json: Tokenizer configuration
  • special_tokens_map.json: Special tokens mapping
  • chat_template.jinja: Chat template for formatting conversations
  • training_history.csv: Training metrics over time
  • training_summary.json: Final training statistics
  • loss_curves.png: Visualization of training/validation loss

License

This model inherits the Llama 3 Community License. Please review the license terms before use.
