llama3-8b-finetuned-ctu

A fine-tuned Llama 3 8B model for a Can Tho University (CTU) admission consulting chatbot.

Model Description

This is a LoRA adapter fine-tuned from Meta's Llama 3 8B on CTU admission data to answer questions about:

  • Admission requirements and procedures
  • Academic programs and majors
  • Tuition fees and scholarships
  • Campus facilities and student services
  • Student life and extracurricular activities

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

# Load base model
base_model_name = "meta-llama/Meta-Llama-3-8B"
model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    torch_dtype=torch.float16,
    device_map="auto"
)

# Load LoRA adapter
peft_model_id = "thuanhero1/llama3-8b-finetuned-ctu"
model = PeftModel.from_pretrained(model, peft_model_id)

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(peft_model_id)

# Format the prompt in the Llama 3 instruct style
# System: "You are a helpful AI assistant, trained to answer questions about Can Tho University."
# User:   "What are the admission requirements for the Information Technology major?"
prompt = '''<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Bạn là một trợ lý AI hữu ích, được huấn luyện để trả lời các câu hỏi về Đại học Cần Thơ.<|eot_id|><|start_header_id|>user<|end_header_id|>

Điều kiện xét tuyển vào ngành Công nghệ thông tin?<|eot_id|><|start_header_id|>assistant<|end_header_id|>

'''

# Generate a response (keep inputs on the same device as the model)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    temperature=0.7,
    do_sample=True,
    top_p=0.9,
    pad_token_id=tokenizer.eos_token_id  # Llama 3 has no pad token; reuse EOS
)
# Decode only the newly generated tokens, not the echoed prompt
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)
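
Since the repository ships a chat template (chat_template.jinja, listed under Files Included), the same prompt can also be built with the tokenizer instead of hand-writing the special tokens. A minimal sketch, assuming the bundled template follows the Llama 3 instruct format shown above:

# Build the prompt via the tokenizer's chat template
messages = [
    {"role": "system", "content": "Bạn là một trợ lý AI hữu ích, được huấn luyện để trả lời các câu hỏi về Đại học Cần Thơ."},
    {"role": "user", "content": "Điều kiện xét tuyển vào ngành Công nghệ thông tin?"},
]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,  # append the assistant header so the model starts answering
    return_tensors="pt"
).to(model.device)
outputs = model.generate(
    input_ids,
    max_new_tokens=256,
    temperature=0.7,
    do_sample=True,
    top_p=0.9,
    pad_token_id=tokenizer.eos_token_id
)
response = tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True)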

Alternative Usage (Auto-loading)

Recent versions of transformers (4.33+) integrate with peft, so the adapter repository can be loaded directly; the base model is fetched and the LoRA weights are applied automatically (peft must be installed):

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Loading the adapter repo directly downloads the base model
# and applies the LoRA adapter on top of it
model = AutoModelForCausalLM.from_pretrained(
    "thuanhero1/llama3-8b-finetuned-ctu",
    torch_dtype=torch.float16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("thuanhero1/llama3-8b-finetuned-ctu")

Training Details

  • Base model: meta-llama/Meta-Llama-3-8B
  • Training data: CTU admission FAQ dataset (Vietnamese)
  • Training method: LoRA fine-tuning
  • LoRA rank: 16 (based on adapter config)
  • LoRA alpha: 32
  • Target modules: q_proj, v_proj, k_proj, o_proj, gate_proj, up_proj, down_proj
  • Training hardware: NVIDIA GPU
  • Training duration: ~6 hours
  • Parameters: 8B total, ~335M trainable with LoRA
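
For reference, the hyperparameters above correspond to a peft configuration along these lines; this is a sketch of the training-side setup, not the exact script used (lora_dropout is an assumed value, not stated above):

from peft import LoraConfig

# LoRA configuration matching the values listed above
lora_config = LoraConfig(
    r=16,                     # LoRA rank
    lora_alpha=32,            # scaling factor
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    lora_dropout=0.05,        # assumed; not listed in the details above
    task_type="CAUSAL_LM",
)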

Performance

  • Generation speed: ~8-10 tokens/second on a T4 GPU
  • Expected throughput on an A5000: ~25-35 tokens/second
  • Model size: ~335MB (adapter only)
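
The throughput figures above can be reproduced roughly with a timing loop like the sketch below, reusing model, tokenizer, and prompt from the Usage section; numbers vary with prompt length, dtype, and batch size:

import time

# Rough tokens/second measurement for a single prompt
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
start = time.time()
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=False,  # greedy decoding for a stable measurement
    pad_token_id=tokenizer.eos_token_id
)
elapsed = time.time() - start
new_tokens = outputs.shape[-1] - inputs["input_ids"].shape[-1]
print(f"{new_tokens / elapsed:.1f} tokens/second")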

Files Included

  • adapter_config.json: LoRA configuration
  • adapter_model.safetensors: LoRA weights
  • tokenizer.json: Tokenizer vocabulary
  • tokenizer_config.json: Tokenizer configuration
  • special_tokens_map.json: Special tokens mapping
  • chat_template.jinja: Chat template for formatting conversations
  • training_history.csv: Training metrics over time
  • training_summary.json: Final training statistics
  • loss_curves.png: Visualization of training/validation loss

License

This model inherits the Llama 3 Community License. Please review the license terms before use.
