# llama3-8b-finetuned-ctu

Fine-tuned Llama3-8B model for a Can Tho University (CTU) admission consulting chatbot.
## Model Description

This is a LoRA adapter fine-tuned from Meta's Llama3-8B on CTU admission data to answer questions about:
- Admission requirements and procedures
- Academic programs and majors
- Tuition fees and scholarships
- Campus facilities and student services
- Student life and extracurricular activities
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

# Load the base model in half precision
base_model_name = "meta-llama/Meta-Llama-3-8B"
model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Apply the LoRA adapter on top of the base model
peft_model_id = "thuanhero1/llama3-8b-finetuned-ctu"
model = PeftModel.from_pretrained(model, peft_model_id)

# Load the tokenizer shipped with the adapter
tokenizer = AutoTokenizer.from_pretrained(peft_model_id)

# Format the prompt in the Llama 3 instruct style.
# System: "You are a helpful AI assistant, trained to answer questions about Can Tho University."
# User:   "What are the admission requirements for the Information Technology major?"
prompt = '''<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Bạn là một trợ lý AI hữu ích, được huấn luyện để trả lời các câu hỏi về Đại học Cần Thơ.<|eot_id|><|start_header_id|>user<|end_header_id|>

Điều kiện xét tuyển vào ngành Công nghệ thông tin?<|eot_id|><|start_header_id|>assistant<|end_header_id|>

'''

# Generate a response
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    temperature=0.7,
    do_sample=True,
    top_p=0.9,
)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
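Because the repository ships `chat_template.jinja`, you can also let the tokenizer build this prompt instead of writing the special tokens by hand. A minimal sketch, assuming the bundled template follows the Llama 3 instruct format:

```python
# Build the same prompt via the tokenizer's chat template
messages = [
    {"role": "system", "content": "Bạn là một trợ lý AI hữu ích, được huấn luyện để trả lời các câu hỏi về Đại học Cần Thơ."},
    {"role": "user", "content": "Điều kiện xét tuyển vào ngành Công nghệ thông tin?"},
]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,  # append the assistant header so the model answers next
    return_tensors="pt",
).to(model.device)
outputs = model.generate(input_ids, max_new_tokens=256, temperature=0.7, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```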
## Alternative Usage (Auto-loading)

If you want to load the model more easily:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# With peft installed, transformers detects adapter_config.json in the repo
# and automatically loads the base model plus the LoRA adapter.
model = AutoModelForCausalLM.from_pretrained(
    "thuanhero1/llama3-8b-finetuned-ctu",
    torch_dtype=torch.float16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("thuanhero1/llama3-8b-finetuned-ctu")
```
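If you loaded the adapter with `PeftModel.from_pretrained` as in the first example, you can also merge it into the base weights to remove the adapter indirection at inference time. A sketch using peft's `merge_and_unload` (the output path is illustrative):

```python
# Fold the LoRA deltas into the base weights; `model` is the PeftModel from the Usage section
merged_model = model.merge_and_unload()

# Save a standalone full-size checkpoint (~16 GB in fp16)
merged_model.save_pretrained("llama3-8b-ctu-merged")
tokenizer.save_pretrained("llama3-8b-ctu-merged")
```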
## Training Details

- Base model: meta-llama/Meta-Llama-3-8B
- Training data: CTU admission FAQ dataset (Vietnamese)
- Training method: LoRA fine-tuning (see the `LoraConfig` sketch below)
- LoRA rank: 16 (based on adapter config)
- LoRA alpha: 32
- Target modules: q_proj, v_proj, k_proj, o_proj, gate_proj, up_proj, down_proj
- Training hardware: NVIDIA GPU
- Training duration: ~6 hours
- Parameters: 8B total, ~335M trainable with LoRA
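For reference, a peft `LoraConfig` matching the hyperparameters above would look roughly like this; dropout, bias, and task type are assumptions, and `adapter_config.json` in the repo remains authoritative:

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=16,                    # LoRA rank, as listed above
    lora_alpha=32,           # scaling factor
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    lora_dropout=0.05,       # assumption: not stated in this card
    bias="none",             # assumption
    task_type="CAUSAL_LM",
)
```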
## Performance

- Generation throughput: ~8-10 tokens/second on a T4 GPU (see the measurement sketch below)
- Expected throughput on an A5000: ~25-35 tokens/second
- Model size: ~335 MB (adapter only)
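Throughput varies with hardware, prompt length, and sampling settings. A quick way to measure it on your own setup, reusing `model`, `tokenizer`, and `inputs` from the Usage section:

```python
import time

# Time a single generation and compute tokens/second
start = time.perf_counter()
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
elapsed = time.perf_counter() - start

# Count only the newly generated tokens, not the prompt
new_tokens = outputs.shape[-1] - inputs["input_ids"].shape[-1]
print(f"{new_tokens / elapsed:.1f} tokens/second")
```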
## Files Included

- `adapter_config.json`: LoRA configuration
- `adapter_model.safetensors`: LoRA weights
- `tokenizer.json`: Tokenizer vocabulary
- `tokenizer_config.json`: Tokenizer configuration
- `special_tokens_map.json`: Special tokens mapping
- `chat_template.jinja`: Chat template for formatting conversations
- `training_history.csv`: Training metrics over time (see the loading sketch below)
- `training_summary.json`: Final training statistics
- `loss_curves.png`: Visualization of training/validation loss
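To inspect the training metrics without cloning the whole repo, you can download just the CSV from the Hub. A sketch using `huggingface_hub`; the actual column names depend on how the training script wrote the file:

```python
import pandas as pd
from huggingface_hub import hf_hub_download

# Fetch only training_history.csv from the model repo
csv_path = hf_hub_download(
    repo_id="thuanhero1/llama3-8b-finetuned-ctu",
    filename="training_history.csv",
)

history = pd.read_csv(csv_path)
print(history.head())  # e.g., loss per step; columns depend on the training script
```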
## License

This model inherits the Llama 3 Community License. Please review the license terms before use.