# Llama-3.2-3B LoRA Fine-tune on OpenHermes
## 📋 Overview
This model is a LoRA fine-tuned version of `meta-llama/Llama-3.2-3B` on the OpenHermes dataset.
The goal of this run was to adapt Llama-3.2-3B for improved instruction-following using a high-quality, multi-domain SFT dataset.
Training used parameter-efficient fine-tuning (PEFT) with LoRA adapters: only ~0.75% of the model's parameters were updated, keeping compute and memory usage low while still yielding strong gains.
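OpenHermes ships multi-turn conversations rather than plain text, so the data needs flattening before SFT. Below is a minimal sketch of that step; the `conversations`/`from`/`value` field names follow the ShareGPT layout used by `teknium/OpenHermes-2.5` and are assumptions, since this card does not pin the exact dataset revision or schema.

```python
from datasets import load_dataset

# Assumption: ShareGPT-style schema as in teknium/OpenHermes-2.5.
ds = load_dataset("teknium/OpenHermes-2.5", split="train")

def to_text(example):
    # Flatten the conversation turns into a single training string with simple role tags.
    parts = []
    for turn in example["conversations"]:
        role = "User" if turn["from"] == "human" else "Assistant"
        parts.append(f"{role}: {turn['value']}")
    return {"text": "\n".join(parts)}

ds = ds.map(to_text, remove_columns=ds.column_names)
```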
## ⚙️ Training Configuration
- **Base Model:** `meta-llama/Llama-3.2-3B`
- **Method:** QLoRA (LoRA rank 16, α=32, dropout=0.05)
- **Trainable Parameters:** 24.3M / 3.24B (~0.75%)
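For reference, here is a sketch of the adapter setup these numbers imply. The 4-bit quantization config reflects standard QLoRA practice, and the `target_modules` list is an assumption (this card does not state which projections were adapted):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # QLoRA: 4-bit NF4 base weights
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,  # matches the bf16 training setup below
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-3B", quantization_config=bnb_config, device_map="auto"
)

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumption, see above
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # should report roughly 0.75% trainable
```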
**Training Arguments:**

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./llama_finetune_lora",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,    # effective batch size of 16
    learning_rate=2e-4,
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    weight_decay=0.01,
    logging_steps=200,
    evaluation_strategy="steps",      # renamed to eval_strategy in newer transformers
    eval_steps=200,
    save_strategy="steps",
    save_steps=1000,
    save_total_limit=2,
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
    greater_is_better=False,
    bf16=True,                        # A100 supports bfloat16
    fp16=False,
    gradient_checkpointing=True,
    torch_compile=False,
    report_to="none",
    seed=42,
)
```
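To show how the pieces fit together, here is a sketch wiring the arguments above into a plain `Trainer`. It reuses `ds` and `model` from the earlier sketches; the original run may well have used `trl`'s `SFTTrainer` instead, so treat this as illustrative:

```python
from transformers import AutoTokenizer, Trainer, DataCollatorForLanguageModeling

tok = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-3B")
tok.pad_token = tok.pad_token or tok.eos_token  # Llama has no pad token by default

def tokenize(batch):
    return tok(batch["text"], truncation=True, max_length=1024)

# Tokenize and carve out a small validation split (split size is an assumption).
split = ds.map(tokenize, batched=True, remove_columns=["text"]).train_test_split(test_size=0.01)

trainer = Trainer(
    model=model,                      # PEFT model from the sketch above
    args=training_args,
    train_dataset=split["train"],
    eval_dataset=split["test"],
    data_collator=DataCollatorForLanguageModeling(tokenizer=tok, mlm=False),
)
trainer.train()
```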
## 📈 Training Metrics
The run was stopped at 2,000 steps (~4.5 h on an A100). Training loss improved steadily and validation loss stabilized around 0.20.
| Step | Training Loss | Validation Loss |
|------|---------------|-----------------|
| 200  | 1.2781        | 0.2202          |
| 400  | 0.2167        | 0.2134          |
| 600  | 0.2139        | 0.2098          |
| 800  | 0.2120        | 0.2072          |
| 1000 | 0.2085        | 0.2057          |
| 1200 | 0.1996        | 0.2043          |
| 1400 | 0.2056        | 0.2034          |
| 1600 | 0.2016        | 0.2023          |
| 1800 | 0.2000        | 0.2012          |
| 2000 | 0.2027        | 0.2005          |
Validation loss converged around 0.20, indicating effective adaptation.
## 🚀 Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = "meta-llama/Llama-3.2-3B"
adapter = "kunjcr2/llama3-3b-lora-openhermes"  # replace with your Hub repo

# Load the base model, then attach the LoRA adapter
tok = AutoTokenizer.from_pretrained(adapter)
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype="auto", device_map="auto")
model = PeftModel.from_pretrained(model, adapter)

# Generate
prompt = "Explain the concept of binary search trees."
inputs = tok(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tok.decode(outputs[0], skip_special_tokens=True))
```
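If you want a standalone checkpoint without the PEFT dependency at inference time, the adapter can be folded into the base weights with `merge_and_unload()`, a standard `peft` call. The output directory name below is just an example:

```python
# Merge the LoRA weights into the base model for adapter-free deployment.
merged = model.merge_and_unload()
merged.save_pretrained("./llama3-3b-openhermes-merged")  # example path
tok.save_pretrained("./llama3-3b-openhermes-merged")
```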
## 📝 Notes
- Training was run with bf16 + gradient checkpointing on an A100 (40 GB).
- Only the adapters are uploaded (small download); use them together with the base model.
- The repo includes `adapter_model.safetensors`, `adapter_config.json`, the tokenizer files, and this README.
- Training stopped early at 2,000 steps (~17% of the planned run) due to good convergence.
✨ If you like this model, feel free to try it out and extend training. Future runs could include more steps, preference tuning (DPO/ORPO), or domain-specific mixtures.