# Llama-3.2-3B LoRA Fine-tune on OpenHermes
## 📋 Overview
This model is a LoRA fine-tuned version of `meta-llama/Llama-3.2-3B` on the OpenHermes dataset.
The goal of this run was to adapt Llama-3.2-3B for improved instruction-following using a high-quality, multi-domain SFT dataset.
Training used parameter-efficient fine-tuning (PEFT) with LoRA adapters: only ~0.75% of the model's parameters were updated, keeping compute and memory usage low while still yielding strong gains.
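OpenHermes ships multi-turn conversations rather than plain text, so the data needs flattening before SFT. Below is a minimal sketch of that step; the `conversations`/`from`/`value` field names follow the ShareGPT layout used by `teknium/OpenHermes-2.5` and are assumptions, since this card does not pin the exact dataset revision or schema.

```python
from datasets import load_dataset

# Assumption: ShareGPT-style schema as in teknium/OpenHermes-2.5.
ds = load_dataset("teknium/OpenHermes-2.5", split="train")

def to_text(example):
    # Flatten the conversation turns into a single training string with simple role tags.
    parts = []
    for turn in example["conversations"]:
        role = "User" if turn["from"] == "human" else "Assistant"
        parts.append(f"{role}: {turn['value']}")
    return {"text": "\n".join(parts)}

ds = ds.map(to_text, remove_columns=ds.column_names)
```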
## ⚙️ Training Configuration
- **Base Model:** `meta-llama/Llama-3.2-3B`
- **Method:** QLoRA (LoRA rank 16, α=32, dropout=0.05)
- **Trainable Parameters:** 24.3M / 3.24B (~0.75%)
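For reference, here is a sketch of the adapter setup these numbers imply. The 4-bit quantization config reflects standard QLoRA practice, and the `target_modules` list is an assumption (this card does not state which projections were adapted):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # QLoRA: 4-bit NF4 base weights
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,  # matches the bf16 training setup below
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-3B", quantization_config=bnb_config, device_map="auto"
)

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumption, see above
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # should report roughly 0.75% trainable
```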
**Training Arguments:**

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./llama_finetune_lora",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,    # effective batch size of 16
    learning_rate=2e-4,
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    weight_decay=0.01,
    logging_steps=200,
    evaluation_strategy="steps",      # renamed to eval_strategy in newer transformers
    eval_steps=200,
    save_strategy="steps",
    save_steps=1000,
    save_total_limit=2,
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
    greater_is_better=False,
    bf16=True,                        # A100 supports bfloat16
    fp16=False,
    gradient_checkpointing=True,
    torch_compile=False,
    report_to="none",
    seed=42,
)
```
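To show how the pieces fit together, here is a sketch wiring the arguments above into a plain `Trainer`. It reuses `ds` and `model` from the earlier sketches; the original run may well have used `trl`'s `SFTTrainer` instead, so treat this as illustrative:

```python
from transformers import AutoTokenizer, Trainer, DataCollatorForLanguageModeling

tok = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-3B")
tok.pad_token = tok.pad_token or tok.eos_token  # Llama has no pad token by default

def tokenize(batch):
    return tok(batch["text"], truncation=True, max_length=1024)

# Tokenize and carve out a small validation split (split size is an assumption).
split = ds.map(tokenize, batched=True, remove_columns=["text"]).train_test_split(test_size=0.01)

trainer = Trainer(
    model=model,                      # PEFT model from the sketch above
    args=training_args,
    train_dataset=split["train"],
    eval_dataset=split["test"],
    data_collator=DataCollatorForLanguageModeling(tokenizer=tok, mlm=False),
)
trainer.train()
```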
## 📈 Training Metrics
The run was stopped at 2,000 steps (~4.5 h on an A100). Training loss improved steadily and validation loss stabilized around 0.20.
| Step | Training Loss | Validation Loss |
|------|---------------|-----------------|
| 200  | 1.2781        | 0.2202          |
| 400  | 0.2167        | 0.2134          |
| 600  | 0.2139        | 0.2098          |
| 800  | 0.2120        | 0.2072          |
| 1000 | 0.2085        | 0.2057          |
| 1200 | 0.1996        | 0.2043          |
| 1400 | 0.2056        | 0.2034          |
| 1600 | 0.2016        | 0.2023          |
| 1800 | 0.2000        | 0.2012          |
| 2000 | 0.2027        | 0.2005          |
Validation loss converged around 0.20, indicating effective adaptation.
## 🚀 Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = "meta-llama/Llama-3.2-3B"
adapter = "kunjcr2/llama3-3b-lora-openhermes"  # replace with your Hub repo

# Load the base model, then attach the LoRA adapter
tok = AutoTokenizer.from_pretrained(adapter)
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype="auto", device_map="auto")
model = PeftModel.from_pretrained(model, adapter)

# Generate
prompt = "Explain the concept of binary search trees."
inputs = tok(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tok.decode(outputs[0], skip_special_tokens=True))
```
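If you want a standalone checkpoint without the PEFT dependency at inference time, the adapter can be folded into the base weights with `merge_and_unload()`, a standard `peft` call. The output directory name below is just an example:

```python
# Merge the LoRA weights into the base model for adapter-free deployment.
merged = model.merge_and_unload()
merged.save_pretrained("./llama3-3b-openhermes-merged")  # example path
tok.save_pretrained("./llama3-3b-openhermes-merged")
```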
## 📝 Notes
- Training was run with bf16 + gradient checkpointing on an A100 (40 GB).
- Only the adapters are uploaded (small download); use them together with the base model.
- The repo includes `adapter_model.safetensors`, `adapter_config.json`, the tokenizer files, and this README.
- Training stopped early at 2,000 steps (~17% of the planned run) due to good convergence.
✨ If you like this model, feel free to try it out and extend training. Future runs could include more steps, preference tuning (DPO/ORPO), or domain-specific mixtures.