Llama-3.2-3B LoRA Fine-tune on OpenHermes

📖 Overview

This model is a LoRA fine-tuned version of meta-llama/Llama-3.2-3B on the OpenHermes dataset. The goal of this run was to adapt Llama-3.2-3B for improved instruction-following using a high-quality, multi-domain SFT dataset.

Training used parameter-efficient fine-tuning (PEFT) with LoRA adapters: only ~0.75% of the model's parameters were trained, keeping compute and memory usage low while still yielding strong gains.
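
As a rough sanity check on that figure: a LoRA adapter of rank r on a weight matrix of shape d_out × d_in adds r·(d_in + d_out) trainable parameters. A minimal back-of-the-envelope sketch, assuming rank 16 on every linear projection of Llama-3.2-3B (hidden size 3072, intermediate size 8192, 28 layers, 1024-dim k/v projections; these dimensions come from the base model's config, not from this card):

# Hypothetical check of the ~24.3M trainable-parameter figure
r, hidden, inter, kv, layers = 16, 3072, 8192, 1024, 28
per_layer = (
    r * (hidden + hidden) * 2   # q_proj, o_proj
    + r * (hidden + kv) * 2     # k_proj, v_proj
    + r * (hidden + inter) * 3  # gate_proj, up_proj, down_proj
)
print(per_layer * layers)  # 24,313,856 -> ~24.3M, matching the reported count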


⚙️ Training Configuration

  • Base Model: meta-llama/Llama-3.2-3B
  • Method: QLoRA (LoRA rank 16, α=32, dropout=0.05)
  • Trainable Parameters: 24.3M / 3.24B (~0.75%)
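
A minimal sketch of the corresponding QLoRA setup with bitsandbytes + peft. The exact quantization flags and target modules are not stated in this card and are assumptions here (targeting all linear projections is what reproduces the 24.3M trainable-parameter count at rank 16):

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Assumed 4-bit NF4 quantization for the frozen base weights
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-3B",
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],  # assumed
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # expect ~24.3M trainable (~0.75%)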

Training Arguments:

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./llama_finetune_lora",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,   # effective batch size of 16 per device
    learning_rate=2e-4,
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    weight_decay=0.01,
    logging_steps=200,

    # Evaluate and checkpoint on a step schedule, keeping the best checkpoint by eval loss
    evaluation_strategy="steps",  # renamed to eval_strategy in recent transformers releases
    eval_steps=200,
    save_strategy="steps",
    save_steps=1000,
    save_total_limit=2,
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
    greater_is_better=False,

    bf16=True,  # bfloat16 mixed precision (native on A100)
    fp16=False,
    gradient_checkpointing=True,  # trade extra compute for lower activation memory
    torch_compile=False,
    report_to="none",
    seed=42
)
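
These arguments plug into a standard Hugging Face Trainer. A minimal sketch, assuming the OpenHermes data has already been tokenized into train/eval splits (the dataset and tokenizer variable names here are hypothetical):

from transformers import Trainer, DataCollatorForLanguageModeling

# Causal-LM collator: pads batches and copies input_ids to labels
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

trainer = Trainer(
    model=model,                   # the PEFT-wrapped model from the sketch above
    args=training_args,
    train_dataset=train_dataset,   # tokenized OpenHermes train split (hypothetical)
    eval_dataset=eval_dataset,     # tokenized OpenHermes eval split (hypothetical)
    data_collator=collator,
)
trainer.train()
trainer.save_model("./llama_finetune_lora")  # for a PEFT model this saves only the adapter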

📊 Training Metrics

The run was stopped at 2000 steps (~4.5 h on an A100). Training loss improved steadily and validation loss stabilized around 0.20.

Step    Training Loss    Validation Loss
200     1.2781           0.2202
400     0.2167           0.2134
600     0.2139           0.2098
800     0.2120           0.2072
1000    0.2085           0.2057
1200    0.1996           0.2043
1400    0.2056           0.2034
1600    0.2016           0.2023
1800    0.2000           0.2012
2000    0.2027           0.2005

📉 Validation loss converged to ~0.20, indicating effective adaptation.
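
These numbers come straight from the trainer's step logs. If you rerun training, a loop like the following (a sketch against the standard Trainer API; trainer is the object from the training sketch above) prints the same pairs:

for entry in trainer.state.log_history:
    step = entry.get("step")
    if "loss" in entry:        # training-loss log entries
        print(f"step {step}: train loss {entry['loss']:.4f}")
    if "eval_loss" in entry:   # evaluation log entries
        print(f"step {step}: eval loss {entry['eval_loss']:.4f}")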


🚀 Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = "meta-llama/Llama-3.2-3B"
adapter = "kunjcr2/llama3-3b-lora-openhermes"  # replace with your Hub repo

# Load the tokenizer (shipped in this repo), the base model, then attach the LoRA adapter
tok = AutoTokenizer.from_pretrained(adapter)
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype="auto", device_map="auto")
model = PeftModel.from_pretrained(model, adapter)

# Generate
prompt = "Explain the concept of binary search trees."
inputs = tok(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tok.decode(outputs[0], skip_special_tokens=True))
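
If you prefer a standalone checkpoint that doesn't need peft at inference time, the adapter can be merged into the base weights. A minimal sketch (the output path is just an example):

# Fold the LoRA weights into the base model and save a standalone checkpoint
merged = model.merge_and_unload()
merged.save_pretrained("./llama3-3b-openhermes-merged")
tok.save_pretrained("./llama3-3b-openhermes-merged")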

📌 Notes

  • Training was run with bf16 + gradient checkpointing on an A100 (40 GB).
  • Only the LoRA adapters are uploaded, so the repo stays small; load them on top of the base model (or merge them into it) as shown in Usage above.
  • The repo includes adapter_model.safetensors, adapter_config.json, tokenizer files, and this README.
  • Training was stopped early at 2000 steps (~17% of the planned steps) because validation loss had converged.

✨ If you like this model, feel free to try it out and extend training. Future runs could add more steps, preference tuning (DPO/ORPO), or domain-specific data mixtures.
