---
base_model:
- meta-llama/Llama-3.2-3B
---

# LLaMA 3.2 - 3B Medical Fine-Tuned Model

## Model Description

This model is a fine-tuned version of the **LLaMA 3.2 - 3B** large language model, optimized for medical question-answering tasks. The fine-tuning was conducted using **Low-Rank Adaptation (LoRA)** on a **Medical QA dataset**.

## Fine-Tuning Approach

LoRA enables parameter-efficient fine-tuning by introducing small trainable low-rank matrices that adapt the model weights while leaving the pre-trained parameters frozen.

### LoRA Rank Selection

- The model was trained with **LoRA rank 4**.
- Higher ranks allow for more detailed adaptations but increase computational overhead.
- The initial training was performed with a low rank for efficiency; the rank may be increased in future iterations based on performance evaluations.

## Training Configuration

- **Base Model**: LLaMA 3.2 - 3B
- **Dataset**: Medical Question-Answer dataset
- **Training Framework**: Hugging Face `transformers`
- **Optimizer**: AdamW
- **Precision**: `fp16` mixed precision
- **Batch Size**: 1 (with gradient accumulation of 4, for an effective batch size of 4)
- **Epochs**: 1
- **Learning Rate**: 2e-5
- **Evaluation & Logging**: Weights & Biases (W&B) integration
- **Model Sharing**: Pushed to the Hugging Face Model Hub

An illustrative configuration sketch reproducing this setup is included in the appendix at the end of this card.

## Performance & Metrics

The training logs indicate:

- **Loss Reduction**: Training loss decreases steadily, indicating convergence.
- **Learning Rate Decay**: The learning rate follows a linear decay schedule.
- **Gradient Norm Stability**: Gradient norms remain within acceptable limits.
- **Global Steps & Epoch Tracking**: Training completes within the expected number of steps.

## Usage

To load and use the model:

```python
import os

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hugging Face access token, needed if the repository or base model is gated.
os.environ["HF_KEY"] = "<your Hugging Face access token>"

model_name = "Tanmay3004/llama_medical"

tokenizer = AutoTokenizer.from_pretrained(model_name, token=os.getenv("HF_KEY"))
model = AutoModelForCausalLM.from_pretrained(model_name, token=os.getenv("HF_KEY"))

# Run on GPU if available, otherwise fall back to CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

# Prompt template used during fine-tuning; the answer is left empty for generation.
template = "Question:\n{question}\n\nAnswer:\n{answer}"
prompt = template.format(
    question="What are the treatments for Paget's Disease of Bone?",
    answer=""
)

def generate_medical_response(prompt):
    # Move the inputs to the same device as the model before generating.
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    outputs = model.generate(**inputs, max_new_tokens=100)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

print(generate_medical_response(prompt))
```

## Future Improvements

- Increase the LoRA rank for improved adaptation.
- Further dataset augmentation for broader generalization.
- Optimize inference for real-time applications in medical AI systems.
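
## Appendix: Fine-Tuning Configuration Sketch

The sketch below shows one way the LoRA setup and hyperparameters described above could be reproduced with `peft` and `transformers`. It is a minimal illustration, not the original training script: the LoRA `target_modules`, `lora_alpha`, `lora_dropout`, the output directory, and the tiny placeholder dataset are assumptions, while the rank, batch size, gradient accumulation, epochs, learning rate, `fp16`, W&B logging, and Hub push follow the configuration listed above.

```python
import os

from datasets import Dataset
from peft import LoraConfig, get_peft_model
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

base_model_name = "meta-llama/Llama-3.2-3B"  # gated model; requires an approved HF token

tokenizer = AutoTokenizer.from_pretrained(base_model_name, token=os.getenv("HF_KEY"))
tokenizer.pad_token = tokenizer.eos_token  # LLaMA tokenizers ship without a pad token
model = AutoModelForCausalLM.from_pretrained(base_model_name, token=os.getenv("HF_KEY"))

# LoRA adapter with rank 4 as described above; alpha, dropout, and target modules are assumptions.
lora_config = LoraConfig(
    r=4,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

# Toy placeholder example in the "Question:/Answer:" template used by this card;
# the real Medical QA dataset is not included here.
examples = Dataset.from_dict({
    "text": ["Question:\nWhat is hypertension?\n\nAnswer:\nPersistently elevated blood pressure."]
})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

train_dataset = examples.map(tokenize, batched=True, remove_columns=["text"])
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

# Hyperparameters taken from the "Training Configuration" section; fp16 requires a CUDA GPU,
# W&B logging requires `wandb login`, and pushing to the Hub requires `huggingface-cli login`.
training_args = TrainingArguments(
    output_dir="llama_medical_lora",  # hypothetical output directory / Hub repo name
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,
    num_train_epochs=1,
    learning_rate=2e-5,
    fp16=True,
    lr_scheduler_type="linear",
    logging_steps=10,
    report_to="wandb",
    push_to_hub=True,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    data_collator=collator,
)
trainer.train()
```

After training, the resulting LoRA adapter can either be merged into the base model or loaded on top of the base weights with `PeftModel.from_pretrained`, depending on how the Hub repository is organized.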