# 📰 DistilBERT Fine-Tuned on AG News with and without Label Smoothing
This repository provides two fine-tuned DistilBERT models for topic classification on the AG News dataset:
- ✅ `model_no_smoothing`: Fine-tuned without label smoothing
- 🧪 `model_label_smoothing`: Fine-tuned with label smoothing (smoothing = 0.1)
Both models use the same tokenizer (`distilbert-base-uncased`) and were trained using PyTorch and the Hugging Face `Trainer`.
## 🧠 Model Details
| Model Name | Label Smoothing | Validation Loss | Epochs | Learning Rate |
|---|---|---|---|---|
| `model_no_smoothing` | ❌ No | 0.1792 | 1 | 2e-5 |
| `model_label_smoothing` | ✅ Yes (0.1) | 0.5413 | 1 | 2e-5 |
- Base model: `distilbert-base-uncased`
- Task: 4-class topic classification
- Dataset: AG News (train: 120k, test: 7.6k)
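For reference, the dataset can be pulled straight from the Hub. A minimal sketch of loading and tokenizing it (the `ag_news` dataset id and its `text`/`label` fields are the public 🤗 Datasets release; the exact preprocessing used for these checkpoints is not published here):

```python
from datasets import load_dataset
from transformers import AutoTokenizer

# Standard AG News splits: 120k train / 7.6k test examples.
dataset = load_dataset("ag_news")
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

def tokenize(batch):
    # Truncate to the max sequence length used for training (256).
    return tokenizer(batch["text"], truncation=True, max_length=256)

tokenized = dataset.map(tokenize, batched=True)
```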
## 📦 Repository Structure
```
/
├── model_no_smoothing/      # Model A - no smoothing
├── model_label_smoothing/   # Model B - label smoothing
├── tokenizer/               # Tokenizer files (shared)
└── README.md
```
## 🧪 How to Use
### Load Model A (No Smoothing)
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Hub repo ids cannot contain extra path segments, so the per-model folders
# are selected with `subfolder` (layout per the repository structure above).
repo_id = "Koushim/distilbert-agnews"
tokenizer = AutoTokenizer.from_pretrained(repo_id, subfolder="tokenizer")
model = AutoModelForSequenceClassification.from_pretrained(repo_id, subfolder="model_no_smoothing")

inputs = tokenizer("Breaking news in the tech world!", return_tensors="pt")
outputs = model(**inputs)
pred = outputs.logits.argmax(dim=1).item()  # predicted class index (0-3)
```
### Load Model B (Label Smoothing)
model_name = "Koushim/distilbert-agnews/model_label_smoothing"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
## 🏷️ Class Labels
- World
- Sports
- Business
- Sci/Tech
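To map a predicted index back to a class name, a small lookup works. This assumes the standard AG News label order (0 = World, 1 = Sports, 2 = Business, 3 = Sci/Tech); if the checkpoint's `config.id2label` was populated during training, prefer that instead:

```python
# Label order assumed to follow the standard AG News dataset.
id2label = {0: "World", 1: "Sports", 2: "Business", 3: "Sci/Tech"}
print(id2label[pred])  # `pred` comes from the inference snippet above
```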
## ⚙️ Training Configuration
- Framework: PyTorch + 🤗 Transformers
- Optimizer: AdamW
- Batch size: 16 (train/eval)
- Epochs: 1
- Learning rate: 2e-5
- Max sequence length: 256
- Loss: Cross-entropy (custom implementation for label smoothing; see the sketch below)
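The training script itself is not included in this repository. As a rough sketch of what a custom label-smoothed cross-entropy can look like, one common pattern overrides `Trainer.compute_loss`; the class name, smoothing value (0.1), and details below are illustrative, not the author's exact code:

```python
import torch
import torch.nn.functional as F
from transformers import Trainer

class LabelSmoothingTrainer(Trainer):
    """Trainer with label-smoothed cross-entropy (illustrative sketch)."""

    def __init__(self, *args, smoothing: float = 0.1, **kwargs):
        super().__init__(*args, **kwargs)
        self.smoothing = smoothing

    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        labels = inputs.pop("labels")
        outputs = model(**inputs)
        log_probs = F.log_softmax(outputs.logits, dim=-1)
        num_classes = log_probs.size(-1)
        # Smoothed targets: 1 - s on the true class, s / (C - 1) on the rest.
        with torch.no_grad():
            targets = torch.full_like(log_probs, self.smoothing / (num_classes - 1))
            targets.scatter_(1, labels.unsqueeze(1), 1.0 - self.smoothing)
        loss = torch.mean(torch.sum(-targets * log_probs, dim=-1))
        return (loss, outputs) if return_outputs else loss
```

Note that 🤗 Transformers also supports label smoothing without a custom loss via `TrainingArguments(label_smoothing_factor=0.1)`.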
## 📄 License
Apache 2.0
## ✍️ Author
- Hugging Face: Koushim
- Trained with `transformers.Trainer`