🤖 BERT IMDb Sentiment Classifier

A fine-tuned bert-base-uncased model for binary sentiment classification on the IMDb movie reviews dataset.
Trained in Google Colab with Hugging Face Transformers; reaches ~93% accuracy on the IMDb test split.


📌 Model Details

Model Description

  • Developed by: Shubham Swarnakar
  • Shared by: ShubhamSwarnakar
  • Model type: BertForSequenceClassification
  • Language(s): English 🇺🇸
  • License: Apache-2.0
  • Fine-tuned from: bert-base-uncased

Model Sources

  • Repository: https://huggingface.co/ShubhamSwarnakar/bert-imdb-colab-model

✅ Uses

Direct Use

Use this model for sentiment analysis on English movie reviews or similar texts.
It returns a single positive or negative label for each input.

Downstream Use

Can be fine-tuned further for domain-specific sentiment classification tasks.
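
As a sketch of what that could look like, the snippet below continues training from this checkpoint with the standard Trainer API. The dataset name, column names, and hyperparameters here are placeholders, not values from this card:

from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

model_id = "ShubhamSwarnakar/bert-imdb-colab-model"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

# Placeholder: substitute a domain-specific dataset with "text"/"label" columns.
dataset = load_dataset("your_domain_dataset")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=256)

dataset = dataset.map(tokenize, batched=True)

args = TrainingArguments(output_dir="bert-domain-sentiment",
                         num_train_epochs=2,               # placeholder value
                         per_device_train_batch_size=8)
trainer = Trainer(model=model, args=args,
                  train_dataset=dataset["train"], eval_dataset=dataset["test"])
trainer.train()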

Out-of-Scope Use

Not designed for:

  • Multilingual sentiment analysis
  • Nuanced emotion detection (e.g., joy, anger, sarcasm)
  • Non-movie domains without re-training

⚠️ Bias, Risks, and Limitations

This model inherits potential biases from:

  • Pretrained BERT weights
  • IMDb dataset (may reflect demographic or cultural skew)

Recommendations

Avoid deploying this model in high-risk applications without auditing or further fine-tuning. Misclassification risk exists, especially with ambiguous or sarcastic text.


🚀 How to Get Started

from transformers import pipeline

# Load the fine-tuned checkpoint from the Hub and classify a review.
classifier = pipeline("sentiment-analysis", model="ShubhamSwarnakar/bert-imdb-colab-model")
print(classifier("This movie was surprisingly entertaining!"))




🧠 Training Details

Training Data

  • Dataset: IMDb Dataset
  • Format: Binary sentiment (positive = 1, negative = 0)

Training Procedure

  • Preprocessing: Tokenized with BertTokenizerFast
  • Epochs: 3
  • Optimizer: AdamW
  • Scheduler: Linear LR schedule
  • Batch size: 8
  • Hardware: Trained in Google Colab with limited GPU resources
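
A minimal sketch of this procedure, using only the values stated above (3 epochs, batch size 8; AdamW and a linear LR schedule are already the Trainer defaults). The learning rate and maximum sequence length are not given on this card, so the library default and an assumed max_length=256 are used:

from datasets import load_dataset
from transformers import (BertForSequenceClassification, BertTokenizerFast,
                          Trainer, TrainingArguments)

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

imdb = load_dataset("imdb")  # binary labels: 1 = positive, 0 = negative

def tokenize(batch):
    # max_length=256 is an assumption; the card does not state the sequence length.
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=256)

imdb = imdb.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="bert-imdb",
    num_train_epochs=3,              # per the card
    per_device_train_batch_size=8,   # per the card
)                                    # AdamW + linear LR decay are the defaults

trainer = Trainer(model=model, args=args,
                  train_dataset=imdb["train"], eval_dataset=imdb["test"])
trainer.train()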

📊 Evaluation

Metrics

  • Final accuracy on the held-out IMDb test split: 93.47%

Results Summary

  Epoch   Validation Accuracy
  1       91.80%
  2       92.04%
  3       92.92%
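
The evaluation code is not part of this card, but the test figure could be reproduced along these lines (the label-name mapping is an assumption, since it depends on the saved config):

import numpy as np
from datasets import load_dataset
from transformers import pipeline

classifier = pipeline("sentiment-analysis", model="ShubhamSwarnakar/bert-imdb-colab-model")
test = load_dataset("imdb", split="test")

# truncation=True guards against reviews longer than BERT's 512-token limit.
preds = classifier(test["text"], batch_size=32, truncation=True)

# Map label names back to IMDb's 0/1 scheme; the actual names depend on the config.
label_to_id = {"LABEL_0": 0, "NEGATIVE": 0, "LABEL_1": 1, "POSITIVE": 1}
pred_ids = [label_to_id[p["label"]] for p in preds]
accuracy = np.mean(np.array(pred_ids) == np.array(test["label"]))
print(f"test accuracy: {accuracy:.4f}")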

🌱 Environmental Impact

Estimated for a lightweight training run:

  • Hardware Type: Google Colab GPU (T4)
  • Training Duration: ~2 hours
  • Cloud Provider: Google
  • Region: Unknown
  • Emissions Estimate: ~0.15 kg CO₂eq

Estimated via the ML CO2 Impact Calculator.

πŸ—οΈ Technical Specifications
Architecture
BERT-base (12-layer, 768-hidden, 12-heads, 110M parameters)

Compute Infrastructure
Hardware: Google Colab with GPU

Software:

Python 3.11

Transformers 4.x

Datasets

PyTorch 2.x
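
As a quick sanity check of these figures, the architecture and parameter count can be read off the checkpoint directly:

from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("ShubhamSwarnakar/bert-imdb-colab-model")
cfg = model.config
print(cfg.num_hidden_layers, cfg.hidden_size, cfg.num_attention_heads)  # 12, 768, 12
print(f"{sum(p.numel() for p in model.parameters()) / 1e6:.1f}M parameters")  # ~109.5M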

📚 Citation

@misc{shubhamswarnakar_bert_imdb_2025,
  author       = {Shubham Swarnakar},
  title        = {BERT IMDb Sentiment Classifier},
  year         = 2025,
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/ShubhamSwarnakar/bert-imdb-colab-model}},
}

🙋 More Info
For questions or collaboration, contact @ShubhamSwarnakar.