# RoBERTa Fine-Tuned Review Classifier
This is a fine-tuned RoBERTa model for binary sentiment classification of Amazon product reviews. The model classifies reviews as `LABEL_0` (negative) or `LABEL_1` (positive), and has been integrated into a Chrome extension that predicts numeric ratings from review text.
## Model Details

### Model Description

- Model: roberta-base
- Task: Binary sentiment classification
- Labels:
  - `LABEL_0`: Negative review → (Rating 1–4)
  - `LABEL_1`: Positive review → (Rating 7–10)
- Language: English
- License: Apache 2.0
- Author: Prajjwal Chouhan
- Model repo: https://huggingface.co/prajjwal888/roberta-finetuned-review-classifier
### Model Sources

- Base Model: `roberta-base`
- Fine-tuning & Deployment: Hugging Face + FastAPI (on Render)
- Demo Frontend: Chrome extension (injects rating predictions on Amazon product pages)
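
The card mentions a FastAPI service on Render but includes no server code. Below is a minimal sketch of how such a service could wrap the pipeline; the `/predict` route and `Review` payload schema are assumptions for illustration, not the deployed API:

```python
# Minimal FastAPI serving sketch. The /predict route and payload schema are
# illustrative assumptions, not the deployed Render service's actual API.
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import pipeline

app = FastAPI()
classifier = pipeline(
    "text-classification",
    model="prajjwal888/roberta-finetuned-review-classifier",
)

class Review(BaseModel):
    text: str

@app.post("/predict")
def predict(review: Review):
    # Truncate to the model's 512-token window so long reviews don't error out.
    result = classifier(review.text, truncation=True, max_length=512)[0]
    return {"label": result["label"], "score": result["score"]}
```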
## Use Cases

### Direct Use

You can use this model to predict sentiment or approximate ratings of customer reviews. It's ideal for:
- Product feedback classification
- Chrome extensions or browser tools
- E-commerce dashboards
### Downstream Use

Can be used in:
- Recommender systems
- Review authenticity/fraud detection
- Customer satisfaction prediction
### Out-of-Scope Use

- Reviews in non-English languages
- Sarcastic or ambiguous tone detection
- Fine-grained star rating (e.g., 3 vs. 4)
## Bias, Risks & Limitations

### Bias

The model may inherit biases from its training data, especially for underrepresented product categories or reviewer demographics.
### Limitations

- Struggles with sarcastic or short reviews.
- Works only on English-language text.
- Predictions may be unreliable for very long reviews (truncated at 512 tokens).
### Recommendations

- Do not use this model for making critical business decisions without human verification.
- Fine-tune on domain-specific reviews if required.
## How to Get Started

```python
from transformers import pipeline

classifier = pipeline("text-classification", model="prajjwal888/roberta-finetuned-review-classifier")
result = classifier("The product quality is fantastic. Loved it!")
print(result)
```
Example Output:

```json
[{"label": "LABEL_1", "score": 0.9987}]
```
## Training Details

### Dataset

A custom dataset of Amazon product reviews, scraped and labeled into two categories based on review sentiment rather than star ratings.
### Preprocessing

- Lowercasing
- Removal of HTML and special characters
- Truncated to 512 tokens
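
A minimal sketch of these steps; the exact cleaning rules aren't published, so the regexes below are assumptions (truncation to 512 tokens is handled later by the tokenizer):

```python
import re

def preprocess(text: str) -> str:
    """Lowercase and strip HTML tags and special characters.

    The exact cleaning rules aren't published; these regexes are assumptions.
    """
    text = text.lower()
    text = re.sub(r"<[^>]+>", " ", text)           # drop HTML tags
    text = re.sub(r"[^a-z0-9\s.,!?']", " ", text)  # drop special characters
    return re.sub(r"\s+", " ", text).strip()       # collapse whitespace

print(preprocess("<p>Great product!!! Worth ***every*** penny.</p>"))
# great product!!! worth every penny.
```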
### Training Hyperparameters

| Hyperparameter | Value |
|---|---|
| Epochs | 3 |
| Batch size | 16 |
| Max length | 512 |
| Optimizer | AdamW |
| Learning rate | 2e-5 |
| Precision | fp16 |
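
These values map directly onto Hugging Face `TrainingArguments`. A hedged reconstruction of the training setup follows; only the tabled values come from this card, the toy dataset is a stand-in for the unpublished Amazon reviews, and everything else is a library default:

```python
from datasets import Dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    DataCollatorWithPadding,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForSequenceClassification.from_pretrained("roberta-base", num_labels=2)

# Toy stand-in for the unpublished Amazon review dataset.
raw = Dataset.from_dict({
    "text": ["loved it, works perfectly", "terrible quality, broke in a day"],
    "label": [1, 0],
})
tokenized = raw.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True,
)

args = TrainingArguments(
    output_dir="roberta-review-classifier",
    num_train_epochs=3,              # from the table
    per_device_train_batch_size=16,  # from the table
    learning_rate=2e-5,              # from the table; AdamW is the Trainer default
    fp16=True,                       # from the table (requires a CUDA GPU)
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=DataCollatorWithPadding(tokenizer),
)
trainer.train()
```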
## Evaluation

### Metrics

| Metric | Value |
|---|---|
| Accuracy | ~91% |
| F1 Score | ~90.5% |
| Precision | ~90% |
| Recall | ~91% |

Evaluation was performed on a 20% held-out validation set from the same distribution.
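
Metrics like these are typically produced by a `compute_metrics` hook passed to the `Trainer`; a sketch using scikit-learn (the `binary` averaging choice is an assumption):

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def compute_metrics(eval_pred):
    """Accuracy, precision, recall, and F1 for the Trainer's evaluation loop."""
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    precision, recall, f1, _ = precision_recall_fscore_support(
        labels, preds, average="binary"  # averaging choice is an assumption
    )
    return {
        "accuracy": accuracy_score(labels, preds),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```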
## Environmental Impact

- Hardware Used: NVIDIA T4 GPU
- Platform: Google Colab + Render
- Training Duration: ~1.5 hours
- Estimated CO₂ Emissions: ~0.3 kg (based on ML CO2 Impact Calculator)
## Technical Specifications

- Model Type: Transformer Encoder (RoBERTa)
- Architecture: 12 layers, 768 hidden size, 12 attention heads, ~125M parameters
- Framework: PyTorch (via `transformers`)
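
These figures can be verified directly from the published config; a quick sketch:

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("prajjwal888/roberta-finetuned-review-classifier")
# Expected for a roberta-base checkpoint: 12 layers, 768 hidden size, 12 heads.
print(config.num_hidden_layers, config.hidden_size, config.num_attention_heads)
```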
## Citation

```bibtex
@misc{prajjwal888_review_classifier_2024,
  title={RoBERTa Fine-Tuned Review Classifier},
  author={Prajjwal Chouhan},
  year={2024},
  howpublished={\url{https://huggingface.co/prajjwal888/roberta-finetuned-review-classifier}}
}
```
## Contact

- GitHub: @prajjwal888
- LinkedIn: Prajjwal Chouhan
## Acknowledgments

Thanks to the Hugging Face community and the creators of `roberta-base`. This project is inspired by practical applications of NLP in e-commerce.