---
library_name: transformers
license: apache-2.0
base_model: distilbert-base-uncased
tags:
- generated_from_trainer
- text-classification
metrics:
- accuracy
- precision
- recall
- f1
model-index:
- name: amazon-reviews-sentiment-distilbert-base-uncased
  results: []
datasets:
- jhan21/amazon-reviews-tokenized-distilbert-3labels
pipeline_tag: text-classification
---

# Amazon-Beauty-Product-Reviews-distilBERT-base for Sentiment Analysis

## Table of Contents

- [Model Details](#model-details)
- [Uses](#uses)
- [Risks, Limitations and Biases](#risks-limitations-and-biases)
- [Training and Evaluation](#training-and-evaluation)

## Model Details

#### Model Description

This model is a fine-tuned version of [distilbert-base-uncased](https://huggingface.co/distilbert-base-uncased) on a balanced subset of the [Amazon beauty reviews dataset](https://huggingface.co/datasets/jhan21/amazon-reviews-tokenized-distilbert-3labels). It achieves the following results on the evaluation set:

- Loss: 0.5171
- Accuracy: 0.7862
- Precision: 0.7876
- Recall: 0.7860
- F1: 0.7867

#### Developer Information

- **Developed by:** Jiali Han
- **Model Type:** Text Classification
- **Language(s):** English
- **License:** Apache-2.0
- **Parent Model:** For more details about DistilBERT, please check out [this model card](https://huggingface.co/distilbert-base-uncased).
- **Resources for more information:**
  - [Model Documentation](https://huggingface.co/docs/transformers/main/en/model_doc/distilbert#transformers.DistilBertForSequenceClassification)
  - [DistilBERT paper](https://arxiv.org/abs/1910.01108)

## Uses

#### Direct Application

This model can be used for sentiment analysis of Amazon beauty product reviews.

#### Misuse and Out-of-scope Use

The model should not be used to intentionally create hostile or alienating environments for people.
In addition, the model was not trained to produce factual or true representations of people or events, so using it to generate such content is out of scope for this model.

## Risks, Limitations and Biases

The model may produce biased predictions, particularly for underrepresented groups. Users should evaluate the model's risks for their specific use cases. For further bias evaluation, consider datasets such as:

- [WinoBias](https://huggingface.co/datasets/wino_bias)
- [WinoGender](https://huggingface.co/datasets/super_glue)
- [Stereoset](https://huggingface.co/datasets/stereoset)

## Training and Evaluation

#### Training Data

The author uses the [Amazon beauty reviews dataset](https://huggingface.co/datasets/jhan21/amazon-reviews-tokenized-distilbert-3labels), which has been balanced to address class imbalance.

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 5e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 0
- optimizer: AdamW (torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- num_epochs: 1

### Training results

For detailed training logs, please refer to the [Tensorboard](https://huggingface.co/jhan21/amazon-reviews-sentiment-distilbert-base-uncased/tensorboard) page.
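As a rough illustration, the hyperparameters listed above correspond to a `TrainingArguments` configuration along these lines. This is a sketch, not the author's actual training script, and the `output_dir` name is hypothetical:

```python
from transformers import TrainingArguments

# Sketch of a configuration matching the hyperparameters in this card.
# output_dir is a placeholder; the remaining values come from the card itself.
training_args = TrainingArguments(
    output_dir="amazon-reviews-sentiment-distilbert-base-uncased",
    learning_rate=5e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=0,
    optim="adamw_torch",          # AdamW with betas=(0.9, 0.999), eps=1e-8
    lr_scheduler_type="linear",
    num_train_epochs=1,
)
```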
| Training Loss | Epoch | Step | Validation Loss | Accuracy | Precision | Recall | F1 |
|:-------------:|:------:|:-----:|:---------------:|:--------:|:---------:|:------:|:------:|
| 0.7283 | 0.0299 | 500 | 0.6867 | 0.7073 | 0.7038 | 0.7071 | 0.7030 |
| 0.6718 | 0.0598 | 1000 | 0.6067 | 0.7340 | 0.7478 | 0.7340 | 0.7377 |
| 0.6473 | 0.0898 | 1500 | 0.6154 | 0.7390 | 0.7508 | 0.7390 | 0.7416 |
| 0.616 | 0.1197 | 2000 | 0.6448 | 0.7423 | 0.7373 | 0.7420 | 0.7377 |
| 0.6123 | 0.1496 | 2500 | 0.6286 | 0.7241 | 0.7677 | 0.7243 | 0.7284 |
| 0.5874 | 0.1795 | 3000 | 0.5774 | 0.7516 | 0.7539 | 0.7515 | 0.7523 |
| 0.5746 | 0.2095 | 3500 | 0.5708 | 0.7564 | 0.7636 | 0.7563 | 0.7582 |
| 0.5917 | 0.2394 | 4000 | 0.5839 | 0.7596 | 0.7602 | 0.7595 | 0.7598 |
| 0.5774 | 0.2693 | 4500 | 0.6225 | 0.7526 | 0.7482 | 0.7524 | 0.7492 |
| 0.594 | 0.2992 | 5000 | 0.5531 | 0.7662 | 0.7694 | 0.7661 | 0.7673 |
| 0.5591 | 0.3292 | 5500 | 0.5770 | 0.7665 | 0.7645 | 0.7663 | 0.7645 |
| 0.5548 | 0.3591 | 6000 | 0.5805 | 0.7613 | 0.7579 | 0.7611 | 0.7584 |
| 0.5742 | 0.3890 | 6500 | 0.5592 | 0.7639 | 0.7665 | 0.7638 | 0.7636 |
| 0.5374 | 0.4189 | 7000 | 0.5548 | 0.7712 | 0.7776 | 0.7711 | 0.7735 |
| 0.5488 | 0.4489 | 7500 | 0.5622 | 0.7747 | 0.7747 | 0.7745 | 0.7746 |
| 0.5557 | 0.4788 | 8000 | 0.5698 | 0.7642 | 0.7822 | 0.7643 | 0.7670 |
| 0.556 | 0.5087 | 8500 | 0.5380 | 0.7754 | 0.7777 | 0.7753 | 0.7764 |
| 0.5325 | 0.5386 | 9000 | 0.5791 | 0.7754 | 0.7746 | 0.7751 | 0.7736 |
| 0.5301 | 0.5686 | 9500 | 0.5569 | 0.7753 | 0.7738 | 0.7751 | 0.7744 |
| 0.5232 | 0.5985 | 10000 | 0.5391 | 0.7782 | 0.7806 | 0.7780 | 0.7789 |
| 0.5462 | 0.6284 | 10500 | 0.5499 | 0.7729 | 0.7698 | 0.7726 | 0.7683 |
| 0.5614 | 0.6583 | 11000 | 0.5243 | 0.7803 | 0.7818 | 0.7801 | 0.7808 |
| 0.5376 | 0.6883 | 11500 | 0.5406 | 0.7795 | 0.7772 | 0.7794 | 0.7780 |
| 0.5287 | 0.7182 | 12000 | 0.5227 | 0.7797 | 0.7852 | 0.7796 | 0.7806 |
| 0.5149 | 0.7481 | 12500 | 0.5423 | 0.7803 | 0.7788 | 0.7801 | 0.7792 |
| 0.5312 | 0.7780 | 13000 | 0.5338 | 0.7771 | 0.7860 | 0.7771 | 0.7781 |
| 0.5204 | 0.8079 | 13500 | 0.5183 | 0.7843 | 0.7857 | 0.7841 | 0.7849 |
| 0.5412 | 0.8379 | 14000 | 0.5192 | 0.7844 | 0.7893 | 0.7843 | 0.7860 |
| 0.515 | 0.8678 | 14500 | 0.5135 | 0.7845 | 0.7858 | 0.7843 | 0.7850 |
| 0.5033 | 0.8977 | 15000 | 0.5254 | 0.7862 | 0.7882 | 0.7860 | 0.7870 |
| 0.5023 | 0.9276 | 15500 | 0.5251 | 0.7863 | 0.7853 | 0.7861 | 0.7856 |
| 0.5042 | 0.9576 | 16000 | 0.5215 | 0.7865 | 0.7864 | 0.7864 | 0.7864 |
| 0.5237 | 0.9875 | 16500 | 0.5171 | 0.7862 | 0.7876 | 0.7860 | 0.7867 |

### Evaluation Results

The fine-tuned DistilBERT model was evaluated on a dataset with the following splits:

- Training Samples: 133,665
- Validation Samples: 33,417

The evaluation was conducted on a three-class sentiment classification task. Below are the detailed results:

#### Classification Report

| Label | Precision | Recall | F1-Score | Support |
|---------------|-----------|--------|----------|---------|
| 0 | 0.78 | 0.78 | 0.78 | 11163 |
| 1 | 0.69 | 0.70 | 0.69 | 11099 |
| 2 | 0.89 | 0.87 | 0.88 | 11155 |
| **Accuracy** | | | 0.78 | 33417 |
| **Macro Avg** | 0.79 | 0.78 | 0.78 | 33417 |
| **Weighted Avg** | 0.79 | 0.78 | 0.79 | 33417 |

#### Confusion Matrix

| | 0 | 1 | 2 |
|:-:|:------:|:------:|:------:|
| 0 | 8672 | 2331 | 160 |
| 1 | 2292 | 7793 | 1014 |
| 2 | 169 | 1237 | 9749 |

### Framework versions

- Transformers 4.50.3
- Pytorch 2.6.0+cu124
- Tokenizers 0.21.1
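As a sanity check, the aggregate metrics in the classification report can be recomputed directly from the confusion matrix (rows are true labels, columns are predictions). A minimal sketch in plain Python:

```python
# Confusion matrix from this model card: rows = true labels, columns = predictions.
cm = [
    [8672, 2331,  160],   # true label 0
    [2292, 7793, 1014],   # true label 1
    [ 169, 1237, 9749],   # true label 2
]

total = sum(sum(row) for row in cm)                      # 33417 validation samples
accuracy = sum(cm[i][i] for i in range(3)) / total       # diagonal / total

for i in range(3):
    support = sum(cm[i])                                 # row sum: true-label-i samples
    predicted = sum(cm[r][i] for r in range(3))          # column sum: predicted-i samples
    precision = cm[i][i] / predicted
    recall = cm[i][i] / support
    f1 = 2 * precision * recall / (precision + recall)
    print(f"label {i}: precision={precision:.2f} recall={recall:.2f} "
          f"f1={f1:.2f} support={support}")

print(f"accuracy = {accuracy:.2f}")  # 0.78, matching the report
```

Rounded to two decimals, the recomputed per-class values match the classification report above (e.g. label 2: precision 0.89, recall 0.87, F1 0.88).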