---
license: apache-2.0
language:
  - en
metrics:
  - accuracy
base_model: distilbert/distilbert-base-uncased
pipeline_tag: text-classification
widget:
  - text: "The product arrived on time and was exactly as described."
library_name: transformers
safetensors: true
---

### Categories

The model's `label_mapping` (category name to class ID):

- `shipping_and_delivery`: 0
- `customer_service`: 1
- `price_and_value`: 2
- `quality_and_performance`: 3
- `use_and_design`: 4
- `other`: 5
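If the model's configuration does not store human-readable label names, the pipeline may return generic labels such as `LABEL_3`. Whether the published config uses generic or named labels is an assumption to verify. The sketch below uses a hypothetical helper, `to_category`, to convert such labels back to category names with the mapping above.

```python
# Category-to-ID mapping as listed in this card
LABEL_MAPPING = {
    "shipping_and_delivery": 0,
    "customer_service": 1,
    "price_and_value": 2,
    "quality_and_performance": 3,
    "use_and_design": 4,
    "other": 5,
}

# Invert the mapping so categories can be looked up by class ID
ID_TO_CATEGORY = {idx: name for name, idx in LABEL_MAPPING.items()}


def to_category(label: str) -> str:
    """Hypothetical helper: map a generic 'LABEL_<id>' string to its category name.

    Labels that are already category names are returned unchanged.
    """
    if label.startswith("LABEL_"):
        return ID_TO_CATEGORY[int(label[len("LABEL_"):])]
    return label


print(to_category("LABEL_3"))           # quality_and_performance
print(to_category("customer_service"))  # customer_service
```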

### Model Description

This model is DistilBERT fine-tuned to classify customer feedback into one of six predefined categories: Shipping and Delivery, Customer Service, Price and Value, Quality and Performance, Use and Design, and Other. It builds on DistilBERT's transformer architecture and uses the content, style, and structure of each document to assign a category.

- **Model type:** DistilBERT (fine-tuned for text classification)
- **Language(s) (NLP):** English
- **License:** Apache 2.0
- **Finetuned from model:** distilbert/distilbert-base-uncased

## Bias, Risks, and Limitations

While the model achieves high accuracy across the six categories, it is limited when a document touches on overlapping categories. It is a single-label classifier, so it predicts exactly one category per document. If a document has features of several categories (e.g., both 'Quality and Performance' and 'Price and Value'), the model still returns only one of them, which can lead to misclassification.

### Recommendations

Users (both direct and downstream) should be aware of the model's single-label prediction limitation. In cases where a document contains features of multiple categories, additional models or multi-label classification techniques should be considered.
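Because the model produces a softmax distribution over all six classes, one rough way to flag feedback that may span several categories is to request every class score and check how close the runner-up is to the top prediction. With `transformers`, calling the pipeline as `classifier(text, top_k=None)` returns scores for all labels. The helper `flag_ambiguous` below is hypothetical, and the 0.2 margin is an arbitrary illustrative value rather than a tuned threshold. This is a heuristic, not true multi-label classification.

```python
from typing import Any, Dict, List, Tuple


def flag_ambiguous(
    scores: List[Dict[str, Any]], margin: float = 0.2
) -> Tuple[str, bool]:
    """Hypothetical helper: return the top label and whether the runner-up is close.

    `scores` is assumed to be a list of {"label": ..., "score": ...} dicts,
    such as the output of classifier(text, top_k=None) for a single input.
    """
    ranked = sorted(scores, key=lambda s: s["score"], reverse=True)
    top, runner_up = ranked[0], ranked[1]
    # A small gap between the two best scores suggests more than one category applies
    return top["label"], (top["score"] - runner_up["score"]) < margin


example = [
    {"label": "price_and_value", "score": 0.48},
    {"label": "quality_and_performance", "score": 0.41},
    {"label": "other", "score": 0.11},
]
print(flag_ambiguous(example))  # ('price_and_value', True)
```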

### Training Data

A custom synthetic dataset was created for this task, focusing on the structural features of text. The dataset provides examples from six categories, helping the model learn from both the syntactic organization and the meaning of the text.

### Training Hyperparameters

- **Model:** distilbert/distilbert-base-uncased
- **Learning Rate:** 3e-5
- **Epochs:** 7
- **Train Batch Size:** 16
- **Gradient Accumulation Steps:** 2
- **Weight Decay:** 0.015
- **Warm-up Ratio:** 0.1
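These hyperparameters map directly onto `transformers.TrainingArguments` fields (`learning_rate`, `num_train_epochs`, `per_device_train_batch_size`, `gradient_accumulation_steps`, `weight_decay`, `warmup_ratio`). The dictionary, the `training_steps` helper, and the 1,000-example dataset size below are illustrative assumptions, since the card does not give the dataset size. The arithmetic assumes a single device, so each optimizer step sees 16 × 2 = 32 examples. It approximates, rather than reproduces exactly, how a trainer counts steps.

```python
import math

# Hyperparameters from this card, keyed by TrainingArguments field names.
# With transformers installed: TrainingArguments(output_dir="out", **HPARAMS)
HPARAMS = {
    "learning_rate": 3e-5,
    "num_train_epochs": 7,
    "per_device_train_batch_size": 16,
    "gradient_accumulation_steps": 2,
    "weight_decay": 0.015,
    "warmup_ratio": 0.1,
}


def training_steps(num_examples: int, hp: dict = HPARAMS) -> dict:
    """Estimate optimizer steps on one device for a hypothetical dataset size."""
    batch = hp["per_device_train_batch_size"]
    accum = hp["gradient_accumulation_steps"]
    batches_per_epoch = math.ceil(num_examples / batch)
    # One optimizer update every `accum` batches
    updates_per_epoch = max(batches_per_epoch // accum, 1)
    total_steps = updates_per_epoch * hp["num_train_epochs"]
    # Warm-up covers the first 10% of updates, rounded up
    warmup_steps = math.ceil(total_steps * hp["warmup_ratio"])
    return {
        "effective_batch": batch * accum,
        "total_steps": total_steps,
        "warmup_steps": warmup_steps,
    }


print(training_steps(1000))
# {'effective_batch': 32, 'total_steps': 217, 'warmup_steps': 22}
```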

### Evaluation
The model was evaluated using a custom dataset representing the same six document categories. Performance was measured based on accuracy, precision, recall, and F1-score across the categories.
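The card does not state how per-category precision, recall, and F1 were averaged. As a minimal sketch, assuming macro averaging (an unweighted mean over the categories present in the labels or predictions), the metrics can be computed from integer class IDs as follows. The `evaluate` function and the toy labels are illustrative only and are not the card's evaluation data.

```python
from typing import Dict, List


def evaluate(y_true: List[int], y_pred: List[int]) -> Dict[str, float]:
    """Accuracy plus macro-averaged precision, recall, and F1 (assumed averaging)."""
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    classes = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for c in classes:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        # Undefined ratios (zero denominator) are scored as 0 by convention
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(classes)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


m = evaluate([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0])
print({k: round(v, 3) for k, v in m.items()})
# {'accuracy': 0.667, 'precision': 0.722, 'recall': 0.667, 'f1': 0.656}
```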

### Metrics

- **Accuracy:** 0.947
- **Precision:** 0.948
- **Recall:** 0.948
- **F1-Score:** 0.948

For access to the synthetic dataset, please contact: [email protected].

## How to Use

Here is an example of how to use this model for inference:

```python
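from transformers import pipeline

# Load the fine-tuned model from the Hugging Face Hub
classifier = pipeline("text-classification", model="dnzblgn/Customer-Reviews-Classification")

# Classify a single piece of customer feedback
result = classifier("The product arrived on time and was exactly as described.")
print(result)  # a list containing one {"label": ..., "score": ...} dict
```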