---
library_name: transformers
tags: []
---

# Model Card for DistilBERT Text Classification

This is a DistilBERT model fine-tuned for text classification tasks.

## Model Details

### Model Description

This DistilBERT model is fine-tuned for text classification: it assigns input texts to the categories defined by the fine-tuning dataset.

- **Developed by:** Thiago Adriano
- **Model type:** DistilBERT for Sequence Classification
- **Language(s) (NLP):** Portuguese
- **License:** MIT License
- **Finetuned from model:** distilbert-base-uncased

### Model Sources

- **Repository:** [tadrianonet/distilbert-text-classification](https://huggingface.co/tadrianonet/distilbert-text-classification)


## How to Get Started with the Model

Use the code below to get started with the model.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the fine-tuned tokenizer and classification model
tokenizer = AutoTokenizer.from_pretrained("tadrianonet/distilbert-text-classification")
model = AutoModelForSequenceClassification.from_pretrained("tadrianonet/distilbert-text-classification")

# Tokenize an input sentence and run a forward pass
inputs = tokenizer("Sample text for classification", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# The highest-scoring logit is the predicted class index
predicted_class = outputs.logits.argmax(dim=-1).item()
```
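
If the model's configuration defines an `id2label` mapping, `model.config.id2label[predicted_class]` converts the predicted index back to a human-readable label.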

## Training Details

### Training Data

The training data consists of text-label pairs in Portuguese. The data is preprocessed to tokenize the text and convert labels to numerical format.
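
A minimal preprocessing sketch is shown below; the example texts, label names, and column layout are illustrative assumptions, since the actual training corpus is not published with this card.

```python
from transformers import AutoTokenizer

# Hypothetical examples; the real training corpus is not published here
texts = ["Exemplo de texto positivo", "Exemplo de texto negativo"]
labels = ["positivo", "negativo"]

# Convert string labels to integer ids
label2id = {name: i for i, name in enumerate(sorted(set(labels)))}
numeric_labels = [label2id[name] for name in labels]

# Tokenize the texts with the base model's tokenizer
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
encodings = tokenizer(texts, truncation=True, padding=True, return_tensors="pt")
```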

### Training Procedure

The model is fine-tuned using the Hugging Face `Trainer` API with the following hyperparameters (see the sketch after the list):

- **Training regime:** fp32
- **Learning rate:** 2e-5
- **Batch size:** 16
- **Epochs:** 3
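
The sketch below shows how these hyperparameters map onto `TrainingArguments` and `Trainer`. The toy dataset, `num_labels=2`, and the output directory are illustrative assumptions, not details recovered from the original run.

```python
from datasets import Dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

# Toy two-example dataset; the real training data is not published here
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
raw = Dataset.from_dict(
    {"text": ["Exemplo positivo", "Exemplo negativo"], "label": [1, 0]}
)
dataset = raw.map(lambda ex: tokenizer(ex["text"], truncation=True))

model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2  # num_labels=2 is an assumption
)

args = TrainingArguments(
    output_dir="./results",          # placeholder output directory
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    num_train_epochs=3,
    # fp32 is the Trainer default, so no fp16/bf16 flags are set
)

trainer = Trainer(model=model, args=args, train_dataset=dataset, tokenizer=tokenizer)
trainer.train()
```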

#### Speeds, Sizes, Times

- **Training time:** Approximately 10 minutes on a single GPU

## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

The testing data is a separate set of text-label pairs used to evaluate the model's performance.

#### Factors

Evaluation is reported in aggregate; accuracy and loss (described under Metrics below) are not disaggregated by subpopulation or domain.

#### Metrics

- **Accuracy:** Measures the proportion of correct predictions
- **Loss:** Measures the error in the model's predictions
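
A minimal sketch of an accuracy-computing callback for the `Trainer` follows; the function name and NumPy-based implementation are assumptions, since the card does not document the exact metric code.

```python
import numpy as np

def compute_metrics(eval_pred):
    """Compute accuracy from the Trainer's (logits, labels) pair."""
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return {"accuracy": float((predictions == labels).mean())}
```

Passing `compute_metrics=compute_metrics` to the `Trainer` makes it report accuracy at each evaluation step.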

### Results

- **Evaluation Results:**
  - **Loss:** 0.692
  - **Accuracy:** 50%

#### Summary

The model achieves 50% accuracy with a loss of 0.692, close to ln 2 ≈ 0.693; for a binary task this is consistent with chance-level performance, so further fine-tuning and evaluation on a larger, more diverse dataset are likely necessary before practical use.

## Model Examination

[More Information Needed]

## Environmental Impact

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- **Hardware Type:** GPU
- **Hours used:** 0.2 hours
- **Cloud Provider:** [More Information Needed]
- **Compute Region:** [More Information Needed]
- **Carbon Emitted:** [More Information Needed]

## Technical Specifications

### Model Architecture and Objective

The model is based on DistilBERT, a smaller, faster, and cheaper version of BERT, designed for efficient text classification.

### Compute Infrastructure

#### Hardware

- **Hardware Type:** Single GPU
- **GPU Model:** [More Information Needed]

#### Software

- **Framework:** Transformers 4.x
- **Library:** PyTorch

## Citation

**BibTeX:**

```bibtex
@misc{thiago_adriano_2024_distilbert,
  author = {Thiago Adriano},
  title = {DistilBERT Text Classification},
  year = {2024},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/tadrianonet/distilbert-text-classification}},
}
```

**APA:**

Adriano, T. (2024). *DistilBERT Text Classification*. Hugging Face. https://huggingface.co/tadrianonet/distilbert-text-classification

## More Information

For more details, visit the [Hugging Face model page](https://huggingface.co/tadrianonet/distilbert-text-classification).

## Model Card Authors

Thiago Adriano

## Model Card Contact

For more information, contact Thiago Adriano at [tadriano.dev@gmail.com](mailto:tadriano.dev@gmail.com).