Model Card for Model ID (work in progress)
This model is a fine-tuned version of BETO uncased that detects offensive and discriminatory language against the LGBT community. It could be used as a moderation service in forums and digital spaces.
Model Card Contact
Model Details
Model description process
- Discriminatory phrases targeting the LGBTQIA+ community were first collected from X/Twitter, Instagram and TikTok (197 phrases).
- Three raters labelled each phrase as non-LGBT-phobic (0) or LGBT-phobic (1).
- Text augmentation was applied using back-translation and random synonym replacement.
- Part of the McGiff, J., & Nikolov, N. S. (2024) dataset was translated into Spanish and added (under licence CC-BY-4.0).
- The result is 1234 labelled phrases in version 1.0.1 of LGBTQIAphobia_augmented.
Please cite the dataset as:
Martínez-Araneda, C., Maldonado Montiel, D., Gutiérrez Valenzuela, M., Gómez Meneses, P., Segura Navarrete, A., & Vidal-Castro, C. (2024). LGBTQIAphobia dataset (augmented) (1.0.1) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.14563166
- Developed by: [Martínez-Araneda, C.; Segura Navarrete, A.; Gutiérrez Valenzuela, Mariella; Maldonado Montiel, Diego; Gómez Meneses, P.; Vidal-Castro, Christian]
- Model type: [text-classification]
- Language(s) (NLP): [Spanish]
- License: [CC-BY-4.0]
- Finetuned from model [dccuchile/bert-base-spanish-wwm-uncased]: More information on the base model [https://github.com/dccuchile/beto]
Model Sources [optional]
Uses
This model can be used to detect offensive and discriminatory language against the LGBT community. It could be used as a moderation service in forums and digital spaces.
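As an illustration of the moderation use case, a minimal sketch follows. It assumes a hypothetical `score_fn` that returns the model's probability for the LGBT-phobic class (label 1), and a threshold of 0.5 chosen only for illustration; neither is specified by this model card.

```python
from typing import Callable, Iterable, List, Tuple


def moderate(
    messages: Iterable[str],
    score_fn: Callable[[str], float],
    threshold: float = 0.5,  # illustrative value, not tuned on this model
) -> List[Tuple[str, bool]]:
    """Pair each message with a flag that is True when its score reaches the threshold.

    score_fn is a hypothetical callable returning the probability that a
    message is LGBT-phobic (label 1).
    """
    return [(msg, score_fn(msg) >= threshold) for msg in messages]
```

Given the small training set, flagged messages would most sensibly go to a human reviewer rather than being removed automatically.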
Direct Use
[More Information Needed]
Out-of-Scope Use
[More Information Needed]
Bias, Risks, and Limitations
Because this model was fine-tuned on a small dataset, it inherits the biases of that data.
[More Information Needed]
Recommendations
How to Get Started with the Model
# Libraries
from transformers import AutoModelForSequenceClassification, AutoTokenizer
# Path from which the model will be loaded
load_directory = "./lgbetO"
# Load the fine-tuned model
model = AutoModelForSequenceClassification.from_pretrained(load_directory)
# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(load_directory)
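Once the model and tokenizer are loaded, a single phrase could be classified roughly as sketched below. This is a sketch under assumptions, not the authors' inference code: the label names follow the 0/1 scheme in the model description, `max_length=128` is an illustrative choice, and a PyTorch backend is assumed.

```python
import math

# Label scheme from the model description: 0 = non-LGBT-phobic, 1 = LGBT-phobic.
# The string names are illustrative, not taken from the model's config.
ID2LABEL = {0: "non-lgbtphobic", 1: "lgbtphobic"}


def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(text, model, tokenizer, max_length=128):
    """Return (label, probability) for one Spanish phrase."""
    import torch  # imported lazily so the helpers above work without PyTorch

    # max_length is an assumed value; the card does not state the one used in training.
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=max_length)
    with torch.no_grad():  # inference only, no gradients needed
        logits = model(**inputs).logits[0].tolist()
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]
```

A call would look like `label, prob = classify("texto de ejemplo", model, tokenizer)`.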
Training Details
Training begins by collecting phrases about the LGBT community, both offensive/discriminatory and non-offensive/non-discriminatory, from Twitter, Instagram and TikTok. The phrases are preprocessed, labelled by 3 raters, and augmented with back-translation and synonym replacement. The BETO base model (dccuchile/bert-base-spanish-wwm-uncased) is then fine-tuned to detect discriminatory phrases against the LGBT community.
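Two of the data-preparation steps can be sketched in a few lines. The card does not say how the three raters' labels were combined, so the majority vote here is an assumption; the synonym dictionary and the replacement probability are likewise hypothetical. Back-translation is omitted because it needs a translation model.

```python
import random
from collections import Counter


def majority_label(ratings):
    """Return the label chosen by most raters (3 raters with binary labels cannot tie)."""
    return Counter(ratings).most_common(1)[0][0]


def replace_synonyms(text, synonyms, prob=0.3, rng=None):
    """Randomly swap words for a synonym from a user-supplied dictionary.

    synonyms maps a lowercase word to a list of alternatives; prob is the
    chance that a word with known synonyms is replaced (illustrative value).
    """
    rng = rng or random.Random()
    words = []
    for word in text.split():
        options = synonyms.get(word.lower())
        if options and rng.random() < prob:
            words.append(rng.choice(options))
        else:
            words.append(word)
    return " ".join(words)
```

For example, `majority_label([1, 0, 1])` returns `1`, and `replace_synonyms` can be called several times on the same phrase to produce different variants.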
Training Data
Citation Martínez-Araneda, C., Maldonado Montiel, D., Gutiérrez Valenzuela, M., Gómez Meneses, P., Segura Navarrete, A., & Vidal-Castro, C. (2024). LGBTQIAphobia dataset (augmented) (1.0.1) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.14563166
Training Procedure
Preprocessing [optional]
[More Information Needed]
Training Hyperparameters
- Training regime: [More Information Needed]
Speeds, Sizes, Times [optional]
[More Information Needed]
Evaluation
Testing Data, Factors & Metrics
Testing Data
[More Information Needed]
Factors
[More Information Needed]
Metrics
[More Information Needed]
Results
[More Information Needed]
Summary
Model Examination [optional]
[More Information Needed]
Environmental Impact
Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).
- Hardware Type: T4 (TDP of 70W)
- Hours used: 10
- Cloud Provider: Google Cloud Platform
- Compute Region: southamerica-east1
- Carbon Emitted: 0.14 kgCO$_2$eq
Experiments were conducted using Google Cloud Platform in region southamerica-east1, which has a carbon efficiency of 0.2 kgCO$_2$eq/kWh. A cumulative total of 10 hours of computation was performed on hardware of type T4 (TDP of 70W).
Total emissions are estimated to be 0.14 kgCO$_2$eq, of which 100 percent was directly offset by the cloud provider.
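The estimate can be reproduced from the figures above. The short calculation below assumes the GPU draws its full TDP for the whole run, which is a simplification.

```python
hours = 10               # cumulative compute time
tdp_watts = 70           # T4 thermal design power
carbon_intensity = 0.2   # kgCO2eq per kWh for southamerica-east1

# Energy in kWh, assuming the GPU runs at full TDP throughout.
energy_kwh = hours * tdp_watts / 1000
emissions = energy_kwh * carbon_intensity

print(f"{energy_kwh:.2f} kWh -> {emissions:.2f} kgCO2eq")
# prints "0.70 kWh -> 0.14 kgCO2eq"
```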
Technical Specifications [optional]
Model Architecture and Objective
[More Information Needed]
Compute Infrastructure
Python 3 Google Compute Engine backend (GPU)
Hardware
RAM: 3.87 GB / 12.67 GB; Disk: 33.96 GB / 112.64 GB
Software
[More Information Needed]
Citation [optional]
BibTeX:
[More Information Needed]
APA:
[More Information Needed]
Glossary [optional]
[More Information Needed]
More Information [optional]
[More Information Needed]
Model Card Authors [optional]
[More Information Needed]