Model Card for Model ID
This model classifies text, specifically X/Twitter posts, according to whether they contain afxumo (toxicity in English). It has been fine-tuned on a small sample (n=5910) of tweets, which is only slightly imbalanced (n_afxumo=3314, n_not_afxumo=2596). The tweets were collected on 3 March 2025 and cover February 2025; they were gathered using queries focused on the Somaliland conflict. The language in the dataset is predominantly Somali, with some English interspersed. Because it is social media data, it is also slang-heavy.
Model Details
Model Description
- Developed by: datavaluepeople in conjunction with How to Build Up, for a project to detect toxic speech.
- Language(s) (NLP): Somali, English
- License: agpl-3.0
- Finetuned from model: xlm-roberta-base-finetuned-somali
Uses
This model is intended for use by (digital) peacebuilders and anyone working towards depolarising the digital public sphere. We have used it in conjunction with Phoenix, an ethical, open-source, non-commercial platform designed to enable peacebuilders and mediators to conduct ethical and participatory social media listening to inform conflict transformation work. Social media data is voluminous; this model allows peacebuilders to quickly find comments that discourage dialogue from continuing by intimidating other participants. Peacebuilders can then use these findings to inform programming, as well as to conduct targeted interventions to mitigate that intimidation.
Direct Use
[More Information Needed]
Downstream Use [optional]
[More Information Needed]
Out-of-Scope Use
[More Information Needed]
Bias, Risks, and Limitations
Due to disagreements between annotators about what does and does not constitute afxumo, this model was trained on a single annotator's judgments of what is or is not toxic. As such, the model reflects that annotator's biases and will not give a complete view of toxicity in this context.
Recommendations
Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
How to Get Started with the Model
Use the code below to get started with the model.
[More Information Needed]
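In the absence of published example code, the following is a minimal sketch using the transformers text-classification pipeline. The repository id and the label names shown are placeholders (assumptions), since neither is stated in this card.

```python
# Minimal usage sketch. The repository id below is a placeholder, and the
# label names in the example output are assumptions, not taken from this card.
from transformers import pipeline

MODEL_ID = "your-org/afxumo-classifier"  # placeholder: replace with the real repo id

classifier = pipeline("text-classification", model=MODEL_ID)

posts = [
    "Tusaale qoraal ah oo la hubinayo.",  # an example Somali-language post
]
for prediction in classifier(posts):
    print(prediction)  # e.g. {"label": "afxumo", "score": 0.93}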
Training Details
Training Data
Annotators were asked to annotate X/Twitter data against the following rubric. Due to annotation constraints and the low prevalence of some of the labels, all of the labels were coalesced into a single label (afxumo); a sketch of this coalescing is shown after the rubric below.
- Afxumo (toxicity) – Does the post contain insulting, disrespectful, or unreasonable comments directed at a specific person or group that push a social media user to withdraw from or leave the ongoing discussion?
- Afxumo lafojebis ah (severe toxicity) – Does the post contain intensely hateful speech or severe toxicity directed at a specific person or group that could push a social media user to withdraw from or leave the ongoing discussion?
- Cunsuriyad (racism/identity attack) – Does the post contain hatred, attacks on ancestry, or racism targeting the lineage or identity of a specific person or group?
- Cay (insult) – Does the post contain insults, incitement, or negative comments directed at a person or group of people?
- Xadgudub Jinsi (sexual harassment) – Does the post contain sexual harassment targeting a specific person or group, such as remarks about a person's gender or body?
- Habaar (cursing) – Does the post contain words of cursing, condemnation, or other obscene speech?
- Hanjabaad (threat) – Does the post contain threats, harm, or harassment directed at a specific person or group?
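As a hypothetical illustration, the coalescing step might look like the sketch below; the column names and file path are assumptions, since the annotation export format is not documented here.

```python
# Hypothetical sketch of coalescing the rubric labels into a single binary
# "afxumo" label. Column names and the file path are assumptions; the actual
# annotation export format is not documented in this card.
import pandas as pd

RUBRIC_COLUMNS = [
    "afxumo", "afxumo_lafojebis", "cunsuriyad", "cay",
    "xadgudub_jinsi", "habaar", "hanjabaad",
]

annotations = pd.read_csv("annotations.csv")  # placeholder path

# A post is labelled afxumo if the annotator flagged any rubric category.
annotations["afxumo_label"] = annotations[RUBRIC_COLUMNS].any(axis=1).astype(int)
```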
Training Procedure
Preprocessing [optional]
[More Information Needed]
Training Hyperparameters
- Training regime: [More Information Needed]
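The actual training regime is not documented. As an illustrative sketch only, a binary classifier could be fine-tuned from xlm-roberta-base-finetuned-somali with the transformers Trainer roughly as follows; every hyperparameter and the toy dataset below are assumptions, not the values used for this model.

```python
# Illustrative sketch only: the actual training regime, hyperparameters, and
# data handling for this model are not documented. Values below are assumptions.
from datasets import Dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

BASE_MODEL = "xlm-roberta-base-finetuned-somali"  # as named above; prepend the
                                                  # Hub organization if required

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
model = AutoModelForSequenceClassification.from_pretrained(BASE_MODEL, num_labels=2)

# Toy stand-in for the annotated tweet data (text plus binary afxumo label).
data = Dataset.from_dict({
    "text": ["Tusaale qoraal kowaad.", "Tusaale qoraal labaad."],
    "label": [1, 0],
})
data = data.map(
    lambda batch: tokenizer(
        batch["text"], truncation=True, padding="max_length", max_length=128
    ),
    batched=True,
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="afxumo-classifier",
        num_train_epochs=3,              # assumed value
        per_device_train_batch_size=16,  # assumed value
        learning_rate=2e-5,              # assumed value
    ),
    train_dataset=data,
)
trainer.train()
```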
Speeds, Sizes, Times [optional]
[More Information Needed]
Evaluation
Testing Data, Factors & Metrics
Testing Data
[More Information Needed]
Factors
[More Information Needed]
Metrics
[More Information Needed]
Results
[More Information Needed]
Summary