Model Card for datavaluepeople/Afxumo-toxicity-somaliland-SO

This model classifies text, specifically X/Twitter posts, according to whether they contain afxumo (Somali for toxicity). It has been fine-tuned on a small sample (n=5,910) of tweets that is only slightly imbalanced (n_afxumo=3,314, n_not_afxumo=2,596). The tweets were collected on 3 March 2025 and cover February 2025; collection used queries focused on the Somaliland conflict. The language in the dataset is predominantly Somali with some English interspersed, and, being social media data, it is slang-heavy.

Model Details

Model Description

Uses

This model is intended for use by (digital) peacebuilders and anyone working towards depolarising the digital public sphere. We have used it in conjunction with Phoenix, an ethical, open source, non-commercial platform designed to enable peacebuilders and mediators to conduct ethical and participatory social media listening in order to inform conflict transformation work. Social media produces far more data than can be read manually; this model allows peacebuilders to quickly surface comments that discourage continued dialogue by intimidating other participants. Peacebuilders can then use these findings to inform programming, as well as to conduct targeted interventions that mitigate the intimidation.

Direct Use

[More Information Needed]

Downstream Use [optional]

[More Information Needed]

Out-of-Scope Use

[More Information Needed]

Bias, Risks, and Limitations

Due to disagreements between annotators on what does and does not constitute afxumo, this model was trained on a single annotator's judgement of what is or is not toxic. As such, the model encodes that annotator's biases and does not give a complete view of toxicity in this context.

Recommendations

Users (both direct and downstream) should be made aware of the model's risks, biases, and limitations. More information is needed for further recommendations.

How to Get Started with the Model

Use the code below to get started with the model.

[More Information Needed]
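An official usage snippet has not been published. The following is a minimal sketch assuming the model is loadable with the standard `transformers` text-classification pipeline from the Hugging Face Hub; the label name `"afxumo"` and the helper `is_afxumo` are illustrative assumptions, not confirmed outputs of this model.

```python
# Minimal sketch: load the classifier via the transformers pipeline and
# flag posts whose predicted label is the (assumed) "afxumo" class.

def is_afxumo(prediction: dict, threshold: float = 0.5) -> bool:
    """Return True when the top predicted label is "afxumo" (assumed label
    name) with at least `threshold` confidence."""
    return prediction["label"] == "afxumo" and prediction["score"] >= threshold

if __name__ == "__main__":
    # Requires `pip install transformers torch` and network access to the Hub.
    try:
        from transformers import pipeline

        classifier = pipeline(
            "text-classification",
            model="datavaluepeople/Afxumo-toxicity-somaliland-SO",
        )
        prediction = classifier("Example post text")[0]
        print(prediction, "->", is_afxumo(prediction))
    except Exception as exc:  # transformers missing or Hub unreachable
        print(f"demo skipped: {exc}")
```

The threshold can be raised above 0.5 when precision matters more than recall, for example when flagged posts trigger direct outreach rather than aggregate analysis.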

Training Details

Training Data

[More Information Needed]

Annotators were asked to annotate X/Twitter data using the rubric below. Due to annotation constraints and the low prevalence of some labels, all labels were coalesced into a single label (afxumo).

  • Afxumo (toxicity) – Does the post contain insulting, disrespectful, or unfair comments directed at a specific person or group that pressure social media users to withdraw from or abandon the ongoing discussion?
  • Afxumo lafojebis ah (severe toxicity) – Does the post contain severe hate speech or extreme toxicity directed at a specific person or group that could pressure social media users to withdraw from or abandon the ongoing discussion?
  • Cunsuriyad (racism) – Does the post contain hatred, insults to lineage, or racism targeting the ancestry or identity of a specific person or group?
  • Cay (insult) – Does the post contain abuse, incitement, or negative commentary directed at a person or group of people?
  • Xadgudub Jinsi (sexual harassment) – Does the post contain sexual harassment targeting a specific person or group, such as remarks about a person's gender or body parts?
  • Habaar (cursing) – Does the post contain words of cursing, damnation, or other obscene speech?
  • Hanjabaad (threat) – Does the post contain threats, harm, or harassment directed at a specific person or group?
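The coalescing step described above can be sketched as follows. This is a hypothetical illustration: the column names and annotation format are assumptions, not the actual dataset schema.

```python
# Hypothetical sketch of coalescing the seven rubric labels into a single
# binary "afxumo" label: a post is afxumo when any sub-label was marked
# positive by the annotator. Label keys are illustrative.
SUB_LABELS = [
    "afxumo", "afxumo_lafojebis", "cunsuriyad", "cay",
    "xadgudub_jinsi", "habaar", "hanjabaad",
]

def coalesce(annotation: dict) -> int:
    """Return 1 if any of the seven rubric labels is positive, else 0."""
    return int(any(annotation.get(label, 0) for label in SUB_LABELS))
```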

Training Procedure

Preprocessing [optional]

[More Information Needed]

Training Hyperparameters

  • Training regime: [More Information Needed]

Speeds, Sizes, Times [optional]

[More Information Needed]

Evaluation

Testing Data, Factors & Metrics

Testing Data

[More Information Needed]

Factors

[More Information Needed]

Metrics

[More Information Needed]

Results

[More Information Needed]

Summary

Model: datavaluepeople/Afxumo-toxicity-somaliland-SO · 278M parameters · F32 · Safetensors