license: mit
task_categories:
- text-classification
language:
- fr
tags:
- text-classification
- toxicity
- hate-speech
- content-moderation
- chain-of-thought
- curriculum-learning
- nlp
- french-dataset
- classification
pretty_name: ToxiFrench
datasets:
- Naela00/ToxiFrenchFinetuning
base_model:
- Qwen/Qwen3-4B
ToxiFrench Model
This repository contains the ToxiFrench model, a French language model fine-tuned for toxic comment classification. It is based on the Qwen/Qwen3-4B architecture and is designed to detect and classify toxic comments in French text.
We performed a series of experiments to evaluate the model's performance under different fine-tuning configurations, focusing on the impact of data selection strategies and Chain-of-Thought (CoT) annotations.
Finetuning notations
Each experiment follows a naming scheme like: (r/o)(e/d)(a/b)(s/m/l)
Where:
r
= random order,o
= ordered (curriculum)e
= equal toxic/non-toxic,d
= real-world imbalancea
= with CoT finetuning,b
= without CoTs
= small (100),m
= medium (1000),l
= large (all)
e.g.
rdal
is the model trained on the natural distribution of toxicity (d
), on an arbitrary order (r
), with CoT annotations (a
), and on the whole dataset (l
).
If a label like <cot-step>
is present in the checkpoint name, it indicates that the CoT that was used during training did not include this specific reasoning step.
Citation
@misc{toxifrench2025,
title={ToxiFrench},
author={Delaval Axel},
year={2025},
}