Update README.md
README.md
CHANGED
@@ -21,7 +21,44 @@ base_model:
|
|
21 |
- Qwen/Qwen3-4B
|
22 |
---
|
23 |
|
24 |
-
# ToxiFrench
|
25 |
|
26 |
This repository contains the **ToxiFrench** model, a **French language model** fine-tuned for **toxic comment classification**. It is based on the [**Qwen/Qwen3-4B**](https://huggingface.co/Qwen/Qwen3-4B) architecture and is designed to detect and classify toxic comments in French text.
|
27 |
|
@@ -41,12 +78,23 @@ Where:
|
|
41 |
|
42 |
If a label like `<cot-step>` is present in the checkpoint name, it indicates that the CoT that was used during training did not include this specific reasoning step.
|
43 |
|
44 |
## Citation
|
45 |
|
46 |
-
|
47 |
-
|
48 |
-
|
49 |
-
|
50 |
-
|
51 |
}
|
52 |
```
|
|
|
21 |
- Qwen/Qwen3-4B
|
22 |
---
|
23 |
|
24 |
+
# ToxiFrench: Benchmarking and Investigating SLMs and CoT Finetuning for French Toxicity Detection
|
25 |
+
|
26 |
+
<!-- Badges/Tags -->
|
27 |
+
[](https://axeldlv00.github.io/ToxiFrench/)
|
28 |
+
[](https://github.com/AxelDlv00/ToxiFrench)
|
29 |
+
[](https://huggingface.co/datasets/Naela00/ToxiFrenchFinetuning)
|
30 |
+
[](./LICENSE)
|
31 |
+
|
32 |
+
**Author:** Axel Delaval
|
33 |
+
|
34 |
+
**Affiliations:** École Polytechnique & Shanghai Jiao Tong University (SJTU)
|
35 |
+
|
36 |
+
**Email:** [[email protected]](mailto:[email protected])
|
37 |
+
|
38 |
+
---
|
39 |
+
|
40 |
+
> ⚠️ **Content Warning**
|
41 |
+
> This project and the associated dataset contain examples of text that may be considered offensive, toxic, or otherwise disturbing. The content is presented for research purposes only.
|
42 |
+
|
43 |
+
---
|
44 |
+
|
45 |
+
## Abstract
|
46 |
+
|
47 |
+
Despite significant progress in English toxicity detection, performance drastically degrades in other languages like French, a gap stemming from disparities in training corpora and the culturally nuanced nature of toxicity. This paper addresses this critical gap with three key contributions. First, we introduce ToxiFrench, a new public benchmark dataset for French toxicity detection, comprising 53,622 entries. This dataset was constructed using a novel annotation strategy that required manual labeling for only 10% of the data, minimizing effort and error. Second, we conducted a comprehensive evaluation of toxicity detection models. Our findings reveal that while Large Language Models (LLMs) often achieve high performance, Small Language Models (SLMs) can demonstrate greater robustness to bias, better cross-language consistency, and superior generalization to novel forms of toxicity. Third, to identify optimal transfer-learning methods, we conducted a systematic comparison of In-Context Learning (ICL), Supervised Fine-tuning (SFT), and Chain-of-Thought (CoT) reasoning using `Qwen3-4B` and analyzed the impact of data imbalance. We propose a novel approach for CoT fine-tuning that employs a dynamic weighted loss function, significantly boosting performance by ensuring the model's reasoning is faithful to its final conclusion.
|
48 |
+
|
49 |
+
---
|
50 |
+
|
51 |
+
## Key Contributions
|
52 |
+
|
53 |
+
* **Dataset and benchmark:** Introduction of ToxiFrench, a new public benchmark dataset for French toxicity detection (53,622 entries).
|
54 |
+
* **Evaluation of state-of-the-art detectors:** Extensive evaluation of LLMs (`GPT-4o`, `DeepSeek`, `Gemini`, `Mistral`, ...), SLMs (`Qwen`, `Gemma`, `Mistral`, ...), Transformers (`CamemBERT`, `DistilBERT`, ...), and moderation APIs (`Perspective API`, `OpenAI moderation`, `Mistral moderation`, ...), showing that **SLMs outperform LLMs** in robustness to bias, cross-language consistency, and generalization to novel toxicity forms.
|
55 |
+
* **Transfer learning strategies:** Systematic comparison of ICL, SFT, and CoT reasoning.
|
56 |
+
* **Model development:** Development of a **state-of-the-art 4B SLM** for French toxicity detection, based on the `Qwen3-4B` model, that outperforms several powerful LLMs.
|
57 |
+
* **CoT fine-tuning:** Introduction of a *novel* approach for CoT fine-tuning that employs a **dynamic weighted loss function**, significantly boosting performance by ensuring the model's reasoning is *faithful* to its final conclusion.
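
The README does not spell out the dynamic weighted loss, so the following is only a minimal sketch of one plausible reading, not the author's implementation. It assumes a token-level negative log-likelihood in which tokens of the final verdict receive a larger weight than reasoning tokens, and that this weight changes over training. The function names, the linear schedule, and the `start`/`end` values are all hypothetical.

```python
import math


def weighted_cot_loss(token_log_probs, is_conclusion, conclusion_weight):
    """Weighted mean negative log-likelihood over one CoT sequence.

    token_log_probs: log-probability the model assigned to each target token.
    is_conclusion: True for tokens of the final verdict, False for reasoning tokens.
    conclusion_weight: weight for conclusion tokens (reasoning tokens get 1.0).
    """
    if len(token_log_probs) != len(is_conclusion):
        raise ValueError("token_log_probs and is_conclusion must have equal length")
    if not token_log_probs:
        raise ValueError("sequence must contain at least one token")
    weights = [conclusion_weight if flag else 1.0 for flag in is_conclusion]
    weighted_nll = -sum(w * lp for w, lp in zip(weights, token_log_probs))
    return weighted_nll / sum(weights)


def conclusion_weight_schedule(step, total_steps, start=1.0, end=5.0):
    """Hypothetical linear schedule: the conclusion weight grows from start to end."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return start + frac * (end - start)


# Toy sequence: two reasoning tokens followed by one poorly predicted verdict token.
log_probs = [math.log(0.5), math.log(0.5), math.log(0.25)]
mask = [False, False, True]

early = weighted_cot_loss(log_probs, mask, conclusion_weight_schedule(0, 100))
late = weighted_cot_loss(log_probs, mask, conclusion_weight_schedule(100, 100))
print(f"loss at step 0: {early:.4f}, loss at step 100: {late:.4f}")
```

Under these assumptions, a badly predicted verdict token contributes more to the loss as training progresses, which pushes the model to make its final label agree with the reasoning that precedes it.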
|
58 |
+
|
59 |
+
---
|
60 |
+
|
61 |
+
## Models overview
|
62 |
|
63 |
This repository contains the **ToxiFrench** model, a **French language model** fine-tuned for **toxic comment classification**. It is based on the [**Qwen/Qwen3-4B**](https://huggingface.co/Qwen/Qwen3-4B) architecture and is designed to detect and classify toxic comments in French text.
|
64 |
|
|
|
78 |
|
79 |
If a label like `<cot-step>` is present in the checkpoint name, it indicates that the CoT that was used during training did not include this specific reasoning step.
|
80 |
|
81 |
+
|
82 |
+
---
|
83 |
+
|
84 |
+
## License
|
85 |
+
|
86 |
+
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
|
87 |
+
|
88 |
+
---
|
89 |
+
|
90 |
## Citation
|
91 |
|
92 |
+
If you use this project in your research, please cite it as follows:
|
93 |
+
|
94 |
+
```bibtex
|
95 |
+
@misc{delaval2025toxifrench,
|
96 |
+
title={ToxiFrench: Benchmarking and Investigating SLMs and CoT Finetuning for French Toxicity Detection},
|
97 |
+
author={Axel Delaval},
|
98 |
+
year={2025},
|
99 |
}
|
100 |
```
|