Multilingual Persuasion Detection in Memes

Given only the “textual content” of a meme, the goal is to identify which of 20 persuasion techniques, organized in a hierarchy, it uses. Predicting only an ancestor node of the correct technique earns partial credit. This is a hierarchical multi-label classification problem from Subtask 1 of SemEval 2024 Task 4, "Multilingual Detection of Persuasion Techniques in Memes".

The source code for training the model, along with additional implementations, can be found here. The paper describing our method was accepted at SemEval 2024; a link to the paper is coming soon.

Hierarchy

Usage Example

Input: "I HATE TRUMP\n\nMOST TERRORIST DO"
Outputs:
- Child-only Label List: ['Name calling/Labeling', 'Loaded Language']
- Complete Hierarchical Label List: ['Ethos', 'Ad Hominem', 'Name calling/Labeling', 'Pathos', 'Loaded Language']
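
The relationship between the two output lists can be sketched as follows. This is only an illustration, not the project's inference code: the `PARENT` map is hypothetical and covers just the two branches visible in the example above (the full hierarchy has 20 techniques), and `expand_with_ancestors` is an assumed helper name.

```python
# Hypothetical, partial parent map inferred only from the usage example;
# the real hierarchy contains many more nodes.
PARENT = {
    "Name calling/Labeling": "Ad Hominem",
    "Ad Hominem": "Ethos",
    "Loaded Language": "Pathos",
}


def ancestors_root_first(label):
    """Return the ancestors of `label`, ordered from the root downwards."""
    chain = []
    node = label
    while node in PARENT:
        node = PARENT[node]
        chain.append(node)
    return list(reversed(chain))


def expand_with_ancestors(child_labels):
    """Turn a child-only label list into a complete hierarchical list."""
    expanded = []
    for label in child_labels:
        for node in ancestors_root_first(label) + [label]:
            if node not in expanded:  # keep each node once, in first-seen order
                expanded.append(node)
    return expanded


child_only = ["Name calling/Labeling", "Loaded Language"]
complete = expand_with_ancestors(child_only)
print(complete)
```

With this partial map, the child-only list from the example expands to the complete hierarchical list shown above.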

Note:

- Make sure the dependencies from requirements.txt are installed in your environment.
- Make sure the trained model and tokenizer are in the same directory as inference.py.

Training Hyperparameters

Base Model: "facebook/mbart-large-50-many-to-many-mmt"
Learning Rate: 5e-05
Max Length: 256
Batch Size: 64
Epochs: 3
Seed: 42
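
For scripting, these values can be kept in a single object. The sketch below is a minimal example; `TrainingConfig` and its field names are hypothetical and are not taken from the project's training code.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class TrainingConfig:
    """Hypothetical container for the hyperparameters listed in this section."""
    base_model: str = "facebook/mbart-large-50-many-to-many-mmt"
    learning_rate: float = 5e-5
    max_length: int = 256   # maximum number of tokens per input
    batch_size: int = 64
    num_epochs: int = 3
    seed: int = 42


config = TrainingConfig()
print(asdict(config))
```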

Model Statistics

The model obtained the following metrics on the Development Set as of March 31st, 2024:

Hierarchical F1: 63.58%
Hierarchical Precision: 58.3%
Hierarchical Recall: 69.9%
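
To illustrate how hierarchical scores give partial credit for predicting an ancestor, here is a minimal sketch that assumes the common set-based definition: both gold and predicted labels are expanded with their ancestors, and precision and recall are computed on the expanded sets. The official task scorer may differ in detail (for example, in how scores are aggregated over the dataset), and the `PARENT` map is a hypothetical fragment of the hierarchy inferred from the usage example.

```python
# Hypothetical, partial parent map; the real hierarchy has more nodes.
PARENT = {
    "Name calling/Labeling": "Ad Hominem",
    "Ad Hominem": "Ethos",
    "Loaded Language": "Pathos",
}


def with_ancestors(labels):
    """Return the set of labels together with all of their ancestors."""
    expanded = set()
    for label in labels:
        node = label
        expanded.add(node)
        while node in PARENT:
            node = PARENT[node]
            expanded.add(node)
    return expanded


def hierarchical_prf(gold, predicted):
    """Set-based hierarchical precision, recall, and F1 for one example."""
    gold_set = with_ancestors(gold)
    pred_set = with_ancestors(predicted)
    overlap = len(gold_set & pred_set)
    precision = overlap / len(pred_set) if pred_set else 0.0
    recall = overlap / len(gold_set) if gold_set else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


gold = ["Name calling/Labeling"]
partial = hierarchical_prf(gold, ["Ad Hominem"])            # ancestor only
exact = hierarchical_prf(gold, ["Name calling/Labeling"])   # exact technique
print(partial, exact)
```

Under these assumptions, predicting only the ancestor "Ad Hominem" yields full precision but a recall of 2/3 (F1 of 0.8), while the exact technique scores 1.0 on all three measures.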

Licensing

The model is available under the GNU General Public License v3.0 (GPL-3.0), which allows for free use, modification, and distribution under the same license. However, it is strictly for research purposes only and cannot be used for malicious activities, including but not limited to manipulation, targeted harassment, hate speech, deception, and discrimination.

The dataset is available on the competition website. Users must accept an online agreement before downloading and using the data. This agreement stipulates that the data is for research purposes only and cannot be redistributed or used for malicious purposes as outlined above.