metadata
library_name: transformers
datasets:
  - facebook/xnli
metrics:
  - accuracy
base_model:
  - FacebookAI/xlm-roberta-large
license: mit
tags:
  - xlm-roberta
  - finetuning
  - xnli
  - mnli

XLM-RoBERTa Large fine-tuned on the XNLI dataset

Model Details

How to Get Started

This model is ready to use for text classification of premise/hypothesis pairs (natural language inference).

import pandas as pd
from transformers import pipeline

# Load the classification pipeline
classifier = pipeline("text-classification", model="ajayat/xlm-roberta-large-xnli")
# Map class indices to readable XNLI label names
classifier.model.config.id2label = {
    0: "entailment",
    1: "neutral",
    2: "contradiction",
}
# Example premise and hypothesis
premise = "A soccer game with multiple males playing."
hypothesis = "Some men are playing a sport."

# Provide input as a dictionary with text and text_pair keys
result = classifier({'text': premise, 'text_pair': hypothesis}, top_k=None)
pd.DataFrame(result)
           label     score
0     entailment  0.996513
1        neutral  0.003228
2  contradiction  0.000260
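
For readers curious how scores like these arise, here is a minimal sketch of the post-processing a text-classification pipeline applies to a single-label model when `top_k=None`: a softmax over the logits, then a sort by score. The logits are made-up illustrative values, not outputs of this model, and `scores_from_logits` is a hypothetical helper name.

```python
import math

# Label mapping copied from the example above.
ID2LABEL = {0: "entailment", 1: "neutral", 2: "contradiction"}


def scores_from_logits(logits):
    """Softmax the logits and return label/score dicts, highest score first."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    rows = [
        {"label": ID2LABEL[i], "score": e / total}
        for i, e in enumerate(exps)
    ]
    return sorted(rows, key=lambda row: row["score"], reverse=True)


# Hypothetical logits for one premise/hypothesis pair (illustration only).
example = scores_from_logits([4.0, -1.5, -3.0])
for row in example:
    print(f"{row['label']:<14}{row['score']:.4f}")
```

The scores always sum to 1, so they can be read as a probability distribution over the three labels.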

Dataset

XNLI (Cross-lingual Natural Language Inference) is a benchmark created by Facebook AI for evaluating cross-lingual understanding. It extends the MultiNLI corpus with 7,500 human-annotated English development and test pairs (premise and hypothesis), translated into 14 other languages for 15 languages in total.

Each pair is labeled as entailment, contradiction, or neutral.

Training Hyperparameters

  • bf16 mixed precision
  • Batch size per GPU: 64
  • Learning Rate: 2e-5
  • 1 Epoch
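
As a rough sketch, the settings above could be written as keyword arguments using the parameter names of `transformers.TrainingArguments`. The effective batch size is an inference, not a figure stated in this card: it assumes all 4 GPUs listed under Environmental Impact ran data-parallel with no gradient accumulation.

```python
# Keyword names follow transformers.TrainingArguments; values come from the list above.
training_kwargs = {
    "bf16": True,                       # bf16 mixed precision
    "per_device_train_batch_size": 64,  # batch size per GPU
    "learning_rate": 2e-5,
    "num_train_epochs": 1,
}

# Assumptions (not stated in the hyperparameter list): 4 GPUs used
# data-parallel, gradient accumulation disabled.
NUM_GPUS = 4
GRADIENT_ACCUMULATION_STEPS = 1

effective_batch_size = (
    training_kwargs["per_device_train_batch_size"]
    * NUM_GPUS
    * GRADIENT_ACCUMULATION_STEPS
)
print(effective_batch_size)  # 256 under the assumptions above
```

On bf16-capable hardware, these could be passed as `TrainingArguments(output_dir="out", **training_kwargs)`, where `"out"` is a placeholder path.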

Results

Here are the results on the XNLI test set:

language  ar    bg    de    el    en    es    fr    hi    ru    sw    th    tr    ur    vi    zh    avg
accuracy  0.82  0.85  0.85  0.85  0.89  0.85  0.84  0.81  0.83  0.77  0.81  0.82  0.77  0.83  0.83  0.84
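
A table like this can be produced by grouping predictions by language. The sketch below is one way to do that, not the author's evaluation script: `accuracy_by_language` is a hypothetical helper, the toy records are invented, and treating the average as an unweighted mean over languages is an assumption.

```python
from collections import defaultdict


def accuracy_by_language(records):
    """records: iterable of (language, gold_label, predicted_label) tuples."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for lang, gold, pred in records:
        total[lang] += 1
        correct[lang] += int(gold == pred)
    per_lang = {lang: correct[lang] / total[lang] for lang in total}
    # Unweighted mean over languages (an assumed reading of the "avg" column).
    per_lang["avg"] = sum(per_lang.values()) / len(per_lang)
    return per_lang


# Invented toy predictions, for illustration only.
toy_records = [
    ("en", 0, 0),
    ("en", 1, 1),
    ("fr", 2, 2),
    ("fr", 0, 1),
]
toy_accuracy = accuracy_by_language(toy_records)
print(toy_accuracy)
```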

Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).

  • Hardware Type: 4x GPUs NVIDIA A100 SXM4 80GB
  • Hours used: 7 hours
  • Compute Region: France
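
As a back-of-the-envelope sketch in the spirit of such calculators, emissions can be approximated as energy drawn times the grid's carbon intensity. Everything beyond the GPU count and the hours is an assumption here: the 7 hours are read as wall-clock time across all 4 GPUs, 400 W is the A100 SXM4 maximum board power (an upper bound, not measured draw), and the carbon intensity is a placeholder argument that should be replaced with a sourced figure for the compute region.

```python
def estimate_emissions_kg(gpu_count, hours, gpu_power_watts,
                          carbon_intensity_kg_per_kwh, pue=1.0):
    """Rough estimate: energy (kWh) times grid carbon intensity (kg CO2eq/kWh)."""
    energy_kwh = gpu_count * hours * gpu_power_watts / 1000.0 * pue
    return energy_kwh * carbon_intensity_kg_per_kwh


# From this card: 4 GPUs, 7 hours (assumed wall-clock).
# Assumed: 400 W per GPU as an upper bound on power draw.
energy_upper_bound_kwh = 4 * 7 * 400 / 1000.0
print(energy_upper_bound_kwh)

# PLACEHOLDER intensity for illustration only; look up a real value
# for the compute region before reporting any number.
placeholder_intensity = 0.5
print(estimate_emissions_kg(4, 7, 400, placeholder_intensity))
```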