A zero-shot classifier based on bertin-roberta-base-spanish

This model was fine-tuned from bertin-roberta-base-spanish as a cross-encoder for the Natural Language Inference (NLI) task. A cross-encoder takes a sentence pair (a premise and a hypothesis) as input and predicts one of three labels: "contradiction": 0, "entailment": 1, "neutral": 2.
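As a minimal sketch of how the three-way output is read, the snippet below converts the logits for one (premise, hypothesis) pair into a label name using the mapping above. The logits are invented placeholders, not outputs of this model (in practice they would come from the model itself), and `decode_nli` is a hypothetical helper.

```python
import math

# Label mapping stated in this model card.
ID2LABEL = {0: "contradiction", 1: "entailment", 2: "neutral"}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode_nli(logits):
    """Hypothetical helper: return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=lambda i: probs[i])
    return ID2LABEL[best], probs[best]


# Placeholder logits for a single premise/hypothesis pair (not real model output).
label, prob = decode_nli([-1.2, 3.4, 0.1])
print(label, round(prob, 3))
```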

You can use it with Hugging Face's zero-shot classification pipeline. Given a sentence and an arbitrary set of labels or topics, it outputs the likelihood that the sentence belongs to each topic.
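The pipeline turns the NLI model into a topic classifier by pairing the input sentence (the premise) with one hypothesis per candidate label and comparing the resulting logits. The sketch below mirrors that scoring step under the assumption that the logits have already been computed: in single-label mode it takes a softmax of the entailment logits across all labels, and in multi-label mode it scores each label independently from its contradiction and entailment logits. The numbers are invented placeholders and `zero_shot_scores` is a hypothetical name, not part of the Transformers API.

```python
import math

ENTAILMENT_ID = 1      # index of "entailment" in this model's output
CONTRADICTION_ID = 0   # index of "contradiction"


def _softmax(values):
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_scores(labels, nli_logits, multi_label=False):
    """Score candidate labels from per-hypothesis NLI logits.

    nli_logits[i] holds the 3 logits for (sentence, hypothesis built from labels[i]).
    Single-label: softmax of the entailment logits across labels (scores sum to 1).
    Multi-label: per label, softmax over (contradiction, entailment) only.
    """
    if multi_label:
        scores = [
            _softmax([row[CONTRADICTION_ID], row[ENTAILMENT_ID]])[1] for row in nli_logits
        ]
    else:
        scores = _softmax([row[ENTAILMENT_ID] for row in nli_logits])
    # Sort labels from most to least likely, as the pipeline's output does.
    ranked = sorted(zip(labels, scores), key=lambda pair: pair[1], reverse=True)
    return [lab for lab, _ in ranked], [s for _, s in ranked]


labels = ["cultura", "sociedad", "economia"]
# Placeholder logits, one row per hypothesis (not real model output).
logits = [[-2.0, 2.5, 0.3], [0.5, 0.2, 1.0], [1.8, -1.5, 0.4]]
ranked_labels, ranked_scores = zero_shot_scores(labels, logits)
print(ranked_labels, [round(s, 3) for s in ranked_scores])
```

In single-label mode the scores compete with each other and sum to 1; in multi-label mode each score lies in (0, 1) on its own, so several topics can score high at once.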

Usage (Hugging Face Transformers)

The simplest way to use the model is the Hugging Face Transformers pipeline. Initialize the pipeline with the task "zero-shot-classification" and the model "hackathon-pln-es/bertin-roberta-base-zeroshot-esnli":

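```python
from transformers import pipeline

# Load the zero-shot classification pipeline with this model
classifier = pipeline("zero-shot-classification",
                      model="hackathon-pln-es/bertin-roberta-base-zeroshot-esnli")

# Sentence: "Fifty years after his death, the author emerges as one of the greats of his century"
# Labels: culture, society, economy, health, sports
result = classifier(
    "El autor se perfila, a los 50 años de su muerte, como uno de los grandes de su siglo",
    candidate_labels=["cultura", "sociedad", "economia", "salud", "deportes"],
    hypothesis_template="Esta oración es sobre {}."
)
# result is a dict with "sequence", "labels" and "scores" (labels sorted by descending score)
print(result)
```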

The hypothesis_template parameter matters and should be written in Spanish. The hosted inference widget on the model page uses the default value, "This example is {}.", so its results may differ from those obtained with the code above.
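The template is filled with each candidate label through Python's `str.format`, so the template choice changes every hypothesis the model sees. A minimal illustration, with `build_hypotheses` as a hypothetical helper:

```python
def build_hypotheses(template, labels):
    """Fill the "{}" placeholder in the template with each candidate label."""
    if "{}" not in template:
        raise ValueError("template must contain a '{}' placeholder")
    return [template.format(label) for label in labels]


labels = ["cultura", "deportes"]
spanish = build_hypotheses("Esta oración es sobre {}.", labels)
default = build_hypotheses("This example is {}.", labels)
print(spanish)  # ['Esta oración es sobre cultura.', 'Esta oración es sobre deportes.']
print(default)  # ['This example is cultura.', 'This example is deportes.']
```

With the default template, a model trained on Spanish NLI data receives mixed-language hypotheses such as "This example is cultura.", which is one plausible reason the widget's results differ.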

Training

We used sentence-transformers to train the model.
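For a cross-encoder with three output labels, sentence-transformers' `CrossEncoder` trains with a cross-entropy loss over the logits by default. This is an assumption about the library defaults; the card does not state the exact training configuration. The sketch below computes that loss for a single example in plain Python; `cross_entropy` is a hypothetical helper and the logits are placeholders.

```python
import math


def cross_entropy(logits, target_id):
    """Negative log-softmax probability of the gold class (log-sum-exp for stability)."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target_id]


# Gold label "entailment" has id 1 in this model's mapping.
logits = [-1.0, 2.0, 0.5]  # placeholder logits, not real model output
loss = cross_entropy(logits, target_id=1)
print(round(loss, 4))
```

The loss shrinks as the logit of the gold class grows relative to the other two, which is what pushes the model toward the correct NLI label during training.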

Dataset

The training data combines several Natural Language Inference (NLI) datasets:

ESXNLI (Spanish portion only)
SNLI (machine-translated into Spanish)
MultiNLI (machine-translated into Spanish)
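Before training, each NLI record has to become a (premise, hypothesis) pair with an integer label matching the model's mapping. A minimal sketch, assuming records are dicts with hypothetical `premise`, `hypothesis`, and `label` keys (the actual column names of the combined dataset may differ). Records whose label falls outside the three classes are skipped; SNLI, for example, includes pairs without a gold label.

```python
LABEL2ID = {"contradiction": 0, "entailment": 1, "neutral": 2}


def to_training_pairs(records):
    """Convert NLI records into ((premise, hypothesis), label_id) tuples.

    Records with a label outside the three NLI classes are dropped.
    """
    pairs = []
    for rec in records:
        label_id = LABEL2ID.get(rec["label"])
        if label_id is None:
            continue
        pairs.append(((rec["premise"], rec["hypothesis"]), label_id))
    return pairs


records = [
    {"premise": "Un hombre toca la guitarra.", "hypothesis": "Alguien hace música.", "label": "entailment"},
    {"premise": "Un hombre toca la guitarra.", "hypothesis": "El hombre duerme.", "label": "contradiction"},
    {"premise": "Un hombre toca la guitarra.", "hypothesis": "El hombre es famoso.", "label": "-"},
]
pairs = to_training_pairs(records)
print(pairs)
```

With the sentence-transformers `CrossEncoder` training API, each tuple would typically be wrapped as `InputExample(texts=[premise, hypothesis], label=label_id)`.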

The whole dataset used is available here.
