ixambert-base-cased finetuned for QA

This is the multilingual model "ixambert-base-cased", fine-tuned on SQuAD v1.1 and on an experimental Basque version of SQuAD v1.1 (one third the size of the original). It can answer basic factual questions in English, Spanish and Basque.

Overview

  • Language model: ixambert-base-cased
  • Languages: English, Spanish and Basque
  • Downstream task: Extractive QA
  • Training data: SQuAD v1.1 + experimental SQuAD v1.1 in Basque
  • Eval data: SQuAD v1.1 + experimental SQuAD v1.1 in Basque
  • Infrastructure: 1x GeForce RTX 2080

Outputs

The model outputs the answer to the question, the start and end character positions of the answer in the original context, and a score estimating the probability that this span is the correct answer. For example:

{'score': 0.9667195081710815, 'start': 101, 'end': 105, 'answer': '1820'}
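The start and end values can be read as character offsets into the context string. The sketch below assumes the example output above was produced for the context in the usage section; the prediction dictionary is copied by hand, not obtained by running the model.

```python
# Context copied from the usage example in this card.
context = (
    "Florence Nightingale, known for being the founder of modern nursing, "
    "was born in Florence, Italy, in 1820"
)

# Example output from this card, hard-coded for illustration (assumption:
# it corresponds to the context above).
pred = {"score": 0.9667195081710815, "start": 101, "end": 105, "answer": "1820"}

# Slice the context, treating "end" as an exclusive offset.
span = context[pred["start"]:pred["end"]]
print(span)
```

Slicing this way is handy for highlighting the answer in the original text or for extracting some surrounding context.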

How to use

from transformers import AutoModelForQuestionAnswering, AutoTokenizer, pipeline

model_name = "MarcBrun/ixambert-finetuned-squad-eu-en"

# Get predictions with the question-answering pipeline
context = "Florence Nightingale, known for being the founder of modern nursing, was born in Florence, Italy, in 1820"
question = "When was Florence Nightingale born?"
qa = pipeline("question-answering", model=model_name, tokenizer=model_name)
pred = qa(question=question, context=context)
print(pred)

# Alternatively, load the model and tokenizer directly
model = AutoModelForQuestionAnswering.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

Hyperparameters

batch_size = 8
n_epochs = 3
learning_rate = 2e-5
optimizer = AdamW
lr_schedule = linear
max_seq_len = 384
doc_stride = 128
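
Contexts longer than max_seq_len tokens are usually split into overlapping windows so that an answer near a boundary still appears whole in at least one window. The sketch below illustrates the idea under the assumption that doc_stride is the number of tokens shared by consecutive windows (as with the tokenizer `stride` argument); some older SQuAD scripts instead treat it as the step between window starts. The helper `window_spans` and the 16-token budget reserved for the question and special tokens are hypothetical.

```python
def window_spans(n_context_tokens, max_context_len, overlap):
    """Return (start, end) token ranges that cover the context.

    Consecutive windows share `overlap` tokens; `end` is exclusive.
    """
    if not 0 <= overlap < max_context_len:
        raise ValueError("overlap must be in [0, max_context_len)")
    step = max_context_len - overlap
    spans = []
    start = 0
    while True:
        end = min(start + max_context_len, n_context_tokens)
        spans.append((start, end))
        if end == n_context_tokens:
            break
        start += step
    return spans


max_seq_len = 384
doc_stride = 128
question_budget = 16  # hypothetical: tokens used by the question and special tokens

spans = window_spans(600, max_seq_len - question_budget, doc_stride)
print(spans)
```

With these assumed numbers, a 600-token context is covered by two windows that overlap by 128 tokens.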