---
license: apache-2.0
datasets:
- squad
language:
- en
library_name: transformers
pipeline_tag: text2text-generation
---

# t5-large fine-tuned on SQuAD for Generating Question+Answer

- Input: `context` (e.g. news article)
- Output: `question <sep> answer`

The answers in the training data (SQuAD) are highly extractive, so this model generates **extractive** answers. For **abstractive** questions/answers, use our model trained on the RACE dataset: https://huggingface.co/potsawee/t5-large-generation-race-QuestionAnswer.

## Model Details

A t5-large model is fine-tuned on the SQuAD dataset, where the input is the context/passage and the output is the question followed by the answer. This is the first component of the question generation pipeline (i.e. `g1`) in our [MQAG paper](https://arxiv.org/abs/2301.12307). For more details, please refer to the GitHub repo of this project: https://github.com/potsawee/mqag0.

## How to Use the Model

Use the code below to get started with the model. You can also set `do_sample=True` in `generate()` to sample different question-answer pairs from the same context.

```python
>>> from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

>>> # Load the fine-tuned tokenizer and model from the Hugging Face Hub
>>> tokenizer = AutoTokenizer.from_pretrained("potsawee/t5-large-generation-squad-QuestionAnswer")
>>> model = AutoModelForSeq2SeqLM.from_pretrained("potsawee/t5-large-generation-squad-QuestionAnswer")

>>> # The context is a single string; newlines are replaced with spaces
>>> context = r"""Chelsea's mini-revival continued with a third victory in a row as they consigned struggling Leicester City to a fifth consecutive defeat.
Buoyed by their Champions League win over Borussia Dortmund, Chelsea started brightly and Ben Chilwell volleyed in from a tight angle against his old club.
Chelsea's Joao Felix and Leicester's Kiernan Dewsbury-Hall hit the woodwork in the space of two minutes, then Felix had a goal ruled out by the video assistant referee for offside.
Patson Daka rifled home an excellent equaliser after Ricardo Pereira won the ball off the dawdling Felix outside the box.
But Kai Havertz pounced six minutes into first-half injury time with an excellent dinked finish from Enzo Fernandez's clever aerial ball.
Mykhailo Mudryk thought he had his first goal for the Blues after the break but his effort was disallowed for offside.
Mateo Kovacic sealed the win as he volleyed in from Mudryk's header.
The sliding Foxes, who ended with 10 men following Wout Faes' late dismissal for a second booking, now just sit one point outside the relegation zone.
""".replace('\n', ' ')

>>> inputs = tokenizer(context, return_tensors="pt")
>>> outputs = model.generate(**inputs, max_length=100)
>>> # Keep special tokens when decoding so the separator token survives,
>>> # then remove the padding and end-of-sequence tokens by hand
>>> question_answer = tokenizer.decode(outputs[0], skip_special_tokens=False)
>>> question_answer = question_answer.replace(tokenizer.pad_token, "").replace(tokenizer.eos_token, "")
>>> question, answer = question_answer.split(tokenizer.sep_token)

>>> print("question:", question)
question: Who scored the winner for Chelsea?
>>> print("answer:", answer)
answer: Mateo Kovacic
```
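
The sampling option mentioned above can be combined with a more defensive split of the decoded text. The following is a minimal sketch, not part of the released code: the helper names `split_question_answer` and `sample_question_answers`, the `top_p=0.95` value, and the choice to drop malformed or duplicate outputs are all assumptions made for illustration. It expects a `model` and `tokenizer` loaded as in the snippet above.

```python
from typing import List, Optional, Tuple


def split_question_answer(
    decoded: str,
    sep_token: str,
    strip_tokens: Tuple[Optional[str], ...] = (),
) -> Optional[Tuple[str, str]]:
    """Split one decoded output into (question, answer).

    Returns None when the text does not contain exactly one separator
    or when either side is empty (hypothetical policy, not from the paper).
    """
    for token in strip_tokens:
        if token:  # skip tokens the tokenizer does not define
            decoded = decoded.replace(token, "")
    # partition() never raises, unlike unpacking the result of split()
    question, sep, answer = decoded.partition(sep_token)
    question, answer = question.strip(), answer.strip()
    if not sep or sep_token in answer or not question or not answer:
        return None
    return question, answer


def sample_question_answers(
    model, tokenizer, context: str, num_pairs: int = 3
) -> List[Tuple[str, str]]:
    """Sample several outputs for one context and keep the well-formed, distinct pairs."""
    inputs = tokenizer(context, return_tensors="pt")
    outputs = model.generate(
        **inputs,
        max_length=100,
        do_sample=True,                  # sampling instead of greedy decoding
        top_p=0.95,                      # assumed value, tune as needed
        num_return_sequences=num_pairs,  # several samples in one call
    )
    pairs: List[Tuple[str, str]] = []
    for sequence in outputs:
        decoded = tokenizer.decode(sequence, skip_special_tokens=False)
        pair = split_question_answer(
            decoded, tokenizer.sep_token, (tokenizer.pad_token, tokenizer.eos_token)
        )
        if pair is not None and pair not in pairs:
            pairs.append(pair)
    return pairs
```

Calling `sample_question_answers(model, tokenizer, context)` returns a list of `(question, answer)` tuples. Because sampling is random, the list can contain fewer than `num_pairs` entries and will differ between runs.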

## Generating Distractors (other options in a multiple-choice setup)

`Context ---> Question + (A) Answer (B) Distractor1 (C) Distractor2 (D) Distractor3`

To generate the distractors, please refer to our distractor generation model, e.g. https://huggingface.co/potsawee/t5-large-generation-race-Distractor
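
Once a question, an answer, and distractors are available, they can be assembled into the multiple-choice layout shown above. The sketch below is illustrative only: `build_multiple_choice` is a hypothetical helper, the optional shuffling is an assumption rather than something the MQAG pipeline is stated to do, and the distractors in the example are hand-written placeholders, not output from the distractor model.

```python
import random
import string
from typing import Dict, List, Union


def build_multiple_choice(
    question: str,
    answer: str,
    distractors: List[str],
    shuffle: bool = False,
    seed: int = 0,
) -> Dict[str, Union[str, Dict[str, str]]]:
    """Combine an answer and its distractors into lettered options.

    With shuffle=False the answer stays at (A), matching the layout above.
    """
    # Put the answer first, then drop duplicates while preserving order
    options = list(dict.fromkeys([answer] + distractors))
    if len(options) > len(string.ascii_uppercase):
        raise ValueError("too many options to label with single letters")
    if shuffle:
        random.Random(seed).shuffle(options)
    letters = string.ascii_uppercase
    return {
        "question": question,
        "options": {letters[i]: option for i, option in enumerate(options)},
        "answer": letters[options.index(answer)],
    }


# Placeholder distractors written by hand for illustration
item = build_multiple_choice(
    "Who scored the winner for Chelsea?",
    "Mateo Kovacic",
    ["Kai Havertz", "Ben Chilwell", "Joao Felix"],
)
print(item["question"])
for letter, option in item["options"].items():
    print(f"({letter}) {option}")
```

Passing `shuffle=True` randomizes the option order, and the `answer` field then records which letter holds the correct option.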

## Citation

```bibtex
@article{manakul2023mqag,
  title={MQAG: Multiple-choice Question Answering and Generation for Assessing Information Consistency in Summarization},
  author={Manakul, Potsawee and Liusie, Adian and Gales, Mark JF},
  journal={arXiv preprint arXiv:2301.12307},
  year={2023}
}
```