
Dataset

A benchmark for multi-dimensional evaluation of question generation (QG). It consists of 200 instances drawn from SQuAD and HotpotQA; each instance contains 15 questions generated by 15 different QG models.

Evaluation dimensions:

  • fluency
  • clarity
  • conciseness
  • relevance
  • consistency
  • answerability
  • answer consistency
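
As a rough illustration of how scores along these dimensions might be aggregated, the sketch below averages each dimension per QG model. The record layout (`questions`, `model`, `scores`) and the snake_case dimension keys are hypothetical, chosen only for this example; the actual files may be organized differently, and the toy values are made up rather than taken from the benchmark.

```python
from collections import defaultdict
from statistics import mean

# The seven evaluation dimensions listed above (key spelling is an assumption).
DIMENSIONS = [
    "fluency", "clarity", "conciseness", "relevance",
    "consistency", "answerability", "answer_consistency",
]


def mean_scores_by_model(instances):
    """Average each dimension's score per QG model across all instances."""
    collected = defaultdict(lambda: defaultdict(list))
    for instance in instances:
        for question in instance["questions"]:
            for dim in DIMENSIONS:
                collected[question["model"]][dim].append(question["scores"][dim])
    return {
        model: {dim: mean(values) for dim, values in dims.items()}
        for model, dims in collected.items()
    }


# Toy records in the hypothetical layout; scores are invented for illustration.
toy = [
    {"questions": [
        {"model": "model_a", "question": "placeholder", "scores": {d: 3 for d in DIMENSIONS}},
        {"model": "model_b", "question": "placeholder", "scores": {d: 1 for d in DIMENSIONS}},
    ]},
    {"questions": [
        {"model": "model_a", "question": "placeholder", "scores": {d: 1 for d in DIMENSIONS}},
        {"model": "model_b", "question": "placeholder", "scores": {d: 2 for d in DIMENSIONS}},
    ]},
]

averages = mean_scores_by_model(toy)
for model, dims in averages.items():
    print(model, dims)
```

Keeping one score per dimension per question makes it straightforward to compare models on a single dimension or to correlate automatic metrics with the annotations.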

Models

The trained QG models used to generate the questions being evaluated.