# DataSet
A benchmark for multi-dimensional evaluation of question generation (QG). It consists of 200 instances drawn from SQuAD and HotpotQA; each instance contains 15 questions generated by 15 different QG models.

Evaluation dimensions:
- fluency
- clarity
- conciseness
- relevance
- consistency
- answerability
- answer consistency

# Models
Trained QG models used to generate the questions to be evaluated.