---
license: mit
---
# From Clozing to Comprehending: Retrofitting Pre-trained Masked Language Model to Pre-trained Machine Reader
Pre-trained Machine Reader (PMR) is pre-trained on 18 million Machine Reading Comprehension (MRC) examples constructed from Wikipedia hyperlinks.
It was introduced in the paper [From Clozing to Comprehending: Retrofitting Pre-trained Masked Language Model to Pre-trained Machine Reader](https://arxiv.org/abs/2212.04755) by
Weiwen Xu, Xin Li, Wenxuan Zhang, Meng Zhou, Wai Lam, Luo Si, and Lidong Bing,
and first released in [this repository](https://github.com/DAMO-NLP-SG/PMR).

This model is initialized from [albert-xxlarge-v2](https://huggingface.co/albert-xxlarge-v2) and continually pre-trained with an MRC objective.

## Model description
The model is pre-trained on distantly labeled data using a learning objective called Wiki Anchor Extraction (WAE).
Specifically, we constructed a large volume of general-purpose and high-quality MRC-style training data based on Wikipedia anchors (i.e., hyperlinked texts).
For each Wikipedia anchor, we composed a pair of correlated articles.
One side of the pair is the Wikipedia article that contains detailed descriptions of the hyperlinked entity, which we defined as the definition article.
The other side of the pair is the article that mentions the specific anchor text, which we defined as the mention article.
We composed an MRC-style training instance in which the anchor is the answer,
the passage surrounding the anchor in the mention article is the context, and the definition of the anchor entity in the definition article is the query.
Based on the above data, we then introduced a novel Wiki Anchor Extraction (WAE) problem as the pre-training task of PMR.
In this task, PMR determines whether the context and the query are relevant.
If so, PMR extracts the answer from the context that satisfies the query description.
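
For illustration, here is a minimal sketch of what one such MRC-style WAE instance could look like. The field names and example texts are assumptions for readability, not the official PMR data schema (see the repository linked above for the actual preprocessing).

```python
# Illustrative sketch of one MRC-style WAE training instance built from a Wikipedia anchor.
# Field names and example texts are assumptions, not the official PMR data format.
definition_article = (
    "Silicon is a chemical element with the symbol Si and atomic number 14."
)
mention_passage = (
    "Modern integrated circuits are fabricated on wafers of silicon using "
    "photolithography and doping processes."
)
anchor_text = "silicon"  # the hyperlinked text in the mention article

wae_instance = {
    "query": definition_article,                         # definition of the anchor entity
    "context": mention_passage,                          # passage surrounding the anchor
    "answer_text": anchor_text,                          # the anchor itself is the answer
    "answer_start": mention_passage.index(anchor_text),  # character offset of the anchor
    "is_relevant": True,                                 # WAE relevance label
}
```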

During fine-tuning, we unified downstream NLU tasks under our MRC formulation (see the sketch after this list). These tasks typically fall into four categories:
1. Span extraction with pre-defined labels (e.g., NER): each task label is treated as a query used to search for the corresponding answers in the input text (context).
2. Span extraction with natural questions (e.g., EQA): the question is treated as the query for answer extraction from the given passage (context).
3. Sequence classification with pre-defined task labels (e.g., sentiment analysis): each task label is used as a query for the input text (context).
4. Sequence classification with natural questions over multiple choices (e.g., multi-choice QA, MCQA): the concatenation of the question and one choice is treated as the query for the given passage (context).

In the output space, we tackle span extraction problems by predicting the probability of each context span being the answer, and we tackle sequence classification problems by conducting relevance classification on [CLS] (extracting [CLS] if the query and context are relevant).
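
As a concrete illustration of this unification, the sketch below shows how task labels could be turned into (query, context) pairs. The query templates are assumptions for illustration; the templates actually used by PMR are defined in the official repository.

```python
# Illustrative sketch: casting downstream tasks into (query, context) pairs for PMR.
# The query templates below are assumptions; see https://github.com/DAMO-NLP-SG/PMR
# for the templates actually used in the paper.

def ner_examples(sentence: str, labels: list[str]) -> list[tuple[str, str]]:
    """Category (1): each NER label becomes a query; answers are spans in the context."""
    return [(f'Find entities of type "{label}".', sentence) for label in labels]

def sentiment_examples(sentence: str, labels: list[str]) -> list[tuple[str, str]]:
    """Category (3): each sentiment label becomes a query; PMR extracts [CLS] if relevant."""
    return [(f"The sentiment of the text is {label}.", sentence) for label in labels]

print(ner_examples("PMR was developed in Singapore.", ["person", "organization", "location"]))
print(sentiment_examples("The movie was a delight.", ["positive", "negative"]))
```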

## Model variations
Three versions of PMR have been released:

| Model | Backbone | #params |
|------------|-----------|----------|
| [PMR-base](https://huggingface.co/DAMO-NLP-SG/PMR-base) | [roberta-base](https://huggingface.co/roberta-base) | 125M |
| [PMR-large](https://huggingface.co/DAMO-NLP-SG/PMR-large) | [roberta-large](https://huggingface.co/roberta-large) | 355M |
| [PMR-xxlarge](https://huggingface.co/DAMO-NLP-SG/PMR-xxlarge) | [albert-xxlarge-v2](https://huggingface.co/albert-xxlarge-v2) | 235M |

## Intended uses & limitations
The models need to be fine-tuned on downstream task data. During fine-tuning, no task-specific layer is required.

### How to use
You can try the code from [this repo](https://github.com/DAMO-NLP-SG/PMR).
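
As a minimal starting point, the sketch below loads the checkpoint with 🤗 Transformers and encodes a query-context pair. It assumes the model repository ships standard config and tokenizer files and loads only the encoder backbone; the MRC-style output heads and fine-tuning scripts are provided in the repo above.

```python
# Minimal sketch: load the PMR-xxlarge backbone and encode a query-context pair.
# This loads only the encoder weights; the MRC-style heads and fine-tuning scripts
# are provided in https://github.com/DAMO-NLP-SG/PMR.
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("DAMO-NLP-SG/PMR-xxlarge")
model = AutoModel.from_pretrained("DAMO-NLP-SG/PMR-xxlarge")

query = 'Find entities of type "location".'   # illustrative query template (assumption)
context = "PMR was developed in Singapore."   # input text treated as the MRC context
inputs = tokenizer(query, context, return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # contextual representations of query + context tokens
```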

### BibTeX entry and citation info
```bibtex
@article{xu2022clozing,
  title={From Clozing to Comprehending: Retrofitting Pre-trained Masked Language Model to Pre-trained Machine Reader},
  author={Xu, Weiwen and Li, Xin and Zhang, Wenxuan and Zhou, Meng and Bing, Lidong and Lam, Wai and Si, Luo},
  journal={arXiv preprint arXiv:2212.04755},
  year={2022}
}
```