---
inference: false
tags:
- text-generation
- opt

license: other
commercial: false
---
# OPT-IML

## Model Description

[OPT-IML (OPT + Instruction Meta-Learning)](https://arxiv.org/abs/2212.12017) is a set of instruction-tuned versions of OPT, fine-tuned on OPT-IML Bench, a collection of ~2000 NLP tasks gathered from 8 NLP benchmarks.

We provide two model versions:
* OPT-IML, trained on 1500 tasks with several tasks held out for downstream evaluation, and
* OPT-IML-Max, trained on all ~2000 tasks.

### How to use
You can use this model directly with a pipeline for text generation.

```python
>>> from transformers import pipeline

>>> generator = pipeline('text-generation', model="facebook/opt-iml-1.3b")

>>> generator("What is the capital of USA?")
```
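
If you want more control over decoding (sampling, output length, etc.), you can also load the checkpoint directly. The snippet below is a minimal sketch using the standard `transformers` Auto classes; the `do_sample` and `max_new_tokens` settings are illustrative choices, not values prescribed by this card.

```python
>>> from transformers import AutoModelForCausalLM, AutoTokenizer

>>> # Load the tokenizer and model weights for this checkpoint
>>> tokenizer = AutoTokenizer.from_pretrained("facebook/opt-iml-1.3b")
>>> model = AutoModelForCausalLM.from_pretrained("facebook/opt-iml-1.3b")

>>> # Encode the prompt and sample a short continuation
>>> inputs = tokenizer("What is the capital of USA?", return_tensors="pt")
>>> output_ids = model.generate(**inputs, do_sample=True, max_new_tokens=32)
>>> tokenizer.batch_decode(output_ids, skip_special_tokens=True)
```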

### Limitations and bias

While OPT-IML models outperform baseline OPT on an extensive set of evaluations, they remain susceptible to the risks associated with large language models, including factual errors, generation of toxic language, and reinforcement of stereotypes. While we release our OPT-IML models to facilitate future work on instruction-tuning and to improve the availability of large instruction-tuned causal LMs, their use should be accompanied by responsible best practices.

## Training data
OPT-IML models are trained on OPT-IML Bench, a large benchmark for Instruction Meta-Learning (IML) that consolidates ~2000 NLP tasks into task categories drawn from 8 existing benchmarks, including Super-NaturalInstructions, FLAN, and PromptSource.

## Training procedure
The texts are tokenized using the GPT2 byte-level version of Byte Pair Encoding (BPE) (for unicode characters), with a vocabulary size of 50272. The inputs are sequences of 2048 consecutive tokens.

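To see the tokenization described above in practice, the short sketch below loads the tokenizer that ships with this checkpoint and encodes a prompt; it is illustrative only and uses the standard `transformers` tokenizer API.

```python
>>> from transformers import AutoTokenizer

>>> # Byte-level BPE tokenizer shipped with the OPT-IML checkpoints
>>> tokenizer = AutoTokenizer.from_pretrained("facebook/opt-iml-1.3b")

>>> # Encode a prompt into token ids; during fine-tuning, inputs are packed
>>> # into sequences of 2048 consecutive tokens
>>> ids = tokenizer("What is the capital of USA?")["input_ids"]
>>> len(ids)
```
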
The 30B model was fine-tuned on 64 40GB A100 GPUs. During fine-tuning, models saw approximately 2 billion tokens, which is only 0.6% of the pre-training budget of OPT.

### BibTeX entry and citation info
```bibtex
@misc{iyer2022opt,
      title={OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization},
      author={Iyer, Srinivasan and Lin, Xi Victoria and Pasunuru, Ramakanth and Mihaylov, Todor and Simig, D{\'a}niel and Yu, Ping and Shuster, Kurt and Wang, Tianlu and Liu, Qing and Koura, Punit Singh and others},
      year={2022},
      eprint={2212.12017},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```