# [EdgeFormer](https://arxiv.org/abs/2202.07959)
**EdgeFormer: A Parameter-Efficient Transformer for On-Device Seq2seq Generation**

[EdgeFormer: A Parameter-Efficient Transformer for On-Device Seq2seq Generation](https://arxiv.org/abs/2202.07959). Tao Ge and Furu Wei
- March 2022: released the code and pretrained checkpoints.
---
## Pretrained Models
- [EdgeFormer (Adapter-LA)](https://msranlp.blob.core.windows.net/edgeformer/v1/edgeformer_lora32_pretrain_checkpoint_250k.pt): #enc-dec=12-2; #hidden=512; #head=8; #enc-FFN=2048; #dec-FFN=128; #LoRA-r=32 (#parameters: 11M)
- [Vocabulary](https://msranlp.blob.core.windows.net/edgeformer/v1/dict.src.txt) and [SentencePiece model](https://msranlp.blob.core.windows.net/edgeformer/v1/spm2k-fy22.model)
- Currently, EdgeFormer can be fine-tuned to support seq2seq generation in English only.
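
For convenience, the pretrained checkpoint, vocabulary, and SentencePiece model listed above can be fetched with `wget` (a minimal sketch; any download tool works):

```bash
# Download the pretrained EdgeFormer checkpoint, vocabulary, and SentencePiece model
mkdir -p edgeformer_v1 && cd edgeformer_v1
wget https://msranlp.blob.core.windows.net/edgeformer/v1/edgeformer_lora32_pretrain_checkpoint_250k.pt
wget https://msranlp.blob.core.windows.net/edgeformer/v1/dict.src.txt
wget https://msranlp.blob.core.windows.net/edgeformer/v1/spm2k-fy22.model
```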
## Downstream seq2seq tasks
We evaluate EdgeFormer on benchmarks for three popular seq2seq tasks: CoNLL-14 for grammatical error correction (GEC), XSUM for abstractive summarization, and SQuAD-NQG for question generation.

[**CoNLL-14**](https://aclanthology.org/W14-1701.pdf)

| Model | #Params | #FLOPs | F0.5 |
|-----------|-----------|-----------|-----------|
| [Transformer-base](https://arxiv.org/abs/1706.03762) | 44M | 1.8G | 50.1 |
| Pretrained 12+2 [Universal Transformer](https://arxiv.org/abs/1807.03819) | 7.4M | 1.4G | 51.3 |
| Pretrained 12+2 Universal Transformer (wide) | 9.4M | 1.9G | 51.7 |
| Pretrained EdgeFormer | 9.4M | 1.3G | **52.7** |

[**XSUM**](https://arxiv.org/pdf/1808.08745.pdf)

| Model | #Params | #FLOPs | ROUGE-1 | ROUGE-2 | ROUGE-L |
|-----------|-----------|-----------|-----------|-----------|-----------|
| Transformer-base | 44M | 1.8G | 31.2 | 10.7 | 24.9 |
| Pretrained 12+2 Universal Transformer | 7.4M | 1.4G | 34.4 | 13.4 | 27.9 |
| Pretrained 12+2 Universal Transformer (wide) | 9.4M | 1.9G | 35.1 | 14.0 | 28.6 |
| Pretrained EdgeFormer | 9.4M | 1.3G | **36.3** | **14.8** | **29.5** |

[**SQuAD-NQG**](https://arxiv.org/abs/1705.00106)

| Model | #Params | #FLOPs | BLEU-4 | METEOR | ROUGE-L |
|-----------|-----------|-----------|-----------|-----------|-----------|
| Transformer-base | 44M | 1.8G | 2.6 | 9.0 | 26.0 |
| Pretrained 12+2 Universal Transformer | 7.4M | 1.4G | 18.3 | 21.0 | 45.9 |
| Pretrained 12+2 Universal Transformer (wide) | 9.4M | 1.9G | 18.7 | 21.3 | 46.1 |
| Pretrained EdgeFormer | 9.4M | 1.3G | **19.0** | **21.7** | **46.3** |

## Setup
```bash
pip install --editable ./
```
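
Before fine-tuning, the raw source/target text needs to be encoded with the released SentencePiece model and binarized with `fairseq-preprocess`. A minimal sketch under assumed placeholder file names and `src`/`tgt` language suffixes, reusing the released dictionary for both sides (since the model shares all embeddings):

```bash
# Encode raw text with the released SentencePiece model (file names are placeholders)
for split in train valid test; do
  for lang in src tgt; do
    spm_encode --model=spm2k-fy22.model --output_format=piece \
      < $split.$lang > $split.spm.$lang
  done
done

# Binarize with fairseq, reusing the released vocabulary for source and target
fairseq-preprocess --source-lang src --target-lang tgt \
  --trainpref train.spm --validpref valid.spm --testpref test.spm \
  --srcdict dict.src.txt --tgtdict dict.src.txt \
  --destdir /path/to/binarized/data --workers 8
```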
## Fine-tuning
```bash
PRETRAINED_MODEL=/path/to/checkpoint/model.pt
fairseq-train /path/to/binarized/data \
--restore-file $PRETRAINED_MODEL --reset-lr-scheduler --reset-optimizer --reset-dataloader \
--task translation \
--criterion label_smoothed_cross_entropy \
--arch transformer_edge --encoder-layers 12 --decoder-ffn-embed-dim 128 --lora-r 32 --lora-r-shape 0 \
--share-all-embeddings \
--required-batch-size-multiple 8 \
--optimizer adam \
--adam-betas '(0.9,0.98)' \
--adam-eps 1e-6 \
--clip-norm 1.0 \
--lr-scheduler polynomial_decay \
--lr 0.00015 \
--warmup-updates 8000 \
--total-num-update 100000 \
--max-update 100000 --max-epoch 1000 \
--max-tokens 20000 \
--update-freq 1 \
--log-format simple \
--log-interval 1000 \
--save-interval-updates 5000 \
--fp16 \
--fp16-init-scale 4 \
--fp16-scale-window 256 \
--min-loss-scale 0.0001 \
--seed 1 \
--save-dir /path/to/save/checkpoints \
--ddp-backend legacy_ddp
```
**Notes:**
- Please adjust hyperparameters such as `lr` and `warmup-updates` based on the dataset and task.
- Please adjust `max-tokens` and `update-freq` to fit your experimental environment; their product (times the number of GPUs) determines the effective batch size, as illustrated below.
- Use `--fp16` for more efficient training on devices with Tensor Cores.
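
For example, on a GPU with less memory one might halve `--max-tokens` and double `--update-freq` so that the effective batch size stays the same as in the reference command above (a hypothetical setting, not one reported in the paper):

```bash
# Effective batch size in tokens per update = MAX_TOKENS * UPDATE_FREQ * NUM_GPUS
MAX_TOKENS=10000   # halved from the reference 20000 to fit a smaller GPU
UPDATE_FREQ=2      # doubled so that the product stays unchanged
NUM_GPUS=1
echo $((MAX_TOKENS * UPDATE_FREQ * NUM_GPUS))   # prints 20000
```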
## Evaluation
```bash
fairseq-generate $data_bin \
--path $save_dir/checkpoint_best.pt \
--batch-size 64 --beam 5 --remove-bpe=sentencepiece
```
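
To score the output with the task-specific metrics above, the hypotheses can be extracted from the `fairseq-generate` log. A minimal sketch, where `gen.out` and `hyp.txt` are placeholder file names:

```bash
# Save the generation log, then pull out the hypotheses in sample order.
# fairseq-generate prints hypotheses on lines prefixed with "H-<sample id>".
fairseq-generate $data_bin \
--path $save_dir/checkpoint_best.pt \
--batch-size 64 --beam 5 --remove-bpe=sentencepiece > gen.out
grep ^H gen.out | sort -V | cut -f3- > hyp.txt
```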
---
## Citation
If you find this repository useful, please consider citing our work:
```
@article{ge2022edgeformer,
  title={EdgeFormer: A Parameter-Efficient Transformer for On-Device Seq2seq Generation},
  author={Ge, Tao and Wei, Furu},
  journal={arXiv preprint arXiv:2202.07959},
  year={2022}
}
```
## Acknowledgement
This repository is built using the [Fairseq](https://github.com/pytorch/fairseq) repository.
## License
This project is licensed under the license found in the LICENSE file in the root directory of this source tree.

[Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct)
### Contact Information
For help or issues using EdgeFormer models, please submit a GitHub issue.
For other communications related to EdgeFormer, please contact [Tao Ge](https://www.microsoft.com/en-us/research/people/tage/) (`[email protected]`) or [Furu Wei](http://gitnlp.org/) (`[email protected]`).