# s2s-ft: Sequence-to-Sequence Fine-Tuning

**A PyTorch package for fine-tuning pre-trained Transformers for sequence-to-sequence language generation**

## Environment

The recommended way to run the code is using docker:

```bash
docker run -it --rm --runtime=nvidia --ipc=host --privileged pytorch/pytorch:1.2-cuda10.0-cudnn7-devel bash
```

The following Python packages need to be installed:

```bash
pip install --user methodtools py-rouge pyrouge nltk
python -c "import nltk; nltk.download('punkt')"
git clone https://github.com/NVIDIA/apex.git && cd apex && git reset --hard de6378f5dae8fcf2879a4be8ecea8bbcb9e59d5 && python setup.py install --cuda_ext --cpp_ext
```

Install the repo as a package:

```bash
# clone this repo into ${code_dir}
cd ${code_dir} ; pip install --editable .
```

## Pre-trained Models

We recommend using the uncased models:

- [unilm1.2-base-uncased](https://conversationhub.blob.core.windows.net/beit-share-public/ckpt/unilm1.2-base-uncased.bin?sv=2021-10-04&st=2023-06-08T11%3A16%3A02Z&se=2033-06-09T11%3A16%3A00Z&sr=c&sp=r&sig=N4pfCVmSeq4L4tS8QbrFVsX6f6q844eft8xSuXdxU48%3D): 12-layer, 768-hidden, 12-heads, 110M parameters
- [unilm2-base-uncased](https://conversationhub.blob.core.windows.net/beit-share-public/ckpt/unilm2-base-uncased.bin?sv=2021-10-04&st=2023-06-08T11%3A16%3A02Z&se=2033-06-09T11%3A16%3A00Z&sr=c&sp=r&sig=N4pfCVmSeq4L4tS8QbrFVsX6f6q844eft8xSuXdxU48%3D): 12-layer, 768-hidden, 12-heads, 110M parameters

If you would like to use a cased or large model:

- [unilm1-base-cased](https://conversationhub.blob.core.windows.net/beit-share-public/ckpt/unilm1-base-cased.bin?sv=2021-10-04&st=2023-06-08T11%3A16%3A02Z&se=2033-06-09T11%3A16%3A00Z&sr=c&sp=r&sig=N4pfCVmSeq4L4tS8QbrFVsX6f6q844eft8xSuXdxU48%3D): 12-layer, 768-hidden, 12-heads, 110M parameters
- [unilm1-large-cased](https://conversationhub.blob.core.windows.net/beit-share-public/ckpt/unilm1-large-cased.bin?sv=2021-10-04&st=2023-06-08T11%3A16%3A02Z&se=2033-06-09T11%3A16%3A00Z&sr=c&sp=r&sig=N4pfCVmSeq4L4tS8QbrFVsX6f6q844eft8xSuXdxU48%3D): 24-layer, 1024-hidden, 16-heads, 340M parameters
- [unilm2-large-uncased](https://conversationhub.blob.core.windows.net/beit-share-public/ckpt/unilm2-large-uncased.bin?sv=2021-10-04&st=2023-06-08T11%3A16%3A02Z&se=2033-06-09T11%3A16%3A00Z&sr=c&sp=r&sig=N4pfCVmSeq4L4tS8QbrFVsX6f6q844eft8xSuXdxU48%3D): 24-layer, 1024-hidden, 16-heads, 340M parameters
- [unilm2-large-cased](https://conversationhub.blob.core.windows.net/beit-share-public/ckpt/unilm2-large-cased.bin?sv=2021-10-04&st=2023-06-08T11%3A16%3A02Z&se=2033-06-09T11%3A16%3A00Z&sr=c&sp=r&sig=N4pfCVmSeq4L4tS8QbrFVsX6f6q844eft8xSuXdxU48%3D): 24-layer, 1024-hidden, 16-heads, 340M parameters

If you prefer [small pre-trained models](https://github.com/microsoft/unilm/tree/master/minilm) for faster inference:

- [minilm-l12-h384-uncased](https://1drv.ms/u/s!AjHn0yEmKG8qixAYyu2Fvq5ulnU7?e=DFApTA): 12-layer, 384-hidden, 12-heads, 33M parameters

## Input File Format

We support two dataset formats:

1. Text format: each line contains a json string of an example. `"src"` contains the source sequence text, and `"tgt"` contains the target sequence text (`"tgt"` can be omitted for decoding).
The data should be pre-processed as follows:

```bash
{"src": "Messages posted on social media claimed the user planned to `` kill as many people as possible ''", "tgt": "Threats to kill pupils in a shooting at a Blackpool school are being investigated by Lancashire police ."}
{"src": "Media playback is unsupported on your device", "tgt": "A slide running the entire length of one of the steepest city centre streets in Europe has been turned into a massive three-lane water adventure ."}
{"src": "Chris Erskine crossed low for Kris Doolan to tap home and give the Jags an early lead .", "tgt": "Partick Thistle will finish in the Scottish Premiership 's top six for the first time after beating Motherwell"}
```

2. Tokenized format: if you use tokenized data (produced with the same WordPiece tokenizers as BERT), `"src"` is a list of source sequence tokens, and `"tgt"` is a list of target sequence tokens (`"tgt"` can be omitted for decoding):

```bash
{"src": ["messages", "posted", "on", "social", "media", "claimed", "the", "user", "planned", "to", "\"", "kill", "as", "many", "people", "as", "possible", "\""], "tgt": ["threats", "to", "kill", "pupils", "in", "a", "shooting", "at", "a", "blackpool", "school", "are", "being", "investigated", "by", "lancashire", "police", "."]}
{"src": ["media", "playback", "is", "un", "##su", "##pp", "##orted", "on", "your", "device"], "tgt": ["a", "slide", "running", "the", "entire", "length", "of", "one", "of", "the", "steep", "##est", "city", "centre", "streets", "in", "europe", "has", "been", "turned", "into", "a", "massive", "three", "-", "lane", "water", "adventure", "."]}
{"src": ["chris", "erskine", "crossed", "low", "for", "kris", "doo", "##lan", "to", "tap", "home", "and", "give", "the", "ja", "##gs", "an", "early", "lead", "."], "tgt": ["part", "##ick", "thistle", "will", "finish", "in", "the", "scottish", "premiership", "'", "s", "top", "six", "for", "the", "first", "time", "after", "beating", "mother", "##well"]}
```

The code automatically detects the input format: if the json values are lists, the input is processed as the tokenized format; if they are strings, the code tokenizes them.

## Example: [XSum](https://github.com/EdinburghNLP/XSum) with unilm1.2-base-uncased

### Fine-tuning

Pre-processed json dataset links: [text format](https://conversationhub.blob.core.windows.net/beit-share-public/s2s-ft-data/xsum.json.zip?sv=2021-10-04&st=2023-06-08T11%3A16%3A02Z&se=2033-06-09T11%3A16%3A00Z&sr=c&sp=r&sig=N4pfCVmSeq4L4tS8QbrFVsX6f6q844eft8xSuXdxU48%3D), or [tokenized format](https://conversationhub.blob.core.windows.net/beit-share-public/s2s-ft-data/xsum.uncased_tokenized.zip?sv=2021-10-04&st=2023-06-08T11%3A16%3A02Z&se=2033-06-09T11%3A16%3A00Z&sr=c&sp=r&sig=N4pfCVmSeq4L4tS8QbrFVsX6f6q844eft8xSuXdxU48%3D).
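If you build your own training file rather than using the pre-processed datasets linked above, the sketch below shows one way to produce the text format described in the previous section. The example pairs and the output path are hypothetical; only the standard `json` module is used.

```python
import json

# Hypothetical (source, target) pairs; replace with your own data.
examples = [
    ("Media playback is unsupported on your device",
     "A slide running the entire length of one of the steepest city centre streets "
     "in Europe has been turned into a massive three-lane water adventure ."),
]

# Text format: one json object per line with "src" and "tgt" string fields.
with open("train.json", "w", encoding="utf-8") as writer:
    for src, tgt in examples:
        writer.write(json.dumps({"src": src, "tgt": tgt}) + "\n")
```

The tokenized format is written the same way, with `"src"` and `"tgt"` as lists of WordPiece tokens instead of strings.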
```bash
# path of training data
TRAIN_FILE=/your/path/to/train.json
# folder used to save fine-tuned checkpoints
OUTPUT_DIR=/your/path/to/save_checkpoints
# folder used to cache package dependencies
CACHE_DIR=/your/path/to/transformer_package_cache

export CUDA_VISIBLE_DEVICES=0,1,2,3
python -m torch.distributed.launch --nproc_per_node=4 run_seq2seq.py \
  --train_file ${TRAIN_FILE} --output_dir ${OUTPUT_DIR} \
  --model_type unilm --model_name_or_path unilm1.2-base-uncased \
  --do_lower_case --fp16 --fp16_opt_level O2 --max_source_seq_length 464 --max_target_seq_length 48 \
  --per_gpu_train_batch_size 16 --gradient_accumulation_steps 1 \
  --learning_rate 7e-5 --num_warmup_steps 500 --num_training_steps 32000 --cache_dir ${CACHE_DIR}
```

- The fine-tuning batch size = `number of gpus` * `per_gpu_train_batch_size` * `gradient_accumulation_steps`. In the above example, the batch size is `4*16*1 = 64`. Adjust these three arguments together to keep the total batch size unchanged.
- `--do_lower_case`: for uncased models

### Decoding

```bash
# path of the fine-tuned checkpoint
MODEL_PATH=/your/path/to/model_checkpoint
SPLIT=validation
# input file that you would like to decode
INPUT_JSON=/your/path/to/${SPLIT}.json

export CUDA_VISIBLE_DEVICES=0
export OMP_NUM_THREADS=4
export MKL_NUM_THREADS=4

python decode_seq2seq.py \
  --fp16 --model_type unilm --tokenizer_name unilm1.2-base-uncased --input_file ${INPUT_JSON} --split $SPLIT --do_lower_case \
  --model_path ${MODEL_PATH} --max_seq_length 512 --max_tgt_length 48 --batch_size 32 --beam_size 5 \
  --length_penalty 0 --forbid_duplicate_ngrams --mode s2s --forbid_ignore_word "."
```

- The decoding results are saved at `${MODEL_PATH}.${SPLIT}`.
- `--do_lower_case`: for uncased models

### Evaluation

The golden answer text files can be downloaded [here](https://conversationhub.blob.core.windows.net/beit-share-public/s2s-ft-data/xsum.eval.zip?sv=2021-10-04&st=2023-06-08T11%3A16%3A02Z&se=2033-06-09T11%3A16%3A00Z&sr=c&sp=r&sig=N4pfCVmSeq4L4tS8QbrFVsX6f6q844eft8xSuXdxU48%3D).

```bash
SPLIT=validation
GOLD_PATH=/your/path/to/${SPLIT}.target
# ${MODEL_PATH}.${SPLIT} is the predicted target file
python evaluations/eval_for_xsum.py --pred ${MODEL_PATH}.${SPLIT} --gold ${GOLD_PATH} --split ${SPLIT}
```

## Example: [XSum](https://github.com/EdinburghNLP/XSum) with minilm-l12-h384-uncased

### Fine-tuning

Pre-processed json dataset links: [text format](https://conversationhub.blob.core.windows.net/beit-share-public/s2s-ft-data/xsum.json.zip?sv=2021-10-04&st=2023-06-08T11%3A16%3A02Z&se=2033-06-09T11%3A16%3A00Z&sr=c&sp=r&sig=N4pfCVmSeq4L4tS8QbrFVsX6f6q844eft8xSuXdxU48%3D), or [tokenized format](https://conversationhub.blob.core.windows.net/beit-share-public/s2s-ft-data/xsum.uncased_tokenized.zip?sv=2021-10-04&st=2023-06-08T11%3A16%3A02Z&se=2033-06-09T11%3A16%3A00Z&sr=c&sp=r&sig=N4pfCVmSeq4L4tS8QbrFVsX6f6q844eft8xSuXdxU48%3D).
```bash
# path of training data
TRAIN_FILE=/your/path/to/train.json
# folder used to save fine-tuned checkpoints
OUTPUT_DIR=/your/path/to/save_checkpoints
# folder used to cache package dependencies
CACHE_DIR=/your/path/to/transformer_package_cache

export CUDA_VISIBLE_DEVICES=0,1,2,3
python -m torch.distributed.launch --nproc_per_node=4 run_seq2seq.py \
  --train_file ${TRAIN_FILE} --output_dir ${OUTPUT_DIR} \
  --model_type minilm --model_name_or_path minilm-l12-h384-uncased \
  --do_lower_case --fp16 --fp16_opt_level O2 --max_source_seq_length 464 --max_target_seq_length 48 \
  --per_gpu_train_batch_size 16 --gradient_accumulation_steps 1 \
  --learning_rate 1e-4 --num_warmup_steps 500 --num_training_steps 108000 --cache_dir ${CACHE_DIR}
```

- The fine-tuning batch size = `number of gpus` * `per_gpu_train_batch_size` * `gradient_accumulation_steps`. In the above example, the batch size is `4*16*1 = 64`. Adjust these three arguments together to keep the total batch size unchanged.
- `--do_lower_case`: for uncased models

### Decoding

```bash
# path of the fine-tuned checkpoint
MODEL_PATH=/your/path/to/model_checkpoint
SPLIT=validation
# input file that you would like to decode
INPUT_JSON=/your/path/to/${SPLIT}.json

export CUDA_VISIBLE_DEVICES=0
export OMP_NUM_THREADS=4
export MKL_NUM_THREADS=4

python decode_seq2seq.py \
  --fp16 --model_type minilm --tokenizer_name minilm-l12-h384-uncased --input_file ${INPUT_JSON} --split $SPLIT --do_lower_case \
  --model_path ${MODEL_PATH} --max_seq_length 512 --max_tgt_length 48 --batch_size 32 --beam_size 5 \
  --length_penalty 0 --forbid_duplicate_ngrams --mode s2s --forbid_ignore_word "."
```

- The decoding results are saved at `${MODEL_PATH}.${SPLIT}`.
- `--do_lower_case`: for uncased models

### Evaluation

The golden answer text files can be downloaded [here](https://conversationhub.blob.core.windows.net/beit-share-public/s2s-ft-data/xsum.eval.zip?sv=2021-10-04&st=2023-06-08T11%3A16%3A02Z&se=2033-06-09T11%3A16%3A00Z&sr=c&sp=r&sig=N4pfCVmSeq4L4tS8QbrFVsX6f6q844eft8xSuXdxU48%3D).

```bash
SPLIT=validation
GOLD_PATH=/your/path/to/${SPLIT}.target
# ${MODEL_PATH}.${SPLIT} is the predicted target file
python evaluations/eval_for_xsum.py --pred ${MODEL_PATH}.${SPLIT} --gold ${GOLD_PATH} --split ${SPLIT}
```

## Example: CNN / Daily Mail with unilm1-base-cased

Pre-processed json dataset links: [tokenized format](https://conversationhub.blob.core.windows.net/beit-share-public/s2s-ft-data/cnndm.cased_tokenized.zip?sv=2021-10-04&st=2023-06-08T11%3A16%3A02Z&se=2033-06-09T11%3A16%3A00Z&sr=c&sp=r&sig=N4pfCVmSeq4L4tS8QbrFVsX6f6q844eft8xSuXdxU48%3D).
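The sketch below is only a quick way to unpack the pre-processed archive linked above and check that it has the expected tokenized format; the archive name, extraction directory, and file layout are assumptions, so adjust them to your download.

```python
import json
import zipfile

# Hypothetical local paths; adjust to wherever you saved the "tokenized format" archive linked above.
with zipfile.ZipFile("cnndm.cased_tokenized.zip") as archive:
    archive.extractall("cnndm")
    json_members = [name for name in archive.namelist() if name.endswith(".json")]

# Peek at one example; in the tokenized format, "src"/"tgt" are lists of WordPiece tokens.
with open(f"cnndm/{json_members[0]}", encoding="utf-8") as f:
    example = json.loads(next(f))
print(type(example["src"]), example["src"][:10])
```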
### Fine-tuning

```bash
# path of training data
export TRAIN_FILE=/your/path/to/train.json
# path used to cache training data
export CACHED_FEATURE_FILE=/your/path/to/cnndm_train.cased.features.pt
# folder used to save fine-tuned checkpoints
export OUTPUT_DIR=/your/path/to/save_checkpoints
# folder used to cache package dependencies
export CACHE_DIR=/your/path/to/transformer_package_cache

export CUDA_VISIBLE_DEVICES=0,1,2,3
python -m torch.distributed.launch --nproc_per_node=4 run_seq2seq.py \
  --train_file $TRAIN_FILE --cached_train_features_file $CACHED_FEATURE_FILE --output_dir $OUTPUT_DIR \
  --model_type unilm --model_name_or_path unilm1-base-cased --fp16 --fp16_opt_level O2 \
  --max_source_seq_length 608 --max_target_seq_length 160 --per_gpu_train_batch_size 8 --gradient_accumulation_steps 2 \
  --learning_rate 7e-5 --num_warmup_steps 1000 --num_training_steps 45000 --cache_dir $CACHE_DIR --save_steps 1500
```

- The fine-tuning batch size = `number of gpus` * `per_gpu_train_batch_size` * `gradient_accumulation_steps`. In the above example, the batch size is `4*8*2 = 64`. Adjust these three arguments together to keep the total batch size unchanged.
- A fine-tuned checkpoint is provided [here](https://conversationhub.blob.core.windows.net/beit-share-public/ckpt/cnndm.unilm1-base-cased.bin?sv=2021-10-04&st=2023-06-08T11%3A16%3A02Z&se=2033-06-09T11%3A16%3A00Z&sr=c&sp=r&sig=N4pfCVmSeq4L4tS8QbrFVsX6f6q844eft8xSuXdxU48%3D).

### Decoding

```bash
# path of the fine-tuned checkpoint
MODEL_PATH=/your/path/to/model_checkpoint
SPLIT=dev
# input file that you would like to decode
INPUT_JSON=/your/path/to/${SPLIT}.json

export CUDA_VISIBLE_DEVICES=0
export OMP_NUM_THREADS=4
export MKL_NUM_THREADS=4

python decode_seq2seq.py \
  --fp16 --model_type unilm --tokenizer_name unilm1-base-cased --input_file ${INPUT_JSON} --split $SPLIT \
  --model_path ${MODEL_PATH} --max_seq_length 768 --max_tgt_length 160 --batch_size 32 --beam_size 5 \
  --length_penalty 0 --forbid_duplicate_ngrams --mode s2s --forbid_ignore_word "."
```

- The decoding results are saved at `${MODEL_PATH}.${SPLIT}`.

### Evaluation

The golden answer text files can be downloaded [here](https://conversationhub.blob.core.windows.net/beit-share-public/s2s-ft-data/cnndm.eval.zip?sv=2021-10-04&st=2023-06-08T11%3A16%3A02Z&se=2033-06-09T11%3A16%3A00Z&sr=c&sp=r&sig=N4pfCVmSeq4L4tS8QbrFVsX6f6q844eft8xSuXdxU48%3D).

```bash
SPLIT=dev
GOLD_PATH=/your/path/to/${SPLIT}.target
# ${MODEL_PATH}.${SPLIT} is the predicted target file
python evaluations/eval_for_cnndm.py --pred ${MODEL_PATH}.${SPLIT} --gold ${GOLD_PATH} --split ${SPLIT} --trunc_len 160
```

## Example: CNN / Daily Mail with unilm1.2-base-uncased

Pre-processed json dataset links: [tokenized format](https://conversationhub.blob.core.windows.net/beit-share-public/s2s-ft-data/cnndm.uncased_tokenized.zip?sv=2021-10-04&st=2023-06-08T11%3A16%3A02Z&se=2033-06-09T11%3A16%3A00Z&sr=c&sp=r&sig=N4pfCVmSeq4L4tS8QbrFVsX6f6q844eft8xSuXdxU48%3D).
### Fine-tuning

```bash
# path of training data
export TRAIN_FILE=/your/path/to/train.json
# folder used to save fine-tuned checkpoints
export OUTPUT_DIR=/your/path/to/save_checkpoints
# folder used to cache package dependencies
export CACHE_DIR=/your/path/to/transformer_package_cache

export CUDA_VISIBLE_DEVICES=0,1,2,3
python -m torch.distributed.launch --nproc_per_node=4 run_seq2seq.py \
  --train_file $TRAIN_FILE --output_dir $OUTPUT_DIR \
  --model_type unilm --model_name_or_path unilm1.2-base-uncased --do_lower_case --fp16 --fp16_opt_level O2 \
  --max_source_seq_length 608 --max_target_seq_length 160 --per_gpu_train_batch_size 8 --gradient_accumulation_steps 2 \
  --learning_rate 7e-5 --num_warmup_steps 1000 --num_training_steps 45000 --cache_dir $CACHE_DIR --save_steps 1500
```

- The fine-tuning batch size = `number of gpus` * `per_gpu_train_batch_size` * `gradient_accumulation_steps`. In the above example, the batch size is `4*8*2 = 64`. Adjust these three arguments together to keep the total batch size unchanged.
- `--do_lower_case`: for uncased models

### Decoding

```bash
# path of the fine-tuned checkpoint
MODEL_PATH=/your/path/to/model_checkpoint
SPLIT=dev
# input file that you would like to decode
INPUT_JSON=/your/path/to/${SPLIT}.json

export CUDA_VISIBLE_DEVICES=0
export OMP_NUM_THREADS=4
export MKL_NUM_THREADS=4

python decode_seq2seq.py \
  --fp16 --model_type unilm --tokenizer_name unilm1.2-base-uncased --do_lower_case --input_file ${INPUT_JSON} --split $SPLIT \
  --model_path ${MODEL_PATH} --max_seq_length 768 --max_tgt_length 160 --batch_size 32 --beam_size 5 \
  --length_penalty 0 --forbid_duplicate_ngrams --mode s2s --forbid_ignore_word "." --min_len 48
```

- The decoding results are saved at `${MODEL_PATH}.${SPLIT}`.

### Evaluation

The golden answer text files can be downloaded [here](https://conversationhub.blob.core.windows.net/beit-share-public/s2s-ft-data/cnndm.eval.zip?sv=2021-10-04&st=2023-06-08T11%3A16%3A02Z&se=2033-06-09T11%3A16%3A00Z&sr=c&sp=r&sig=N4pfCVmSeq4L4tS8QbrFVsX6f6q844eft8xSuXdxU48%3D).
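The provided evaluation script is invoked as shown in the block below. For a rough sanity check on a handful of predictions, you can also use the `py-rouge` package installed in the Environment section; this is only a sketch with assumed file paths and line-aligned prediction/reference files, not the official metric computation.

```python
import rouge  # provided by the py-rouge package

# Hypothetical paths: one prediction / one reference per line, line-aligned.
with open("predictions.txt", encoding="utf-8") as f:
    hypotheses = [line.strip() for line in f]
with open("gold.target", encoding="utf-8") as f:
    references = [line.strip() for line in f]

evaluator = rouge.Rouge(metrics=["rouge-n", "rouge-l"], max_n=2,
                        apply_avg=True, stemming=True)
# Prints averaged precision/recall/F1 for rouge-1, rouge-2, and rouge-l.
print(evaluator.get_scores(hypotheses, references))
```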
```bash
SPLIT=dev
GOLD_PATH=/your/path/to/${SPLIT}.target
# ${MODEL_PATH}.${SPLIT} is the predicted target file
python evaluations/eval_for_cnndm.py --pred ${MODEL_PATH}.${SPLIT} --gold ${GOLD_PATH} --split ${SPLIT} --trunc_len 160
```

## Citation

If you find this repository useful, please consider citing our work:

```
@article{s2s-ft,
  title={s2s-ft: Fine-Tuning Pretrained Transformer Encoders for Sequence-to-Sequence Learning},
  author={Hangbo Bao and Li Dong and Wenhui Wang and Nan Yang and Furu Wei},
  year={2021},
  eprint={2110.13640},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```

## Example: XSum with unilm2-base-uncased

### Dataset

Pre-processed json dataset (tokenized format): [link](https://drive.google.com/file/d/1_Y2g-fGnAXjnFpMddp13Te-vfprX6iGH/view?usp=sharing)

### Fine-tuning

```bash
# choose the model to fine-tune
export MODEL_NAME=unilm2-base-uncased
# path of training data
export TRAIN_FILE=/your/path/to/train.json
# folder used to save fine-tuned checkpoints
export OUTPUT_DIR=/your/path/to/save_checkpoints
# folder used to cache package dependencies
export CACHE_DIR=/your/path/to/transformer_package_cache
# learning rate
export LR=7e-5
# number of total training steps
export NUM_STEPS=48000
# target segment word drop prob
export TMP=0.4
# max length for source sequence
export MAX_SRC=720
# max length for target sequence
export MAX_TGT=48

export CUDA_VISIBLE_DEVICES=0,1,2,3
python -m torch.distributed.launch --nproc_per_node=4 run_seq2seq.py \
  --train_file $TRAIN_FILE --output_dir $OUTPUT_DIR \
  --model_type unilm --model_name_or_path unilm2-base-uncased --do_lower_case \
  --fp16 --fp16_opt_level O2 \
  --max_source_seq_length $MAX_SRC --max_target_seq_length $MAX_TGT \
  --per_gpu_train_batch_size 8 --gradient_accumulation_steps 2 \
  --learning_rate $LR --num_warmup_steps 1000 --num_training_steps $NUM_STEPS \
  --cache_dir $CACHE_DIR --save_steps 1500 --target_mask_prob $TMP
```

- The fine-tuning batch size = `number of gpus` * `per_gpu_train_batch_size` * `gradient_accumulation_steps`. In the above example, the batch size is `4*8*2 = 64`. Adjust these three arguments together to keep the total batch size unchanged.
- `--do_lower_case`: for uncased models; not needed for cased models
- `--target_mask_prob`: target segment word drop probability. For the XSum dataset, we recommend a value of 0.4 or 0.5. For the CNN/DailyMail dataset, we recommend a value of 0.7 or 0.8.
- `--learning_rate`: for the base models, we recommend a value of 5e-5 ~ 1e-4; for the large models, a value of 1e-5 ~ 3e-5.

### Decoding

```bash
# tokenizer used for decoding, same as the model name
export TOKENIZER_NAME=$MODEL_NAME # or unilm2-base-uncased
# path of the fine-tuned checkpoint
export MODEL_PATH=/your/path/to/model_checkpoint
export SPLIT=dev
# input file that you would like to decode
export INPUT_JSON=/your/path/to/${SPLIT}.json
# max length for source sequence
export MAX_SRC=720
# max length for target sequence
export MAX_TGT=48
export MAX_LEN=$(($MAX_SRC+$MAX_TGT))
# set minimum length for decoding target sequence
export MIN_LEN=1

export CUDA_VISIBLE_DEVICES=0
export OMP_NUM_THREADS=4
export MKL_NUM_THREADS=4

python decode_seq2seq.py \
  --fp16 --model_type unilm --tokenizer_name unilm2-base-uncased --do_lower_case \
  --input_file ${INPUT_JSON} --split $SPLIT --model_path ${MODEL_PATH} \
  --max_seq_length $MAX_LEN --max_tgt_length $MAX_TGT \
  --batch_size 24 --beam_size 8 \
  --length_penalty 0.9 --forbid_duplicate_ngrams --mode s2s \
  --forbid_ignore_word "." \
  --min_len $MIN_LEN
```

- The decoding results are saved at `${MODEL_PATH}.${SPLIT}`.
- `--do_lower_case`: for uncased models; not needed for cased models

### Evaluation

The golden answer text files can be downloaded [here](https://drive.google.com/file/d/142ANA2bJVYpg3bFfvda5zY7pj6Brg9RH/view?usp=sharing).

```bash
SPLIT=dev
GOLD_PATH=/your/path/to/${SPLIT}.target
# ${MODEL_PATH}.${SPLIT} is the predicted target file
python evaluations/eval_for_xsum.py --pred ${MODEL_PATH}.${SPLIT} --gold ${GOLD_PATH} --split ${SPLIT}
```

## License

This project is licensed under the license found in the LICENSE file in the root directory of this source tree. Portions of the source code are based on the [transformers](https://github.com/huggingface/transformers) project.

[Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct)