pegasus_indonesian_base-pretrain
GitHub: PEGASUS TPU Trainer
This model is the pretrained version of pegasus_indonesian_base-finetune, trained on Kaggle id news 2017, CC_News_id, and OSCAR_2201.
After the final epoch (40), it achieves the following results on the training and evaluation sets:
- Train Loss: 2.34832262992858
- Train Accuracy: 0.262173235416412
- Validation Loss: 2.34894156455993
- Validation Accuracy: 0.266122311353683
- Train Lr: 0.000136618677061051
- Epoch: 40
Intended uses & limitations
This model is uncased, cannot read special characters other than "," and ".", has a hard time understanding numbers, and has only been tested on news article text.
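Given these limitations, it may help to normalize input text before tokenizing it. The following is a minimal sketch, not the author's preprocessing: the function name `normalize_for_model` and the choice to replace unsupported characters with spaces are assumptions made for illustration.

```python
import re


def normalize_for_model(text):
    """Lowercase text and keep only letters, digits, spaces, ',' and '.'."""
    # The model is uncased, so lowercase everything first.
    text = text.lower()
    # Assumption: unsupported characters are replaced by a space rather than
    # deleted, so that neighbouring words are not glued together.
    text = re.sub(r"[^a-z0-9,. ]+", " ", text)
    # Collapse the runs of whitespace created by the substitution above.
    return re.sub(r"\s+", " ", text).strip()


print(normalize_for_model("Harga BBM naik 10%! (Sumber: Antara)"))
```

Digits are kept here, but the model may still handle them poorly, as noted above.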
Training and evaluation data
Pretrain dataset: Kaggle id news 2017, CC_News_id, and OSCAR_2201.
Training procedure
To replicate the training run, see the GitHub page (PEGASUS TPU Trainer).
Training hyperparameters
The following hyperparameters were used during training:
- optimizer: {'name': 'Adafactor', 'weight_decay': None, 'clipnorm': None, 'global_clipnorm': None, 'clipvalue': None, 'use_ema': False, 'ema_momentum': 0.99, 'ema_overwrite_frequency': None, 'jit_compile': False, 'is_legacy_optimizer': False, 'learning_rate': 0.005, 'beta_2_decay': -0.8, 'epsilon_1': 1e-30, 'epsilon_2': 0.001, 'clip_threshold': 1.0, 'relative_step': True}
- training_precision: float32
Usage
# Set the model hyperparameters used for this checkpoint
from transformers import PegasusConfig, TFPegasusForConditionalGeneration, PegasusTokenizerFast
configuration = PegasusConfig()
configuration.vocab_size = 32103
configuration.d_model = 512
configuration.dropout = 0.15
configuration.decoder_attention_heads = 8
configuration.decoder_layers = 12
configuration.decoder_ffn_dim = 3072
configuration.encoder_attention_heads = 8
configuration.encoder_layers = 12
configuration.encoder_ffn_dim = 3072
# Load model and tokenizer
# Download the weights first, then load them manually with TensorFlow
model = TFPegasusForConditionalGeneration(configuration)
model.load_weights("checkpoints-pegasus_indonesian_base-pretrain-weights")
tokenizer = PegasusTokenizerFast.from_pretrained("thonyyy/pegasus_indonesian_base-finetune")
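Once `model` and `tokenizer` are loaded as above, generation could look roughly like the sketch below. The helper name `generate_text` and the limits `max_input_tokens=512` and `max_new_tokens=64` are assumptions chosen for illustration, not values from this model card. Because this is the pretrained checkpoint rather than the fine-tuned one, the output is best treated as a smoke test, not a usable summary.

```python
def generate_text(model, tokenizer, text, max_input_tokens=512, max_new_tokens=64):
    """Run seq2seq generation with an already-loaded model and tokenizer."""
    # Tokenize into TensorFlow tensors, truncating long articles.
    # (max_input_tokens is an illustrative limit, not a documented one.)
    inputs = tokenizer(
        text,
        return_tensors="tf",
        truncation=True,
        max_length=max_input_tokens,
    )
    # Generate output token ids with the model's default decoding settings.
    output_ids = model.generate(
        inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        max_new_tokens=max_new_tokens,
    )
    # Decode the first (and only) sequence back to text.
    return tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0]
```

A call such as `generate_text(model, tokenizer, article)` would then return a single decoded string, where `article` is a lowercase news text.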
Training results
Train Loss | Train Accuracy | Validation Loss | Validation Accuracy | Train Lr | Epoch |
---|---|---|---|---|---|
4.1939034461975 | 0.145276814699172 | 3.39564657211303 | 0.186678826808929 | 0.00499999988824129 | 1 |
3.13256049156188 | 0.208270609378814 | 2.82256889343261 | 0.233325317502021 | 0.00499999988824129 | 2 |
2.84938621520996 | 0.229006066918373 | 2.72168040275573 | 0.23955675959587 | 0.00499999988824129 | 3 |
2.76001143455505 | 0.234559893608093 | 2.65143990516662 | 0.243813350796699 | 0.00499999988824129 | 4 |
2.70404982566833 | 0.238061532378196 | 2.6107530593872 | 0.246574580669403 | 0.00452418718487024 | 5 |
2.6638650894165 | 0.240613579750061 | 2.57847166061401 | 0.248678594827651 | 0.00409365398809313 | 6 |
2.63293719291687 | 0.242613524198532 | 2.55772447586059 | 0.250325441360473 | 0.00370409130118787 | 7 |
2.60750746726989 | 0.244251564145088 | 2.53469848632812 | 0.251805543899536 | 0.00335160037502646 | 8 |
2.58670353889465 | 0.245637223124504 | 2.51883554458618 | 0.253003656864166 | 0.00303265335969626 | 9 |
2.56865572929382 | 0.24682830274105 | 2.49989652633666 | 0.254459708929061 | 0.00274405837990343 | 10 |
2.55285787582397 | 0.247884958982467 | 2.50092124938964 | 0.254229605197906 | 0.00248292670585215 | 11 |
2.53919672966003 | 0.248811900615692 | 2.47859454154968 | 0.255691051483154 | 0.00224664504639804 | 12 |
2.52694725990295 | 0.249630719423294 | 2.46921157836914 | 0.25649145245552 | 0.00203284854069352 | 13 |
2.51587128639221 | 0.250377029180526 | 2.46414017677307 | 0.257025629281997 | 0.0018393974751234 | 14 |
2.50599193572998 | 0.251064419746398 | 2.4557819366455 | 0.257613778114318 | 0.00166435563005507 | 15 |
2.49690246582031 | 0.251682370901107 | 2.44843244552612 | 0.258032590150833 | 0.00150597130414098 | 16 |
2.48859119415283 | 0.252267301082611 | 2.43858122825622 | 0.258764535188674 | 0.00136265915352851 | 17 |
2.48097324371337 | 0.252792716026306 | 2.43251323699951 | 0.259270757436752 | 0.00123298505786806 | 18 |
2.47009921073913 | 0.253554105758667 | 2.43577146530151 | 0.258938610553741 | 0.00111565098632127 | 19 |
2.45849394798278 | 0.254375785589218 | 2.42337107658386 | 0.260090589523315 | 0.00100948277395218 | 20 |
2.44776940345764 | 0.255127549171447 | 2.41147446632385 | 0.260682851076126 | 0.000913417781703174 | 21 |
2.43759155273437 | 0.255834341049194 | 2.41405510902404 | 0.260819226503372 | 0.000826494593638926 | 22 |
2.42819571495056 | 0.256486028432846 | 2.40314364433288 | 0.26152354478836 | 0.000747843238059431 | 23 |
2.41974592208862 | 0.257094115018844 | 2.39181518554687 | 0.262460082769393 | 0.000676676572766155 | 24 |
2.41181802749633 | 0.257666647434234 | 2.3825569152832 | 0.263035386800766 | 0.000612282310612499 | 25 |
2.4044873714447 | 0.258173674345016 | 2.37829279899597 | 0.263585090637207 | 0.000554015976376831 | 26 |
2.39774870872497 | 0.258645176887512 | 2.37718510627746 | 0.263547003269195 | 0.000501294387504458 | 27 |
2.39184403419494 | 0.259076595306396 | 2.37379837036132 | 0.264020860195159 | 0.00045358992065303 | 28 |
2.38593125343322 | 0.259495466947555 | 2.37083029747009 | 0.264293819665908 | 0.000410425127483904 | 29 |
2.38093471527099 | 0.259853214025497 | 2.36486291885375 | 0.264451295137405 | 0.000371368019841611 | 30 |
2.37621307373046 | 0.260185241699218 | 2.36547923088073 | 0.264706671237945 | 0.000336027675075456 | 31 |
2.37177920341491 | 0.260504961013793 | 2.3609721660614 | 0.264981210231781 | 0.000304050423437729 | 32 |
2.3679461479187 | 0.260774314403533 | 2.36445379257202 | 0.264800041913986 | 0.000275116210104897 | 33 |
2.3643410205841 | 0.261037856340408 | 2.3573100566864 | 0.265379041433334 | 0.000248935451963916 | 34 |
2.36092805862426 | 0.261268675327301 | 2.36105728149414 | 0.264868646860122 | 0.000225246112677268 | 35 |
2.35798692703247 | 0.261485010385513 | 2.35409832000732 | 0.265503793954849 | 0.000203811112442053 | 36 |
2.35523629188537 | 0.26168617606163 | 2.35252356529235 | 0.265713244676589 | 0.000184415926923975 | 37 |
2.35284709930419 | 0.261859744787216 | 2.35101222991943 | 0.265856444835662 | 0.000166866433573886 | 38 |
2.35047316551208 | 0.262033462524414 | 2.34698224067687 | 0.266099989414215 | 0.000150986990774981 | 39 |
2.34832262992858 | 0.262173235416412 | 2.34894156455993 | 0.266122311353683 | 0.000136618677061051 | 40 |
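
The Train Lr column above appears to follow a simple pattern: the rate stays at the configured 0.005 for the first four epochs, then shrinks by a factor of exp(-0.1) each epoch. This is inferred from the logged values only, not taken from the training code, so the sketch below is an approximation under that assumption; the names `logged_lr`, `HOLD_EPOCHS`, and `DECAY_PER_EPOCH` are hypothetical.

```python
import math

BASE_LR = 0.005        # 'learning_rate' from the optimizer config
HOLD_EPOCHS = 4        # inferred: epochs logged at the base rate
DECAY_PER_EPOCH = 0.1  # inferred: later epochs multiply the rate by exp(-0.1)


def logged_lr(epoch):
    """Approximate the 'Train Lr' column for a 1-indexed epoch."""
    return BASE_LR * math.exp(-DECAY_PER_EPOCH * max(0, epoch - HOLD_EPOCHS))


for epoch in (1, 5, 20, 40):
    print(epoch, logged_lr(epoch))
```

The computed values agree with the table to within float32 rounding, which is consistent with the float32 training precision listed above.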
Framework versions
- Transformers 4.30.2
- TensorFlow 2.12.0
- Datasets 2.13.1
- Tokenizers 0.13.3
Special Thanks
Research supported with Cloud TPUs from Google’s TPU Research Cloud (TRC)