pegasus_indonesian_base-pretrain

GitHub: PEGASUS TPU Trainer

This model is the pretrained base checkpoint behind pegasus_indonesian_base-finetune. It was pretrained on kaggle id news 2017, CC_News_id, and OSCAR_2201.

It achieves the following results at the end of pretraining (epoch 40):

  • Train loss: 2.3483
  • Train accuracy: 0.2622
  • Validation loss: 2.3489
  • Validation accuracy: 0.2661
  • Final learning rate: 0.000137

Intended uses & limitations

This model is uncased, cannot handle special characters other than "," and ".", struggles with numbers, and its performance has only been tested on news article text.
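In practice, inputs should be normalized to match those constraints before inference. A minimal sketch; the normalize helper and its regex are illustrative assumptions, not the exact preprocessing used in the repository:

# Hypothetical input normalization matching the stated limitations:
# lowercase everything and drop special characters other than "," and "."
import re

def normalize(text: str) -> str:
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s,.]", " ", text)  # keep letters, digits, commas, periods
    return re.sub(r"\s+", " ", text).strip()    # collapse runs of whitespace

print(normalize('Presiden: "Ekonomi tumbuh 5,3%!"'))
# -> presiden ekonomi tumbuh 5,3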

Training and evaluation data

Pretraining datasets:

  1. kaggle id news 2017
  2. CC_News_id
  3. OSCAR_2201
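The OSCAR subset can be streamed from the Hugging Face Hub. A sketch, assuming the public oscar-corpus/OSCAR-2201 release with its Indonesian config; the Kaggle news and CC_News_id corpora were prepared separately:

from datasets import load_dataset

# Stream the Indonesian portion of OSCAR 22.01 (a gated dataset: accept the
# terms on the Hub and authenticate first). Dataset ID and config name are
# assumptions, not taken from the training repo.
oscar_id = load_dataset(
    "oscar-corpus/OSCAR-2201",
    "id",
    split="train",
    streaming=True,
    use_auth_token=True,
)
print(next(iter(oscar_id))["text"][:200])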

Training procedure

For replication, see the PEGASUS TPU Trainer GitHub page linked above.

Training hyperparameters

The following hyperparameters were used during training:

  • optimizer: Adafactor (learning_rate=0.005, beta_2_decay=-0.8, epsilon_1=1e-30, epsilon_2=0.001, clip_threshold=1.0, relative_step=True, weight_decay=None, clipnorm=None, global_clipnorm=None, clipvalue=None, use_ema=False, ema_momentum=0.99, ema_overwrite_frequency=None, jit_compile=False, is_legacy_optimizer=False)
  • training_precision: float32
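These settings map one-to-one onto the Keras Adafactor optimizer (available in TensorFlow 2.11+). A reconstruction sketch from the hyperparameters above; parameters left at None or their defaults are omitted:

import tensorflow as tf

# Rebuild the Adafactor optimizer from the listed hyperparameters
optimizer = tf.keras.optimizers.Adafactor(
    learning_rate=0.005,
    beta_2_decay=-0.8,
    epsilon_1=1e-30,
    epsilon_2=0.001,
    clip_threshold=1.0,
    relative_step=True,
)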

Usage

# Load model hyperparameters
from transformers import PegasusConfig, TFPegasusForConditionalGeneration, PegasusTokenizerFast

configuration = PegasusConfig(
    vocab_size=32103,
    d_model=512,
    dropout=0.15,
    encoder_layers=12,
    encoder_attention_heads=8,
    encoder_ffn_dim=3072,
    decoder_layers=12,
    decoder_attention_heads=8,
    decoder_ffn_dim=3072,
)

# Load model and tokenizer: download the pretrained weights from this
# repository, then load them manually with TensorFlow
model = TFPegasusForConditionalGeneration(configuration)
model.load_weights("checkpoints-pegasus_indonesian_base-pretrain-weights")
tokenizer = PegasusTokenizerFast.from_pretrained("thonyyy/pegasus_indonesian_base-finetune")
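With the weights loaded, a quick end-to-end check can be run. The input sentence below is illustrative; recall that the model is uncased and has only been tested on news-domain text:

# Encode a lowercase Indonesian sentence and generate from the model
text = "pemerintah mengumumkan kebijakan ekonomi baru pada hari senin."
inputs = tokenizer(text, return_tensors="tf")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])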

Training results

Train Loss | Train Accuracy | Validation Loss | Validation Accuracy | Learning Rate | Epoch
---|---|---|---|---|---
4.1939 | 0.1453 | 3.3956 | 0.1867 | 0.005000 | 1
3.1326 | 0.2083 | 2.8226 | 0.2333 | 0.005000 | 2
2.8494 | 0.2290 | 2.7217 | 0.2396 | 0.005000 | 3
2.7600 | 0.2346 | 2.6514 | 0.2438 | 0.005000 | 4
2.7040 | 0.2381 | 2.6108 | 0.2466 | 0.004524 | 5
2.6639 | 0.2406 | 2.5785 | 0.2487 | 0.004094 | 6
2.6329 | 0.2426 | 2.5577 | 0.2503 | 0.003704 | 7
2.6075 | 0.2443 | 2.5347 | 0.2518 | 0.003352 | 8
2.5867 | 0.2456 | 2.5188 | 0.2530 | 0.003033 | 9
2.5687 | 0.2468 | 2.4999 | 0.2545 | 0.002744 | 10
2.5529 | 0.2479 | 2.5009 | 0.2542 | 0.002483 | 11
2.5392 | 0.2488 | 2.4786 | 0.2557 | 0.002247 | 12
2.5269 | 0.2496 | 2.4692 | 0.2565 | 0.002033 | 13
2.5159 | 0.2504 | 2.4641 | 0.2570 | 0.001839 | 14
2.5060 | 0.2511 | 2.4558 | 0.2576 | 0.001664 | 15
2.4969 | 0.2517 | 2.4484 | 0.2580 | 0.001506 | 16
2.4886 | 0.2523 | 2.4386 | 0.2588 | 0.001363 | 17
2.4810 | 0.2528 | 2.4325 | 0.2593 | 0.001233 | 18
2.4701 | 0.2536 | 2.4358 | 0.2589 | 0.001116 | 19
2.4585 | 0.2544 | 2.4234 | 0.2601 | 0.001009 | 20
2.4478 | 0.2551 | 2.4115 | 0.2607 | 0.000913 | 21
2.4376 | 0.2558 | 2.4141 | 0.2608 | 0.000826 | 22
2.4282 | 0.2565 | 2.4031 | 0.2615 | 0.000748 | 23
2.4197 | 0.2571 | 2.3918 | 0.2625 | 0.000677 | 24
2.4118 | 0.2577 | 2.3826 | 0.2630 | 0.000612 | 25
2.4045 | 0.2582 | 2.3783 | 0.2636 | 0.000554 | 26
2.3977 | 0.2586 | 2.3772 | 0.2635 | 0.000501 | 27
2.3918 | 0.2591 | 2.3738 | 0.2640 | 0.000454 | 28
2.3859 | 0.2595 | 2.3708 | 0.2643 | 0.000410 | 29
2.3809 | 0.2599 | 2.3649 | 0.2645 | 0.000371 | 30
2.3762 | 0.2602 | 2.3655 | 0.2647 | 0.000336 | 31
2.3718 | 0.2605 | 2.3610 | 0.2650 | 0.000304 | 32
2.3679 | 0.2608 | 2.3645 | 0.2648 | 0.000275 | 33
2.3643 | 0.2610 | 2.3573 | 0.2654 | 0.000249 | 34
2.3609 | 0.2613 | 2.3611 | 0.2649 | 0.000225 | 35
2.3580 | 0.2615 | 2.3541 | 0.2655 | 0.000204 | 36
2.3552 | 0.2617 | 2.3525 | 0.2657 | 0.000184 | 37
2.3528 | 0.2619 | 2.3510 | 0.2659 | 0.000167 | 38
2.3505 | 0.2620 | 2.3470 | 0.2661 | 0.000151 | 39
2.3483 | 0.2622 | 2.3489 | 0.2661 | 0.000137 | 40
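Reading the Learning Rate column, the rate is held at 0.005 for the first four epochs and then decays by a factor of about 0.905 (e^-0.1) per epoch. A sketch of that schedule as a Keras callback, inferred from the table rather than taken from the training code:

import math
import tensorflow as tf

PEAK_LR = 5e-3     # initial learning rate from the hyperparameters above
HOLD_EPOCHS = 4    # epochs at constant peak rate (inferred from the table)

def lr_schedule(epoch, lr):
    # epoch is Keras's 0-based index; the table's epochs are 1-based
    if epoch < HOLD_EPOCHS:
        return PEAK_LR
    return PEAK_LR * math.exp(-0.1 * (epoch - HOLD_EPOCHS + 1))

lr_callback = tf.keras.callbacks.LearningRateScheduler(lr_schedule)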

Framework versions

  • Transformers 4.30.2
  • TensorFlow 2.12.0
  • Datasets 2.13.1
  • Tokenizers 0.13.3
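To reproduce this environment, pin the versions above when installing, e.g. pip install transformers==4.30.2 tensorflow==2.12.0 datasets==2.13.1 tokenizers==0.13.3.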

Special Thanks

Research supported with Cloud TPUs from Google’s TPU Research Cloud (TRC)
