babylm-subwords-2-gpt2_lm-model
This model is a fine-tuned version of an unspecified base model on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 3.6434
- Model Preparation Time: 0.0024
- Perplexity: 38.2216
- Bpc: 5.2563
- Babyslm Test Lexical: 0.7310
- Babyslm Test Syntactic: 0.7454
- Blimp Supplement: 0.5693
- Blimp Filtered: 0.7115
- Blimp Supplement Hypernym: 0.5012
- Blimp Supplement Qa Congruence Easy: 0.6406
- Blimp Supplement Qa Congruence Tricky: 0.2667
- Blimp Supplement Subject Aux Inversion: 0.7631
- Blimp Supplement Turn Taking: 0.675
- Blimp Adjunct Island Filtered: 0.7134
- Blimp Anaphor Gender Agreement Filtered: 0.9248
- Blimp Anaphor Number Agreement Filtered: 0.9506
- Blimp Animate Subject Passive Filtered: 0.7419
- Blimp Animate Subject Trans Filtered: 0.8678
- Blimp Causative Filtered: 0.6932
- Blimp Complex Np Island Filtered: 0.4444
- Blimp Coordinate Structure Constraint Complex Left Branch Filtered: 0.4205
- Blimp Coordinate Structure Constraint Object Extraction Filtered: 0.6133
- Blimp Determiner Noun Agreement 1 Filtered: 0.9763
- Blimp Determiner Noun Agreement 2 Filtered: 0.9409
- Blimp Determiner Noun Agreement Irregular 1 Filtered: 0.8972
- Blimp Determiner Noun Agreement Irregular 2 Filtered: 0.8890
- Blimp Determiner Noun Agreement With Adj 2 Filtered: 0.9001
- Blimp Determiner Noun Agreement With Adj Irregular 1 Filtered: 0.8524
- Blimp Determiner Noun Agreement With Adj Irregular 2 Filtered: 0.8298
- Blimp Determiner Noun Agreement With Adjective 1 Filtered: 0.9539
- Blimp Distractor Agreement Relational Noun Filtered: 0.6294
- Blimp Distractor Agreement Relative Clause Filtered: 0.5465
- Blimp Drop Argument Filtered: 0.7043
- Blimp Ellipsis N Bar 1 Filtered: 0.6185
- Blimp Ellipsis N Bar 2 Filtered: 0.8261
- Blimp Existential There Object Raising Filtered: 0.6798
- Blimp Existential There Quantifiers 1 Filtered: 0.9548
- Blimp Existential There Quantifiers 2 Filtered: 0.4061
- Blimp Existential There Subject Raising Filtered: 0.8333
- Blimp Expletive It Object Raising Filtered: 0.7628
- Blimp Inchoative Filtered: 0.5860
- Blimp Intransitive Filtered: 0.7339
- Blimp Irregular Past Participle Adjectives Filtered: 0.6681
- Blimp Irregular Past Participle Verbs Filtered: 0.7495
- Blimp Irregular Plural Subject Verb Agreement 1 Filtered: 0.8433
- Blimp Irregular Plural Subject Verb Agreement 2 Filtered: 0.8430
- Blimp Left Branch Island Echo Question Filtered: 0.5977
- Blimp Left Branch Island Simple Question Filtered: 0.5184
- Blimp Matrix Question Npi Licensor Present Filtered: 0.2831
- Blimp Npi Present 1 Filtered: 0.4345
- Blimp Npi Present 2 Filtered: 0.5317
- Blimp Only Npi Licensor Present Filtered: 0.7812
- Blimp Only Npi Scope Filtered: 0.6452
- Blimp Passive 1 Filtered: 0.8690
- Blimp Passive 2 Filtered: 0.8339
- Blimp Principle A C Command Filtered: 0.6279
- Blimp Principle A Case 1 Filtered: 0.9978
- Blimp Principle A Case 2 Filtered: 0.8678
- Blimp Principle A Domain 1 Filtered: 0.9650
- Blimp Principle A Domain 2 Filtered: 0.6066
- Blimp Principle A Domain 3 Filtered: 0.5643
- Blimp Principle A Reconstruction Filtered: 0.3630
- Blimp Regular Plural Subject Verb Agreement 1 Filtered: 0.8337
- Blimp Regular Plural Subject Verb Agreement 2 Filtered: 0.8508
- Blimp Sentential Negation Npi Licensor Present Filtered: 0.9456
- Blimp Sentential Negation Npi Scope Filtered: 0.4455
- Blimp Sentential Subject Island Filtered: 0.4485
- Blimp Superlative Quantifiers 1 Filtered: 0.6313
- Blimp Superlative Quantifiers 2 Filtered: 0.7373
- Blimp Tough Vs Raising 1 Filtered: 0.3956
- Blimp Tough Vs Raising 2 Filtered: 0.7478
- Blimp Transitive Filtered: 0.7039
- Blimp Wh Island Filtered: 0.7042
- Blimp Wh Questions Object Gap Filtered: 0.7055
- Blimp Wh Questions Subject Gap Filtered: 0.8842
- Blimp Wh Questions Subject Gap Long Distance Filtered: 0.9090
- Blimp Wh Vs That No Gap Filtered: 0.9477
- Blimp Wh Vs That No Gap Long Distance Filtered: 0.9566
- Blimp Wh Vs That With Gap Filtered: 0.2622
- Blimp Wh Vs That With Gap Long Distance Filtered: 0.0758
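The Loss, Perplexity and Bpc values at the top of this list appear to be three views of the same quantity. The sketch below assumes the reported loss is a mean cross-entropy in nats; the card itself does not say how each metric was computed. Under that assumption, perplexity is the exponential of the loss, and the Bpc figure matches the loss converted to bits, which suggests it is measured per token rather than per character. The variable names are illustrative only.

```python
import math

# Final evaluation loss from the list above, assumed to be mean cross-entropy in nats.
eval_loss = 3.6434

# Perplexity is the exponential of the mean cross-entropy.
perplexity = math.exp(eval_loss)

# Dividing by ln(2) converts nats to bits.
loss_in_bits = eval_loss / math.log(2)

print(f"perplexity:   {perplexity:.4f}")
print(f"loss in bits: {loss_in_bits:.4f}")
```

Both results agree with the reported Perplexity (38.2216) and Bpc (5.2563) up to the rounding of the loss.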
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.001
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 90000
- training_steps: 400000
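To make the schedule implied by these settings concrete, here is a minimal sketch of a linear scheduler with warmup. It assumes the learning rate rises linearly from 0 to 0.001 over the first 90000 steps and then decays linearly to 0 at step 400000, which is how a `linear` scheduler type is commonly implemented; the card does not confirm the exact behavior. The function name `linear_schedule_lr` is hypothetical.

```python
def linear_schedule_lr(step, peak_lr=0.001, warmup_steps=90_000, total_steps=400_000):
    """Learning rate at `step`, assuming linear warmup then linear decay to zero."""
    if step < warmup_steps:
        # Warmup phase: scale the peak rate by the fraction of warmup completed.
        return peak_lr * step / warmup_steps
    # Decay phase: scale the peak rate by the fraction of post-warmup steps remaining.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / (total_steps - warmup_steps)


# Sample the schedule at the start, mid-warmup, peak, mid-decay and final step.
for step in (0, 45_000, 90_000, 245_000, 400_000):
    print(step, linear_schedule_lr(step))
```

Under these assumptions the warmup covers 22.5% of training, so the model spends the remaining 310000 steps on a decaying learning rate.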
Training results
Training Loss | Epoch | Step | Validation Loss | Model Preparation Time | Perplexity | Bpc | Babyslm Test Lexical | Babyslm Test Syntactic | Blimp Supplement | Blimp Filtered | Blimp Supplement Hypernym | Blimp Supplement Qa Congruence Easy | Blimp Supplement Qa Congruence Tricky | Blimp Supplement Subject Aux Inversion | Blimp Supplement Turn Taking | Blimp Adjunct Island Filtered | Blimp Anaphor Gender Agreement Filtered | Blimp Anaphor Number Agreement Filtered | Blimp Animate Subject Passive Filtered | Blimp Animate Subject Trans Filtered | Blimp Causative Filtered | Blimp Complex Np Island Filtered | Blimp Coordinate Structure Constraint Complex Left Branch Filtered | Blimp Coordinate Structure Constraint Object Extraction Filtered | Blimp Determiner Noun Agreement 1 Filtered | Blimp Determiner Noun Agreement 2 Filtered | Blimp Determiner Noun Agreement Irregular 1 Filtered | Blimp Determiner Noun Agreement Irregular 2 Filtered | Blimp Determiner Noun Agreement With Adj 2 Filtered | Blimp Determiner Noun Agreement With Adj Irregular 1 Filtered | Blimp Determiner Noun Agreement With Adj Irregular 2 Filtered | Blimp Determiner Noun Agreement With Adjective 1 Filtered | Blimp Distractor Agreement Relational Noun Filtered | Blimp Distractor Agreement Relative Clause Filtered | Blimp Drop Argument Filtered | Blimp Ellipsis N Bar 1 Filtered | Blimp Ellipsis N Bar 2 Filtered | Blimp Existential There Object Raising Filtered | Blimp Existential There Quantifiers 1 Filtered | Blimp Existential There Quantifiers 2 Filtered | Blimp Existential There Subject Raising Filtered | Blimp Expletive It Object Raising Filtered | Blimp Inchoative Filtered | Blimp Intransitive Filtered | Blimp Irregular Past Participle Adjectives Filtered | Blimp Irregular Past Participle Verbs Filtered | Blimp Irregular Plural Subject Verb Agreement 1 Filtered | Blimp Irregular Plural Subject Verb Agreement 2 Filtered | Blimp Left Branch Island Echo Question Filtered | Blimp Left Branch Island Simple Question Filtered | Blimp Matrix Question Npi Licensor Present Filtered | Blimp Npi Present 1 Filtered | Blimp Npi Present 2 Filtered | Blimp Only Npi Licensor Present Filtered | Blimp Only Npi Scope Filtered | Blimp Passive 1 Filtered | Blimp Passive 2 Filtered | Blimp Principle A C Command Filtered | Blimp Principle A Case 1 Filtered | Blimp Principle A Case 2 Filtered | Blimp Principle A Domain 1 Filtered | Blimp Principle A Domain 2 Filtered | Blimp Principle A Domain 3 Filtered | Blimp Principle A Reconstruction Filtered | Blimp Regular Plural Subject Verb Agreement 1 Filtered | Blimp Regular Plural Subject Verb Agreement 2 Filtered | Blimp Sentential Negation Npi Licensor Present Filtered | Blimp Sentential Negation Npi Scope Filtered | Blimp Sentential Subject Island Filtered | Blimp Superlative Quantifiers 1 Filtered | Blimp Superlative Quantifiers 2 Filtered | Blimp Tough Vs Raising 1 Filtered | Blimp Tough Vs Raising 2 Filtered | Blimp Transitive Filtered | Blimp Wh Island Filtered | Blimp Wh Questions Object Gap Filtered | Blimp Wh Questions Subject Gap Filtered | Blimp Wh Questions Subject Gap Long Distance Filtered | Blimp Wh Vs That No Gap Filtered | Blimp Wh Vs That No Gap Long Distance Filtered | Blimp Wh Vs That With Gap Filtered | Blimp Wh Vs That With Gap Long Distance Filtered |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
3.5361 | 1.4408 | 50000 | 4.0304 | 0.0024 | 56.2839 | 5.8147 | 0.7055 | 0.7074 | 0.5760 | 0.6454 | 0.5321 | 0.5781 | 0.3697 | 0.7856 | 0.6143 | 0.6541 | 0.8558 | 0.8926 | 0.6916 | 0.8537 | 0.5550 | 0.3948 | 0.3190 | 0.4099 | 0.9408 | 0.8947 | 0.7474 | 0.8659 | 0.8215 | 0.7827 | 0.7774 | 0.8778 | 0.3528 | 0.3651 | 0.6978 | 0.5586 | 0.7911 | 0.7635 | 0.9258 | 0.2777 | 0.8171 | 0.7273 | 0.4982 | 0.6371 | 0.5692 | 0.7484 | 0.7488 | 0.8240 | 0.6082 | 0.4006 | 0.1755 | 0.4444 | 0.4650 | 0.4875 | 0.4325 | 0.8012 | 0.8217 | 0.5507 | 0.9934 | 0.6481 | 0.9770 | 0.5104 | 0.5271 | 0.3537 | 0.7753 | 0.7788 | 0.9445 | 0.3708 | 0.4121 | 0.6149 | 0.6440 | 0.3049 | 0.7652 | 0.6313 | 0.5958 | 0.6007 | 0.8786 | 0.9475 | 0.9268 | 0.9394 | 0.1806 | 0.0934 |
3.3391 | 2.8816 | 100000 | 3.8577 | 0.0024 | 47.3543 | 5.5654 | 0.7203 | 0.7493 | 0.5590 | 0.6776 | 0.5 | 0.5469 | 0.2970 | 0.7942 | 0.6571 | 0.6379 | 0.8908 | 0.9506 | 0.6927 | 0.8657 | 0.6785 | 0.4078 | 0.3201 | 0.5079 | 0.9731 | 0.8840 | 0.7988 | 0.8780 | 0.8640 | 0.8092 | 0.8107 | 0.9293 | 0.5406 | 0.4237 | 0.6913 | 0.5823 | 0.8225 | 0.7475 | 0.9667 | 0.3633 | 0.8268 | 0.7444 | 0.5357 | 0.6751 | 0.6847 | 0.7452 | 0.7998 | 0.8206 | 0.5016 | 0.3775 | 0.2078 | 0.4433 | 0.5460 | 0.6701 | 0.6368 | 0.7869 | 0.7918 | 0.6089 | 1.0 | 0.8262 | 0.9912 | 0.5519 | 0.5228 | 0.3557 | 0.8225 | 0.8349 | 0.9282 | 0.3835 | 0.4121 | 0.6374 | 0.7444 | 0.2985 | 0.7870 | 0.6509 | 0.5740 | 0.6217 | 0.8708 | 0.9323 | 0.8885 | 0.9337 | 0.2742 | 0.1165 |
3.1745 | 4.3224 | 150000 | 3.7534 | 0.0024 | 42.6648 | 5.4150 | 0.7217 | 0.7457 | 0.5746 | 0.6936 | 0.5238 | 0.6094 | 0.2545 | 0.8032 | 0.6821 | 0.6972 | 0.9217 | 0.9484 | 0.7050 | 0.8852 | 0.6443 | 0.4113 | 0.3355 | 0.5258 | 0.9742 | 0.9151 | 0.8267 | 0.8890 | 0.8640 | 0.8301 | 0.8357 | 0.9453 | 0.5838 | 0.4363 | 0.7217 | 0.6272 | 0.8406 | 0.7217 | 0.9613 | 0.3568 | 0.8550 | 0.7523 | 0.5544 | 0.7016 | 0.6868 | 0.7410 | 0.8047 | 0.8363 | 0.6030 | 0.3901 | 0.2551 | 0.3839 | 0.5022 | 0.7846 | 0.6308 | 0.8440 | 0.8217 | 0.5655 | 1.0 | 0.8481 | 0.9661 | 0.5814 | 0.5675 | 0.3619 | 0.8360 | 0.8392 | 0.9489 | 0.3869 | 0.4183 | 0.6558 | 0.7748 | 0.3671 | 0.7348 | 0.6993 | 0.6990 | 0.6566 | 0.8920 | 0.9312 | 0.9199 | 0.9531 | 0.2263 | 0.0890 |
3.1032 | 5.7632 | 200000 | 3.7040 | 0.0024 | 40.6095 | 5.3437 | 0.7273 | 0.7403 | 0.5692 | 0.7006 | 0.5012 | 0.6094 | 0.2727 | 0.7665 | 0.6964 | 0.6853 | 0.9022 | 0.9646 | 0.7117 | 0.8754 | 0.6565 | 0.4161 | 0.3951 | 0.5469 | 0.9774 | 0.9248 | 0.8590 | 0.8841 | 0.8735 | 0.8454 | 0.8262 | 0.9432 | 0.5685 | 0.4879 | 0.7283 | 0.5985 | 0.8273 | 0.7463 | 0.9699 | 0.4720 | 0.8539 | 0.7602 | 0.5497 | 0.7293 | 0.6504 | 0.7431 | 0.8234 | 0.8576 | 0.5681 | 0.4837 | 0.2562 | 0.4279 | 0.5197 | 0.8073 | 0.6523 | 0.8548 | 0.8029 | 0.6173 | 0.9989 | 0.8175 | 0.9726 | 0.6120 | 0.5484 | 0.3330 | 0.8382 | 0.8561 | 0.9565 | 0.4592 | 0.4350 | 0.5781 | 0.7018 | 0.3397 | 0.7598 | 0.6901 | 0.6823 | 0.6740 | 0.8808 | 0.9312 | 0.9292 | 0.9554 | 0.2688 | 0.0802 |
3.0031 | 7.2040 | 250000 | 3.6789 | 0.0024 | 39.6021 | 5.3075 | 0.7266 | 0.7403 | 0.5695 | 0.7019 | 0.4964 | 0.5938 | 0.2848 | 0.7761 | 0.6964 | 0.6950 | 0.9073 | 0.9463 | 0.7017 | 0.8635 | 0.6748 | 0.4350 | 0.3620 | 0.5975 | 0.9742 | 0.9356 | 0.8532 | 0.8866 | 0.9022 | 0.8287 | 0.8226 | 0.9400 | 0.5977 | 0.5189 | 0.7109 | 0.6309 | 0.8297 | 0.7167 | 0.9699 | 0.3765 | 0.8463 | 0.7747 | 0.5754 | 0.7212 | 0.6847 | 0.7728 | 0.8246 | 0.8262 | 0.6051 | 0.4606 | 0.2906 | 0.4521 | 0.5088 | 0.7971 | 0.6487 | 0.8607 | 0.8217 | 0.6300 | 0.9978 | 0.8557 | 0.9595 | 0.6022 | 0.5473 | 0.3588 | 0.8427 | 0.8423 | 0.9434 | 0.3904 | 0.4329 | 0.5434 | 0.7241 | 0.3629 | 0.7620 | 0.7085 | 0.6615 | 0.6694 | 0.8686 | 0.9078 | 0.9280 | 0.952 | 0.2971 | 0.0912 |
2.9706 | 8.6448 | 300000 | 3.6588 | 0.0024 | 38.8164 | 5.2786 | 0.7279 | 0.7436 | 0.5746 | 0.7062 | 0.4952 | 0.625 | 0.2848 | 0.7714 | 0.6964 | 0.7123 | 0.9094 | 0.9560 | 0.7240 | 0.8722 | 0.6956 | 0.4350 | 0.3598 | 0.6101 | 0.9806 | 0.9345 | 0.8855 | 0.8963 | 0.9001 | 0.8426 | 0.8119 | 0.9411 | 0.6231 | 0.5465 | 0.7141 | 0.6022 | 0.8261 | 0.7377 | 0.9484 | 0.4127 | 0.8377 | 0.7642 | 0.5836 | 0.7281 | 0.7076 | 0.7335 | 0.8358 | 0.8330 | 0.5776 | 0.4858 | 0.2766 | 0.4422 | 0.5284 | 0.7778 | 0.6069 | 0.8738 | 0.8306 | 0.6110 | 0.9978 | 0.8372 | 0.9694 | 0.6142 | 0.5696 | 0.3226 | 0.8438 | 0.8423 | 0.9445 | 0.4397 | 0.3996 | 0.6404 | 0.7525 | 0.3449 | 0.7793 | 0.7166 | 0.6906 | 0.6799 | 0.8875 | 0.9043 | 0.9361 | 0.9543 | 0.2524 | 0.0835 |
2.917 | 10.0856 | 350000 | 3.6532 | 0.0024 | 38.5978 | 5.2704 | 0.7315 | 0.7447 | 0.5608 | 0.7098 | 0.4893 | 0.5938 | 0.2606 | 0.7711 | 0.6893 | 0.7231 | 0.9176 | 0.9484 | 0.7464 | 0.8754 | 0.6980 | 0.4433 | 0.4194 | 0.5996 | 0.9763 | 0.9323 | 0.8840 | 0.8951 | 0.8916 | 0.8538 | 0.8321 | 0.9550 | 0.6383 | 0.5316 | 0.7011 | 0.6060 | 0.8297 | 0.6860 | 0.9581 | 0.4050 | 0.8431 | 0.7549 | 0.5743 | 0.7154 | 0.6816 | 0.7580 | 0.8495 | 0.8543 | 0.6061 | 0.5342 | 0.2885 | 0.4004 | 0.5098 | 0.7766 | 0.6607 | 0.8643 | 0.8372 | 0.6142 | 1.0 | 0.8590 | 0.9716 | 0.5945 | 0.5547 | 0.3568 | 0.8449 | 0.8593 | 0.9336 | 0.4259 | 0.4422 | 0.6170 | 0.7181 | 0.3935 | 0.7522 | 0.7028 | 0.6865 | 0.7183 | 0.8998 | 0.9242 | 0.9384 | 0.96 | 0.2655 | 0.0681 |
2.8668 | 11.5264 | 400000 | 3.6434 | 0.0024 | 38.2216 | 5.2563 | 0.7310 | 0.7454 | 0.5693 | 0.7115 | 0.5012 | 0.6406 | 0.2667 | 0.7631 | 0.675 | 0.7134 | 0.9248 | 0.9506 | 0.7419 | 0.8678 | 0.6932 | 0.4444 | 0.4205 | 0.6133 | 0.9763 | 0.9409 | 0.8972 | 0.8890 | 0.9001 | 0.8524 | 0.8298 | 0.9539 | 0.6294 | 0.5465 | 0.7043 | 0.6185 | 0.8261 | 0.6798 | 0.9548 | 0.4061 | 0.8333 | 0.7628 | 0.5860 | 0.7339 | 0.6681 | 0.7495 | 0.8433 | 0.8430 | 0.5977 | 0.5184 | 0.2831 | 0.4345 | 0.5317 | 0.7812 | 0.6452 | 0.8690 | 0.8339 | 0.6279 | 0.9978 | 0.8678 | 0.9650 | 0.6066 | 0.5643 | 0.3630 | 0.8337 | 0.8508 | 0.9456 | 0.4455 | 0.4485 | 0.6313 | 0.7373 | 0.3956 | 0.7478 | 0.7039 | 0.7042 | 0.7055 | 0.8842 | 0.9090 | 0.9477 | 0.9566 | 0.2622 | 0.0758 |
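The Epoch and Step columns in the table above allow a rough estimate of the training set size, which the card otherwise leaves unstated. The sketch below divides steps by epochs to get steps per epoch, then multiplies by the training batch size of 32. That second step assumes a single device and no gradient accumulation, neither of which the card confirms, so the sequence count should be read as an estimate only. The helper name `estimate_steps_per_epoch` is hypothetical.

```python
def estimate_steps_per_epoch(epoch, step):
    """Estimate optimizer steps per epoch from one (epoch, step) table row."""
    return step / epoch


# (epoch, step) pairs copied from the first and last rows of the results table.
rows = [(1.4408, 50_000), (11.5264, 400_000)]
train_batch_size = 32

for epoch, step in rows:
    steps = estimate_steps_per_epoch(epoch, step)
    # Assumption: each step consumes exactly one batch of 32 sequences.
    sequences = steps * train_batch_size
    print(f"step {step}: ~{steps:.0f} steps/epoch, ~{sequences:.0f} sequences/epoch")
```

Both rows give roughly 34703 steps per epoch, so the estimate is consistent across the run.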
Framework versions
- Transformers 4.44.2
- Pytorch 2.4.0+cu118
- Datasets 2.18.0
- Tokenizers 0.19.1