---
library_name: transformers
tags:
- generated_from_trainer
model-index:
- name: progen2_cross_attention_only_h
  results: []
---


# progen2_cross_attention_only_h

This model is a fine-tuned version of a base model that was not recorded in the training metadata. The fine-tuning dataset was not recorded either.
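At the final checkpoint (step 5000), it achieves the following results on the evaluation set:
- Loss: 2.4917
- Perplexity: 12.0823

The lowest validation loss in the training log below (2.4658, perplexity 11.7733) was reached earlier, at step 700.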
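The perplexity figures in this card match the exponential of the validation loss, PPL = exp(L). That relationship holds when L is a mean per-token cross-entropy measured in nats. The evaluation code itself is not part of the card, so the check below is only a sketch based on that assumption. It recomputes a few rows of the training-results table.

```python
import math

# (step, validation loss, reported perplexity), copied from the training-results table
ROWS = [
    (100, 7.6093, 2016.7766),
    (700, 2.4658, 11.7733),
    (5000, 2.4917, 12.0823),
]


def perplexity(mean_nll: float) -> float:
    """Perplexity of a mean per-token negative log-likelihood in nats (assumed loss definition)."""
    return math.exp(mean_nll)


for step, loss, reported in ROWS:
    recomputed = perplexity(loss)
    # The card rounds losses to 4 decimals, so allow a small relative tolerance.
    assert math.isclose(recomputed, reported, rel_tol=1e-3), (step, recomputed, reported)
    print(f"step {step}: exp({loss}) = {recomputed:.4f} (card: {reported})")
```

Because perplexity is exponential in the loss, small loss differences late in training (for example, 2.4658 vs 2.4917) correspond to perplexity differences of only a few percent.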

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0005
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 32
- optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: cosine_with_restarts
- lr_scheduler_warmup_ratio: 0.1
- training_steps: 5000
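The batch-size and learning-rate settings above can be sketched in plain Python. This is a minimal reimplementation, not the Trainer's own code, and it rests on a few assumptions:

- Training ran on a single device, which makes 8 * 4 = 32 match `total_train_batch_size`.
- The scheduler's `num_cycles` was left at what we understand to be the Transformers default of 1. The card does not record this value.
- The warmup and cosine formulas follow our reading of `get_cosine_with_hard_restarts_schedule_with_warmup`.

```python
import math

# Values copied from the hyperparameter list above.
LEARNING_RATE = 5e-4
TRAIN_BATCH_SIZE = 8      # per device
GRAD_ACCUM_STEPS = 4
TRAINING_STEPS = 5000
WARMUP_RATIO = 0.1

# Assumptions (not recorded in the card):
NUM_DEVICES = 1           # single device, consistent with total_train_batch_size = 32
NUM_CYCLES = 1            # assumed default number of cosine cycles


def effective_batch_size() -> int:
    """Examples contributing to each optimizer update."""
    return TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS * NUM_DEVICES


def warmup_steps() -> int:
    """Warmup length derived from warmup_ratio, rounded up to a whole step."""
    return math.ceil(TRAINING_STEPS * WARMUP_RATIO)


def lr_at(step: int) -> float:
    """Learning rate at a scheduler step: linear warmup, then cosine with hard restarts."""
    n_warm = warmup_steps()
    if step < n_warm:
        # Linear ramp from 0 to the peak learning rate.
        return LEARNING_RATE * step / max(1, n_warm)
    progress = (step - n_warm) / max(1, TRAINING_STEPS - n_warm)
    if progress >= 1.0:
        return 0.0
    # Each cycle decays from the peak to 0 along a half-cosine, then restarts at the peak.
    cycle_pos = (NUM_CYCLES * progress) % 1.0
    return LEARNING_RATE * max(0.0, 0.5 * (1.0 + math.cos(math.pi * cycle_pos)))


if __name__ == "__main__":
    print("effective batch size:", effective_batch_size())
    print("warmup steps:", warmup_steps())
    for step in (0, 250, 500, 1000, 2750, 5000):
        print(f"step {step:>4}: lr = {lr_at(step):.2e}")
```

With `NUM_CYCLES = 1`, the "restarts" schedule never actually restarts. The learning rate rises linearly to 5e-4 over the first 500 steps, then follows a single half-cosine down to 0 at step 5000. A larger `num_cycles` would split the decay into that many segments, and each segment would restart at the peak rate.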

### Training results

| Training Loss | Epoch   | Step | Validation Loss | Perplexity |
|:-------------:|:-------:|:----:|:---------------:|:----------:|
| 32.5486       | 0.2909  | 100  | 7.6093          | 2016.7766  |
| 22.5738       | 0.5818  | 200  | 2.8788          | 17.7926    |
| 11.4858       | 0.8727  | 300  | 2.8572          | 17.4123    |
| 11.391        | 1.1658  | 400  | 2.8481          | 17.2545    |
| 11.6307       | 1.4567  | 500  | 2.6227          | 13.7734    |
| 10.2311       | 1.7476  | 600  | 2.4862          | 12.0155    |
| 9.9477        | 2.0407  | 700  | 2.4658          | 11.7733    |
| 9.8694        | 2.3316  | 800  | 2.6730          | 14.4827    |
| 9.8291        | 2.6225  | 900  | 2.4811          | 11.9541    |
| 31.1466       | 2.9135  | 1000 | 8.7851          | 6536.0332  |
| 34.9023       | 3.2065  | 1100 | 7.7230          | 2259.8149  |
| 30.5868       | 3.4975  | 1200 | 7.5959          | 1990.0344  |
| 30.4004       | 3.7884  | 1300 | 7.5865          | 1971.3219  |
| 31.7038       | 4.0815  | 1400 | 8.0208          | 3043.6248  |
| 31.3893       | 4.3724  | 1500 | 7.2647          | 1428.9806  |
| 25.8028       | 4.6633  | 1600 | 5.7546          | 315.6425   |
| 22.4188       | 4.9542  | 1700 | 5.3616          | 213.0554   |
| 21.249        | 5.2473  | 1800 | 5.3029          | 200.9226   |
| 20.9864       | 5.5382  | 1900 | 5.3000          | 200.3277   |
| 20.9816       | 5.8291  | 2000 | 5.1496          | 172.3635   |
| 20.6328       | 6.1222  | 2100 | 4.6971          | 109.6314   |
| 18.4146       | 6.4131  | 2200 | 4.5423          | 93.9023    |
| 17.0501       | 6.704   | 2300 | 3.8270          | 45.9244    |
| 15.666        | 6.9949  | 2400 | 3.4366          | 31.0810    |
| 15.927        | 7.288   | 2500 | 3.9706          | 53.0142    |
| 13.5433       | 7.5789  | 2600 | 2.9892          | 19.8694    |
| 12.3278       | 7.8698  | 2700 | 3.1080          | 22.3761    |
| 12.0588       | 8.1629  | 2800 | 2.7287          | 15.3123    |
| 11.1222       | 8.4538  | 2900 | 2.6745          | 14.5055    |
| 10.9132       | 8.7447  | 3000 | 2.6467          | 14.1074    |
| 10.9437       | 9.0378  | 3100 | 2.6341          | 13.9301    |
| 10.8436       | 9.3287  | 3200 | 3.8787          | 48.3626    |
| 10.6462       | 9.6196  | 3300 | 2.6104          | 13.6050    |
| 10.5014       | 9.9105  | 3400 | 2.6434          | 14.0614    |
| 10.4753       | 10.2036 | 3500 | 2.6008          | 13.4750    |
| 10.4235       | 10.4945 | 3600 | 2.5825          | 13.2301    |
| 10.2556       | 10.7855 | 3700 | 2.5495          | 12.8001    |
| 10.2415       | 11.0785 | 3800 | 2.5396          | 12.6741    |
| 10.1531       | 11.3695 | 3900 | 2.5290          | 12.5413    |
| 10.1279       | 11.6604 | 4000 | 2.5270          | 12.5158    |
| 10.0816       | 11.9513 | 4100 | 2.5152          | 12.3687    |
| 10.0384       | 12.2444 | 4200 | 2.5198          | 12.4260    |
| 10.0156       | 12.5353 | 4300 | 2.5003          | 12.1862    |
| 9.9928        | 12.8262 | 4400 | 2.4984          | 12.1632    |
| 10.0172       | 13.1193 | 4500 | 2.4940          | 12.1100    |
| 9.9678        | 13.4102 | 4600 | 2.4955          | 12.1281    |
| 9.9605        | 13.7011 | 4700 | 2.4927          | 12.0943    |
| 9.9324        | 13.992  | 4800 | 2.4920          | 12.0851    |
| 9.9536        | 14.2851 | 4900 | 2.4916          | 12.0804    |
| 9.9154        | 14.576  | 5000 | 2.4917          | 12.0823    |


### Framework versions

- Transformers 4.47.1
- PyTorch 2.1.0.post301
- Datasets 3.0.2
- Tokenizers 0.21.0