Trained using Eole for 5½ hours on an NVIDIA RTX PRO 6000 Blackwell Server Edition (CC 12.0, 96 GB VRAM, 600W max power) GPU with cuda_13.0.r13.0/compiler.36424714_0, torch 2.10.0+cu128, courtesy a Lightning.ai free trial.
| Training | Validation | |||
|---|---|---|---|---|
| embeddings | 32317440 | validation BLEU | 43.41668106422167 | |
| encoder | 75524096 | Train perplexity | 4.96248 | |
| decoder | 100702208 | Train accuracy | 95.5266 | |
| generator | 31560 | Sentences processed | 5.38617e+07 | |
| other | 0 | {'corpus_1' | {'count': 53861702, 'index': 8095}} | |
| * number of parameters | 208575304 | Average bsz | 1866/2478/82 | |
| Trainable parameters | {'torch.bfloat16': 208575304} | Validation perplexity | 22.17 | |
| Non trainable parameters | {} | Validation accuracy | 71.8956 | |
| * src vocab size | 31560 | |||
| * tgt vocab size | 31560 |
- Downloads last month
- 42