# AdaDecode
This model is a fine-tuned version of meta-llama/Llama-3.1-8B-Instruct on the meng-lab/Llama-3.1-8B-Instruct-gsm8k dataset. It achieves the evaluation results shown in the training results table below.
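As a usage illustration only, here is a minimal sketch of loading a fine-tune like this with the `transformers` library. The repository id below is a hypothetical placeholder (this card does not state the hub id), and the auxiliary early-exit heads referenced in the training table likely require the AdaDecode project's own code rather than a plain causal-LM load.

```python
# Minimal loading sketch. Assumption: the hub repo id below is a hypothetical
# placeholder; substitute the actual repository this card belongs to. Loading
# as a plain causal LM ignores any auxiliary early-exit heads.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "meng-lab/Llama-3.1-8B-Instruct-gsm8k-adadecode"  # hypothetical

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype="auto",   # use the checkpoint's stored dtype
    device_map="auto",    # requires the `accelerate` package
)

# GSM8K-style arithmetic question.
prompt = "A bakery sells 12 muffins per tray. How many muffins are on 7 trays?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output_ids[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```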
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed
## Training results

The following results were recorded during training:
| Training Loss | Epoch | Step | Validation Loss | Loss Layer 4 Head | Loss Layer 8 Head | Loss Layer 12 Head | Loss Layer 16 Head | Loss Layer 20 Head | Loss Layer 24 Head | Loss Layer 28 Head |
|---|---|---|---|---|---|---|---|---|---|---|
| 5.3945 | 23.6162 | 200 | 7.0517 | 2.0468 | 1.7131 | 1.5892 | 0.8683 | 0.4093 | 0.2547 | 0.1286 |
| 3.8702 | 47.2325 | 400 | 6.2998 | 1.8651 | 1.5127 | 1.4674 | 0.7790 | 0.3461 | 0.1892 | 0.1047 |
| 3.1460 | 70.8487 | 600 | 6.0631 | 1.8044 | 1.4661 | 1.4136 | 0.7520 | 0.3300 | 0.1769 | 0.0885 |
| 3.0395 | 94.4649 | 800 | 6.0386 | 1.7920 | 1.4613 | 1.4113 | 0.7502 | 0.3285 | 0.1760 | 0.0876 |
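The per-layer loss columns indicate auxiliary LM heads attached to intermediate decoder layers (4, 8, ..., 28), as in early-exit/adaptive decoding setups; notably, the per-head losses at each step sum to roughly the reported validation loss (≈7.01 vs. 7.0517 at step 200), consistent with a summed objective. The following is a hypothetical sketch of such a multi-head training loss, assuming one linear head per listed layer and a summed cross-entropy; the actual AdaDecode objective may differ.

```python
# Hypothetical sketch of a multi-head (early-exit) training loss. Assumptions:
# one linear LM head per listed intermediate layer, and a total loss equal to
# the sum of per-head cross-entropies; the real training code may differ.
import torch
import torch.nn as nn
import torch.nn.functional as F

HEAD_LAYERS = [4, 8, 12, 16, 20, 24, 28]  # layers named in the results table

class EarlyExitHeads(nn.Module):
    """Auxiliary LM heads attached to selected intermediate layers (sketch)."""

    def __init__(self, hidden_size: int, vocab_size: int):
        super().__init__()
        self.heads = nn.ModuleDict(
            {str(layer): nn.Linear(hidden_size, vocab_size, bias=False)
             for layer in HEAD_LAYERS}
        )

    def forward(self, hidden_states: dict, labels: torch.Tensor):
        # hidden_states: {layer: (batch, seq, hidden)} from the base model's
        # forward pass; labels: (batch, seq) token ids, -100 where ignored.
        per_head = {}
        for layer in HEAD_LAYERS:
            logits = self.heads[str(layer)](hidden_states[layer])
            # Causal LM shift: position t predicts token t+1.
            shift_logits = logits[:, :-1, :].reshape(-1, logits.size(-1))
            shift_labels = labels[:, 1:].reshape(-1)
            per_head[layer] = F.cross_entropy(
                shift_logits, shift_labels, ignore_index=-100
            )
        total = torch.stack(list(per_head.values())).sum()  # summed objective
        return total, per_head
```

Under this assumption, summing the per-head losses trains every exit point jointly, which would let shallow heads propose draft tokens that deeper layers verify at decode time.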
Base model: meta-llama/Llama-3.1-8B