# BEE-spoke-data/NanoLlama-GQA-L10-A32_KV8-v13-KI

> Note: training is still a work in progress (WIP).
This model is a fine-tuned version of BEE-spoke-data/NanoLlama-GQA-L10-A32_KV8-v12-minipile on the KI dataset. It achieves the following results on the evaluation set:
- Loss: 2.5937
- Accuracy: 0.4948
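A minimal usage sketch (not part of the original card) for loading this checkpoint with `transformers`; the prompt and generation settings are illustrative assumptions only, and `trust_remote_code=True` simply mirrors the evaluation-harness config listed further down.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "BEE-spoke-data/NanoLlama-GQA-L10-A32_KV8-v13-KI"

# Load tokenizer and model from the Hub; trust_remote_code mirrors the eval config below.
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

# Generate a short continuation from an arbitrary prompt.
inputs = tokenizer("The smallest llamas are", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32, do_sample=True, temperature=0.8)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```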
## Training and evaluation data

KI dataset

Evaluation harness config: `hf-causal-experimental (pretrained=BEE-spoke-data/NanoLlama-GQA-L10-A32_KV8-v13-KI, revision=main, trust_remote_code=True, dtype='float')`, limit: None, provide_description: False, num_fewshot: 0, batch_size: 8
| Task           | Version | Metric   |   Value |   | Stderr |
|----------------|--------:|----------|--------:|---|-------:|
| arc_easy       |       0 | acc      |  0.4322 | ± | 0.0102 |
|                |         | acc_norm |  0.3960 | ± | 0.0100 |
| boolq          |       1 | acc      |  0.6196 | ± | 0.0085 |
| lambada_openai |       0 | ppl      | 61.6595 | ± | 2.4362 |
|                |         | acc      |  0.2779 | ± | 0.0062 |
| openbookqa     |       0 | acc      |  0.1540 | ± | 0.0162 |
|                |         | acc_norm |  0.2840 | ± | 0.0202 |
| piqa           |       0 | acc      |  0.6028 | ± | 0.0114 |
|                |         | acc_norm |  0.6028 | ± | 0.0114 |
| winogrande     |       0 | acc      |  0.5193 | ± | 0.0140 |
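The table above was produced with EleutherAI's lm-evaluation-harness using the config string listed earlier. A hedged reproduction sketch via the harness's Python API follows; `evaluator.simple_evaluate` and the `hf-causal-experimental` model type come from the older 0.3.x harness releases, so treat this as illustrative and adjust for the version you have installed.

```python
from lm_eval import evaluator

# Mirrors the config string above: hf-causal-experimental, num_fewshot=0, batch_size=8.
results = evaluator.simple_evaluate(
    model="hf-causal-experimental",
    model_args=(
        "pretrained=BEE-spoke-data/NanoLlama-GQA-L10-A32_KV8-v13-KI,"
        "revision=main,trust_remote_code=True,dtype='float'"
    ),
    tasks=["arc_easy", "boolq", "lambada_openai", "openbookqa", "piqa", "winogrande"],
    num_fewshot=0,
    batch_size=8,
)
print(results["results"])
```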
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.00025
- train_batch_size: 8
- eval_batch_size: 4
- seed: 2280
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.95) and epsilon=1e-08
- lr_scheduler_type: inverse_sqrt
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1.0
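The training script itself is not included in this card, but the hyperparameters above map directly onto standard `transformers.TrainingArguments` fields. The sketch below is an assumption-laden reconstruction (the `output_dir` is hypothetical), not the actual configuration used.

```python
from transformers import TrainingArguments

# Reconstruction of the listed hyperparameters. A per-device train batch size of 8
# with 16 gradient-accumulation steps yields the reported total train batch size of 128.
training_args = TrainingArguments(
    output_dir="./nanollama-gqa-v13-ki",  # hypothetical output path
    learning_rate=2.5e-4,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=4,
    seed=2280,
    gradient_accumulation_steps=16,
    lr_scheduler_type="inverse_sqrt",
    warmup_ratio=0.05,
    num_train_epochs=1.0,
    adam_beta1=0.9,
    adam_beta2=0.95,
    adam_epsilon=1e-8,
)
```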
### Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:--------:|
| 2.5744        | 0.08  | 200  | 2.7138          | 0.4776   |
| 2.5387        | 0.16  | 400  | 2.6713          | 0.4836   |
| 2.4718        | 0.23  | 600  | 2.6462          | 0.4873   |
| 2.4681        | 0.31  | 800  | 2.6328          | 0.4892   |
| 2.5351        | 0.39  | 1000 | 2.6227          | 0.4908   |
| 2.5316        | 0.47  | 1200 | 2.6159          | 0.4914   |
| 2.527         | 0.54  | 1400 | 2.6103          | 0.4921   |
| 2.4838        | 0.62  | 1600 | 2.6058          | 0.4930   |
| 2.4483        | 0.7   | 1800 | 2.6024          | 0.4934   |
| 2.426         | 0.78  | 2000 | 2.5998          | 0.4937   |
| 2.4685        | 0.86  | 2200 | 2.5961          | 0.4944   |
| 2.4473        | 0.93  | 2400 | 2.5937          | 0.4948   |
### Framework versions
- Transformers 4.36.0.dev0
- Pytorch 2.1.0
- Datasets 2.15.0
- Tokenizers 0.15.0
## Open LLM Leaderboard Evaluation Results

Detailed results can be found here.

| Metric                            | Value |
|-----------------------------------|------:|
| Avg.                              | 29.23 |
| AI2 Reasoning Challenge (25-Shot) | 23.81 |
| HellaSwag (10-Shot)               | 29.39 |
| MMLU (5-Shot)                     | 25.37 |
| TruthfulQA (0-shot)               | 44.77 |
| Winogrande (5-shot)               | 51.14 |
| GSM8k (5-shot)                    |  0.91 |
## Evaluation results

| Task                              | Split      | Metric              |  Value | Source               |
|-----------------------------------|------------|---------------------|-------:|----------------------|
| AI2 Reasoning Challenge (25-Shot) | test       | normalized accuracy | 23.810 | Open LLM Leaderboard |
| HellaSwag (10-Shot)               | validation | normalized accuracy | 29.390 | Open LLM Leaderboard |
| MMLU (5-Shot)                     | test       | accuracy            | 25.370 | Open LLM Leaderboard |
| TruthfulQA (0-shot)               | validation | mc2                 | 44.770 | Open LLM Leaderboard |
| Winogrande (5-shot)               | validation | accuracy            | 51.140 | Open LLM Leaderboard |
| GSM8k (5-shot)                    | test       | accuracy            |  0.910 | Open LLM Leaderboard |