See axolotl config

axolotl version: `0.4.1`

```yaml
base_model: fxmarty/small-llama-testing
batch_size: 32
bf16: true
chat_template: tokenizer_default_fallback_alpaca
datasets:
- format: custom
  path: argilla/databricks-dolly-15k-curated-en
  type:
    field_input: original-instruction
    field_instruction: original-instruction
    field_output: original-response
    format: '{instruction} {input}'
    no_input_format: '{instruction}'
    system_format: '{system}'
    system_prompt: ''
eval_steps: 20
flash_attention: true
gpu_memory_limit: 80GiB
gradient_checkpointing: true
group_by_length: true
hub_model_id: SystemAdmin123/test-repo
hub_strategy: checkpoint
learning_rate: 0.0002
logging_steps: 10
lr_scheduler: cosine
micro_batch_size: 19
model_type: AutoModelForCausalLM
num_epochs: 10
optimizer: adamw_bnb_8bit
output_dir: /workspace/axolotl/configs
pad_to_sequence_len: true
resize_token_embeddings_to_32x: false
sample_packing: false
save_steps: 40
save_total_limit: 1
sequence_len: 2048
special_tokens:
  pad_token: </s>
tokenizer_type: LlamaTokenizerFast
train_on_inputs: false
trust_remote_code: true
val_set_size: 0.1
wandb_entity: ''
wandb_mode: online
wandb_name: fxmarty/small-llama-testing-argilla/databricks-dolly-15k-curated-en
wandb_project: Gradients-On-Demand
wandb_run: your_name
wandb_runid: default
warmup_ratio: 0.05
xformers_attention: true
```
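
As a rough illustration of the custom `datasets.type` mapping above, the sketch below builds one training prompt from a hypothetical record. This is not axolotl's actual prompt-construction code; only the field names and format strings are taken from the config, and the record values are invented.

```python
# Rough sketch (not axolotl's code) of how the `type` mapping above turns one
# record of argilla/databricks-dolly-15k-curated-en into a training example.
record = {
    "original-instruction": "Summarize the text below.",  # hypothetical value
    "original-response": "A short summary.",              # hypothetical value
}

system_prompt = ""                            # system_prompt: ''
instruction = record["original-instruction"]  # field_instruction
inp = record["original-instruction"]          # field_input (same column in this config)
output = record["original-response"]          # field_output

# format: '{instruction} {input}'  /  no_input_format: '{instruction}'
prompt = f"{instruction} {inp}" if inp else instruction
text = f"{system_prompt}{prompt}{output}"     # train_on_inputs: false masks the prompt tokens
print(text)
```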
# test-repo
This model is a fine-tuned version of fxmarty/small-llama-testing on the argilla/databricks-dolly-15k-curated-en dataset. It achieves the following results on the evaluation set:
- Loss: 6.0848
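
A minimal usage sketch for loading this checkpoint with `transformers` is shown below. The repository id and model/tokenizer classes come from the config above; the prompt and generation settings are purely illustrative, and the snippet assumes the checkpoint is accessible.

```python
# Minimal usage sketch; assumes SystemAdmin123/test-repo can be downloaded.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "SystemAdmin123/test-repo"  # hub_model_id from the config
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")

inputs = tokenizer("Summarize the text below.", return_tensors="pt")  # illustrative prompt
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```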
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
The model was fine-tuned on argilla/databricks-dolly-15k-curated-en, with 10% of the data held out as the evaluation split (val_set_size: 0.1).
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0002
- train_batch_size: 19
- eval_batch_size: 19
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- total_train_batch_size: 152
- total_eval_batch_size: 152
- optimizer: 8-bit AdamW (adamw_bnb_8bit) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 26
- num_epochs: 10
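
For reference, the total batch sizes listed above follow directly from the per-device batch size and the device count. The sketch below reproduces that arithmetic; gradient accumulation is assumed to be 1, since the config does not set it.

```python
# Reproduces the effective-batch-size arithmetic reported above.
per_device_batch_size = 19  # micro_batch_size / train_batch_size
num_devices = 8             # multi-GPU data parallelism
grad_accum_steps = 1        # assumption: gradient_accumulation_steps not set in the config

total_train_batch_size = per_device_batch_size * num_devices * grad_accum_steps
print(total_train_batch_size)  # 152, matching total_train_batch_size above
```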
### Training results
Training Loss | Epoch | Step | Validation Loss |
---|---|---|---|
No log | 0.0112 | 1 | 10.4228 |
10.127 | 0.2247 | 20 | 9.8632 |
9.0393 | 0.4494 | 40 | 8.7403 |
8.1127 | 0.6742 | 60 | 7.9189 |
7.5513 | 0.8989 | 80 | 7.4579 |
7.2769 | 1.1236 | 100 | 7.2770 |
7.1384 | 1.3483 | 120 | 7.1767 |
7.0576 | 1.5730 | 140 | 7.0575 |
6.9564 | 1.7978 | 160 | 6.9379 |
6.8785 | 2.0225 | 180 | 6.8208 |
6.7027 | 2.2472 | 200 | 6.7212 |
6.5913 | 2.4719 | 220 | 6.6362 |
6.498 | 2.6966 | 240 | 6.5572 |
6.4453 | 2.9213 | 260 | 6.4721 |
6.2635 | 3.1461 | 280 | 6.4126 |
6.236 | 3.3708 | 300 | 6.3658 |
6.2733 | 3.5955 | 320 | 6.3162 |
6.2472 | 3.8202 | 340 | 6.2870 |
6.1738 | 4.0449 | 360 | 6.2401 |
6.0509 | 4.2697 | 380 | 6.2184 |
6.0158 | 4.4944 | 400 | 6.1959 |
6.0043 | 4.7191 | 420 | 6.1770 |
6.0249 | 4.9438 | 440 | 6.1570 |
5.9625 | 5.1685 | 460 | 6.1471 |
6.0231 | 5.3933 | 480 | 6.1303 |
5.9395 | 5.6180 | 500 | 6.1241 |
5.8278 | 5.8427 | 520 | 6.1094 |
5.8774 | 6.0674 | 540 | 6.1078 |
5.8393 | 6.2921 | 560 | 6.1025 |
5.8534 | 6.5169 | 580 | 6.0983 |
5.9313 | 6.7416 | 600 | 6.1013 |
5.8947 | 6.9663 | 620 | 6.0989 |
5.8936 | 7.1910 | 640 | 6.0971 |
5.8275 | 7.4157 | 660 | 6.0950 |
5.822 | 7.6404 | 680 | 6.0899 |
5.8637 | 7.8652 | 700 | 6.0883 |
5.8951 | 8.0899 | 720 | 6.0958 |
5.8697 | 8.3146 | 740 | 6.0906 |
5.9076 | 8.5393 | 760 | 6.0889 |
5.8149 | 8.7640 | 780 | 6.0894 |
5.7888 | 8.9888 | 800 | 6.0916 |
5.8096 | 9.2135 | 820 | 6.0938 |
5.8319 | 9.4382 | 840 | 6.0857 |
5.8508 | 9.6629 | 860 | 6.0901 |
5.8517 | 9.8876 | 880 | 6.0848 |
### Framework versions
- Transformers 4.46.0
- Pytorch 2.5.0+cu124
- Datasets 3.0.1
- Tokenizers 0.20.1
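
When reproducing this run, it may help to check the local environment against the versions listed above; a small, optional sketch:

```python
# Prints installed versions for comparison with the list above.
import transformers, torch, datasets, tokenizers

print(transformers.__version__)  # expected 4.46.0
print(torch.__version__)         # expected 2.5.0+cu124
print(datasets.__version__)      # expected 3.0.1
print(tokenizers.__version__)    # expected 0.20.1
```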