Built with Axolotl

See axolotl config

axolotl version: 0.9.2

base_model: pretrained_models/Spark-TTS-0.5B/LLM
# Automatically upload checkpoint and final model to HF
hub_model_id: muhtasham/spark-llm-finetune-tj

trust_remote_code: true

strict: false

datasets:
  - path: data/output_prompt.jsonl
    type: completion

dataset_prepared_path:
val_set_size: 0.05
output_dir: ./outputs/out


sequence_len: 4098
sample_packing: true
eval_sample_packing: true
pad_to_sequence_len: true

wandb_project: spark-tts
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:

gradient_accumulation_steps: 4
micro_batch_size: 4
num_epochs: 50
optimizer: adamw_torch_fused
lr_scheduler: cosine
learning_rate: 0.0002

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: true

gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 50
xformers_attention:
flash_attention: true

warmup_steps: 10
evals_per_epoch: 1
save_steps: 5000
debug:
deepspeed:
weight_decay: 0.0
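
The datasets entry above points Axolotl at data/output_prompt.jsonl with type: completion, which trains on each record as plain text rather than prompt/response pairs. As a pre-flight sanity check, a minimal sketch (assuming Axolotl's default completion format, where each JSON line carries its training text under a "text" key; the field name is configurable) could look like:

```python
import json

# Hypothetical sanity check for data/output_prompt.jsonl before launching training.
# Assumes the `type: completion` default, where each JSON line holds the training
# text under a "text" key.
path = "data/output_prompt.jsonl"
with open(path, encoding="utf-8") as f:
    for i, line in enumerate(f, start=1):
        record = json.loads(line)  # raises if a line is not valid JSON
        text = record.get("text")
        if not isinstance(text, str) or not text.strip():
            raise ValueError(f"line {i}: missing or empty 'text' field")
print("all records look usable for completion-style training")
```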

spark-llm-finetune-tj

This model is a fine-tuned version of pretrained_models/Spark-TTS-0.5B/LLM, trained on the data/output_prompt.jsonl dataset. It achieves the following results on the evaluation set:

  • Loss: 5.2546
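
For context, assuming the reported loss is the standard token-level cross-entropy in nats (the Transformers Trainer default), this corresponds to a perplexity of roughly exp(5.2546) ≈ 191:

```python
import math

eval_loss = 5.2546               # final validation loss reported above
perplexity = math.exp(eval_loss)
print(f"perplexity ≈ {perplexity:.1f}")  # ≈ 191.4
```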

Model description

More information needed

Intended uses & limitations

More information needed
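
Pending fuller documentation, the training config above gives enough to load the checkpoint. A minimal loading sketch, assuming the weights pushed to muhtasham/spark-llm-finetune-tj (the hub_model_id in the config) load as an ordinary causal LM via transformers with trust_remote_code, as during training; meaningful generation also requires prompts formatted the same way as the completion records in data/output_prompt.jsonl:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumption: the fine-tuned Spark-TTS LLM backbone loads as a standard causal LM.
model_id = "muhtasham/spark-llm-finetune-tj"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the checkpoint is stored in BF16
    trust_remote_code=True,
)

prompt = "..."  # replace with a prompt in the training data's format
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```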

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0002
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 16
  • optimizer: adamw_torch_fused with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 10
  • num_epochs: 50.0
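
The total train batch size of 16 follows directly from the other settings: micro batch size × gradient accumulation steps × number of processes. A quick check, assuming a single-GPU run (one process), which matches the reported value:

```python
micro_batch_size = 4
gradient_accumulation_steps = 4
num_processes = 1  # assumption: single GPU, no DeepSpeed/FSDP configured

total_train_batch_size = micro_batch_size * gradient_accumulation_steps * num_processes
print(total_train_batch_size)  # 16, matching the value reported above
```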

Training results

| Training Loss | Epoch   | Step | Validation Loss |
|:-------------:|:-------:|:----:|:---------------:|
| No log        | 0.0088  | 1    | 9.9240          |
| 5.5236        | 0.9978  | 114  | 5.5667          |
| 5.0799        | 1.9891  | 228  | 5.3932          |
| 4.9292        | 2.9803  | 342  | 5.3107          |
| 4.7729        | 3.9716  | 456  | 5.2529          |
| 4.7022        | 4.9628  | 570  | 5.2174          |
| 4.6598        | 5.9540  | 684  | 5.1988          |
| 4.6176        | 6.9453  | 798  | 5.1833          |
| 4.5814        | 7.9365  | 912  | 5.1737          |
| 4.5422        | 8.9278  | 1026 | 5.1687          |
| 4.506         | 9.9190  | 1140 | 5.1643          |
| 4.492         | 10.9103 | 1254 | 5.1646          |
| 4.4605        | 11.9015 | 1368 | 5.1670          |
| 4.4384        | 12.8928 | 1482 | 5.1699          |
| 4.4151        | 13.8840 | 1596 | 5.1751          |
| 4.4053        | 14.8753 | 1710 | 5.1766          |
| 4.3875        | 15.8665 | 1824 | 5.1807          |
| 4.3684        | 16.8578 | 1938 | 5.1879          |
| 4.3624        | 17.8490 | 2052 | 5.1921          |
| 4.3413        | 18.8403 | 2166 | 5.1983          |
| 4.3302        | 19.8315 | 2280 | 5.2020          |
| 4.3179        | 20.8228 | 2394 | 5.2081          |
| 4.3152        | 21.8140 | 2508 | 5.2157          |
| 4.306         | 22.8053 | 2622 | 5.2180          |
| 4.2989        | 23.7965 | 2736 | 5.2243          |
| 4.2982        | 24.7877 | 2850 | 5.2282          |
| 4.2862        | 25.7790 | 2964 | 5.2328          |
| 4.2827        | 26.7702 | 3078 | 5.2339          |
| 4.2775        | 27.7615 | 3192 | 5.2368          |
| 4.2802        | 28.7527 | 3306 | 5.2417          |
| 4.2686        | 29.7440 | 3420 | 5.2434          |
| 4.2713        | 30.7352 | 3534 | 5.2432          |
| 4.2689        | 31.7265 | 3648 | 5.2476          |
| 4.2687        | 32.7177 | 3762 | 5.2481          |
| 4.2651        | 33.7090 | 3876 | 5.2508          |
| 4.266         | 34.7002 | 3990 | 5.2509          |
| 4.2644        | 35.6915 | 4104 | 5.2517          |
| 4.2626        | 36.6827 | 4218 | 5.2517          |
| 4.2646        | 37.6740 | 4332 | 5.2525          |
| 4.2617        | 38.6652 | 4446 | 5.2524          |
| 4.2603        | 39.6565 | 4560 | 5.2544          |
| 4.2633        | 40.6477 | 4674 | 5.2537          |
| 4.2561        | 41.6389 | 4788 | 5.2522          |
| 4.2612        | 42.6302 | 4902 | 5.2546          |
| 4.2618        | 43.6214 | 5016 | 5.2530          |
| 4.2602        | 44.6127 | 5130 | 5.2540          |
| 4.2619        | 45.6039 | 5244 | 5.2543          |
| 4.263         | 46.5952 | 5358 | 5.2549          |
| 4.2625        | 47.5864 | 5472 | 5.2547          |
| 4.2611        | 48.5777 | 5586 | 5.2545          |
| 4.2621        | 49.5689 | 5700 | 5.2546          |

Framework versions

  • Transformers 4.51.3
  • PyTorch 2.7.1+cu126
  • Datasets 3.5.1
  • Tokenizers 0.21.1
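
To reproduce these results against the same library versions, the installed environment can be checked programmatically; a small sketch using importlib.metadata:

```python
from importlib.metadata import version

# Versions reported on this card; adjust if newer releases are known to be compatible.
expected = {
    "transformers": "4.51.3",
    "torch": "2.7.1+cu126",
    "datasets": "3.5.1",
    "tokenizers": "0.21.1",
}
for package, wanted in expected.items():
    installed = version(package)
    status = "OK" if installed == wanted else f"differs (expected {wanted})"
    print(f"{package}: {installed} {status}")
```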