Built with Axolotl

See axolotl config

axolotl version: 0.9.2

base_model: pretrained_models/Spark-TTS-0.5B/LLM
# Automatically upload checkpoint and final model to HF
hub_model_id: muhtasham/spark-llm-finetune-tj

trust_remote_code: true

strict: false

datasets:
  - path: data/output_prompt.jsonl
    type: completion

dataset_prepared_path:
val_set_size: 0.05
output_dir: ./outputs/out


sequence_len: 4098
sample_packing: true
eval_sample_packing: true
pad_to_sequence_len: true

wandb_project: spark-tts
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:

gradient_accumulation_steps: 4
micro_batch_size: 4
num_epochs: 50
optimizer: adamw_torch_fused
lr_scheduler: cosine
learning_rate: 0.0002

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: true

gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 50
xformers_attention:
flash_attention: true

warmup_steps: 10
evals_per_epoch: 1
save_steps: 5000
debug:
deepspeed:
weight_decay: 0.0
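
The datasets entry above points Axolotl at data/output_prompt.jsonl with type: completion, which trains on each record as plain text rather than prompt/response pairs. As a pre-flight sanity check, a minimal sketch (assuming Axolotl's default completion format, where each JSON line carries its training text under a "text" key; the field name is configurable) could look like:

```python
import json

# Hypothetical sanity check for data/output_prompt.jsonl before launching training.
# Assumes the `type: completion` default, where each JSON line holds the training
# text under a "text" key.
path = "data/output_prompt.jsonl"
with open(path, encoding="utf-8") as f:
    for i, line in enumerate(f, start=1):
        record = json.loads(line)  # raises if a line is not valid JSON
        text = record.get("text")
        if not isinstance(text, str) or not text.strip():
            raise ValueError(f"line {i}: missing or empty 'text' field")
print("all records look usable for completion-style training")
```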

spark-llm-finetune-tj

This model is a fine-tuned version of pretrained_models/Spark-TTS-0.5B/LLM, trained on the data/output_prompt.jsonl dataset. It achieves the following results on the evaluation set:

  • Loss: 5.2546
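
For context, assuming the reported loss is the standard token-level cross-entropy in nats (the Transformers Trainer default), this corresponds to a perplexity of roughly exp(5.2546) ≈ 191:

```python
import math

eval_loss = 5.2546               # final validation loss reported above
perplexity = math.exp(eval_loss)
print(f"perplexity ≈ {perplexity:.1f}")  # ≈ 191.4
```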

Model description

More information needed

Intended uses & limitations

More information needed
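
Pending fuller documentation, the training config above gives enough to load the checkpoint. A minimal loading sketch, assuming the weights pushed to muhtasham/spark-llm-finetune-tj (the hub_model_id in the config) load as an ordinary causal LM via transformers with trust_remote_code, as during training; meaningful generation also requires prompts formatted the same way as the completion records in data/output_prompt.jsonl:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumption: the fine-tuned Spark-TTS LLM backbone loads as a standard causal LM.
model_id = "muhtasham/spark-llm-finetune-tj"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the checkpoint is stored in BF16
    trust_remote_code=True,
)

prompt = "..."  # replace with a prompt in the training data's format
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```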

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0002
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 16
  • optimizer: adamw_torch_fused with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 10
  • num_epochs: 50.0
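
The total train batch size of 16 follows directly from the other settings: micro batch size × gradient accumulation steps × number of processes. A quick check, assuming a single-GPU run (one process), which matches the reported value:

```python
micro_batch_size = 4
gradient_accumulation_steps = 4
num_processes = 1  # assumption: single GPU, no DeepSpeed/FSDP configured

total_train_batch_size = micro_batch_size * gradient_accumulation_steps * num_processes
print(total_train_batch_size)  # 16, matching the value reported above
```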

Training results

| Training Loss | Epoch   | Step | Validation Loss |
|:-------------:|:-------:|:----:|:---------------:|
| No log        | 0.0088  | 1    | 9.9240          |
| 5.5236        | 0.9978  | 114  | 5.5667          |
| 5.0799        | 1.9891  | 228  | 5.3932          |
| 4.9292        | 2.9803  | 342  | 5.3107          |
| 4.7729        | 3.9716  | 456  | 5.2529          |
| 4.7022        | 4.9628  | 570  | 5.2174          |
| 4.6598        | 5.9540  | 684  | 5.1988          |
| 4.6176        | 6.9453  | 798  | 5.1833          |
| 4.5814        | 7.9365  | 912  | 5.1737          |
| 4.5422        | 8.9278  | 1026 | 5.1687          |
| 4.506         | 9.9190  | 1140 | 5.1643          |
| 4.492         | 10.9103 | 1254 | 5.1646          |
| 4.4605        | 11.9015 | 1368 | 5.1670          |
| 4.4384        | 12.8928 | 1482 | 5.1699          |
| 4.4151        | 13.8840 | 1596 | 5.1751          |
| 4.4053        | 14.8753 | 1710 | 5.1766          |
| 4.3875        | 15.8665 | 1824 | 5.1807          |
| 4.3684        | 16.8578 | 1938 | 5.1879          |
| 4.3624        | 17.8490 | 2052 | 5.1921          |
| 4.3413        | 18.8403 | 2166 | 5.1983          |
| 4.3302        | 19.8315 | 2280 | 5.2020          |
| 4.3179        | 20.8228 | 2394 | 5.2081          |
| 4.3152        | 21.8140 | 2508 | 5.2157          |
| 4.306         | 22.8053 | 2622 | 5.2180          |
| 4.2989        | 23.7965 | 2736 | 5.2243          |
| 4.2982        | 24.7877 | 2850 | 5.2282          |
| 4.2862        | 25.7790 | 2964 | 5.2328          |
| 4.2827        | 26.7702 | 3078 | 5.2339          |
| 4.2775        | 27.7615 | 3192 | 5.2368          |
| 4.2802        | 28.7527 | 3306 | 5.2417          |
| 4.2686        | 29.7440 | 3420 | 5.2434          |
| 4.2713        | 30.7352 | 3534 | 5.2432          |
| 4.2689        | 31.7265 | 3648 | 5.2476          |
| 4.2687        | 32.7177 | 3762 | 5.2481          |
| 4.2651        | 33.7090 | 3876 | 5.2508          |
| 4.266         | 34.7002 | 3990 | 5.2509          |
| 4.2644        | 35.6915 | 4104 | 5.2517          |
| 4.2626        | 36.6827 | 4218 | 5.2517          |
| 4.2646        | 37.6740 | 4332 | 5.2525          |
| 4.2617        | 38.6652 | 4446 | 5.2524          |
| 4.2603        | 39.6565 | 4560 | 5.2544          |
| 4.2633        | 40.6477 | 4674 | 5.2537          |
| 4.2561        | 41.6389 | 4788 | 5.2522          |
| 4.2612        | 42.6302 | 4902 | 5.2546          |
| 4.2618        | 43.6214 | 5016 | 5.2530          |
| 4.2602        | 44.6127 | 5130 | 5.2540          |
| 4.2619        | 45.6039 | 5244 | 5.2543          |
| 4.263         | 46.5952 | 5358 | 5.2549          |
| 4.2625        | 47.5864 | 5472 | 5.2547          |
| 4.2611        | 48.5777 | 5586 | 5.2545          |
| 4.2621        | 49.5689 | 5700 | 5.2546          |

Framework versions

  • Transformers 4.51.3
  • PyTorch 2.7.1+cu126
  • Datasets 3.5.1
  • Tokenizers 0.21.1
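
To reproduce these results against the same library versions, the installed environment can be checked programmatically; a small sketch using importlib.metadata:

```python
from importlib.metadata import version

# Versions reported on this card; adjust if newer releases are known to be compatible.
expected = {
    "transformers": "4.51.3",
    "torch": "2.7.1+cu126",
    "datasets": "3.5.1",
    "tokenizers": "0.21.1",
}
for package, wanted in expected.items():
    installed = version(package)
    status = "OK" if installed == wanted else f"differs (expected {wanted})"
    print(f"{package}: {installed} {status}")
```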