See axolotl config
axolotl version: 0.5.3.dev0
```yaml
wandb_project: llm-training-platform
wandb_name: II-Tulu-3B-SFT
datasets:
  - path: allenai/tulu-3-sft-mixture
    split: train
    type: chat_template
    field_messages: messages
    message_field_role: role
    message_field_content: content
    roles:
      system:
        - system
      user:
        - user
      assistant:
        - assistant
chat_template: qwen_25
sequence_len: 2048
base_model: Qwen/Qwen2.5-3B
output_dir: checkpoints/1357e2cd-76bc-46d5-a394-949b712427c7
dataset_prepared_path: checkpoints/1357e2cd-76bc-46d5-a394-949b712427c7/dataset_prepared
flash_attention: true
train_on_inputs: false
pad_to_sequence_len: true
eval_sample_packing: false
push_to_hub: true
bf16: auto
gradient_checkpointing: true
logging_steps: 10
hub_model_id: phunguyen01/II-Tulu-3B-SFT
learning_rate: 5.0e-06
micro_batch_size: 8
num_epochs: 2
seed: 42
gradient_accumulation_steps: 2
sample_packing: true
val_set_size: 0
```
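The datasets block maps the dataset's `messages` column (with `role`/`content` fields) onto axolotl's chat-template pipeline, which renders each conversation with the qwen_25 template and fits it into 2048-token sequences; with the file saved as e.g. `config.yaml`, training is typically launched with something like `accelerate launch -m axolotl.cli.train config.yaml`. As a minimal preprocessing sketch, assuming the Qwen/Qwen2.5-3B tokenizer's bundled chat template is a close stand-in for axolotl's qwen_25 template (variable names are illustrative, not taken from axolotl's code):

```python
from datasets import load_dataset
from transformers import AutoTokenizer

# Stand-in for axolotl's own preprocessing step, not the exact implementation.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-3B")
example = load_dataset("allenai/tulu-3-sft-mixture", split="train")[0]

# field_messages: messages / message_field_role: role / message_field_content: content
token_ids = tokenizer.apply_chat_template(example["messages"], tokenize=True)

# sequence_len: 2048 -- axolotl truncates/packs conversations to this length
print(len(token_ids), len(token_ids) <= 2048)
```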
II-Tulu-3B-SFT
This model is a fine-tuned version of Qwen/Qwen2.5-3B on the allenai/tulu-3-sft-mixture dataset.
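Because the model was trained with the qwen_25 chat template, prompts at inference time should also go through the tokenizer's chat template. A minimal generation sketch with transformers (the prompt and generation settings are illustrative, not part of the original training setup):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "phunguyen01/II-Tulu-3B-SFT"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"  # device_map needs accelerate
)

# Build a chat-formatted prompt; the question is just an example.
messages = [{"role": "user", "content": "Give a two-sentence summary of supervised fine-tuning."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```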
Model description
II-Tulu-3B-SFT is a supervised fine-tune of the Qwen/Qwen2.5-3B base model on the allenai/tulu-3-sft-mixture chat dataset, trained with axolotl using the qwen_25 chat template, sample packing, and a 2048-token sequence length.
Intended uses & limitations
More information needed
Training and evaluation data
The model was trained for two epochs on the train split of allenai/tulu-3-sft-mixture. No validation split was held out (val_set_size: 0), so no evaluation results are reported.
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 2
- total_train_batch_size: 128 (see the check after this list)
- total_eval_batch_size: 64
- optimizer: adamw_hf (betas=(0.9, 0.999), epsilon=1e-08, no additional optimizer arguments)
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- num_epochs: 2
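For reference, the reported total batch sizes follow directly from the per-device batch size, gradient accumulation, and the eight GPUs; a quick arithmetic check (variable names are just for illustration):

```python
micro_batch_size = 8             # per-device train batch size
gradient_accumulation_steps = 2
num_devices = 8

total_train_batch_size = micro_batch_size * gradient_accumulation_steps * num_devices  # 128
total_eval_batch_size = micro_batch_size * num_devices  # 64 (no accumulation at eval)
print(total_train_batch_size, total_eval_batch_size)
```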
Training results
Framework versions
- Transformers 4.47.0
- Pytorch 2.4.0+cu121
- Datasets 3.1.0
- Tokenizers 0.21.0