Tags: Text Generation · Transformers · Safetensors · qwen3 · Generated from Trainer · conversational · text-generation-inference

Built with Axolotl

Axolotl config (axolotl version 0.11.0):

```yaml
base_model: Qwen/Qwen3-1.7B

# plugins:
#   - axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin
strict: false

# plugins:
#   - axolotl.integrations.liger.LigerPlugin

# liger_rope: true
# liger_rms_norm: true
# liger_glu_activation: true
# liger_layer_norm: true
# liger_fused_linear_cross_entropy: true

datasets:
  - path: sumuks/essential-web-v1.0-sample-100M-with-cleaned-responses-sft
    type: chat_template
    field_messages: conversations
    split: train
val_set_size: 0.05
dataset_prepared_path: dataset/prepared_dataset_1.7b

train_on_inputs: false
output_dir: ./output/1.7B-Instruct-Tuned-New-Data
chat_template: qwen3
sequence_len: 8192
sample_packing: true
eval_sample_packing: true
# pad_to_sequence_len: true

wandb_project: essential-web-sft
wandb_name: qwen3-1.7b-sft-new-data

gradient_accumulation_steps: 4
gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
flash_attention: true
micro_batch_size: 1
optimizer: paged_adamw_8bit
lr_scheduler: cosine
learning_rate: 2e-5
num_epochs: 1

load_best_model_at_end: true
metric_for_best_model: loss
greater_is_better: false

early_stopping_patience: 3
bf16: auto
tf32: true

logging_steps: 5

deepspeed: ./configs_prod/zero3.json

save_steps: 500
eval_steps: 500

warmup_ratio: 0.05
# save_first_step: true
```
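
For context on the data handling above: with `type: chat_template`, `chat_template: qwen3`, and `field_messages: conversations`, each dataset row is expected to carry a list of chat turns that axolotl renders through the Qwen3 chat template before tokenization, and with `train_on_inputs: false` only the assistant turns contribute to the loss. The sketch below illustrates that rendering step with the base model's tokenizer; the role/content keys are an assumption, and the actual dataset schema may differ.

```python
# Illustrative only: how a single `conversations` row would be rendered by the
# Qwen3 chat template. Message keys (role/content) are assumed, not verified
# against the dataset schema.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-1.7B")

row = {
    "conversations": [
        {"role": "user", "content": "Clean up this passage for me."},
        {"role": "assistant", "content": "Here is the cleaned version of the passage."},
    ]
}

# Render the turns into one training string; during training, labels on the
# non-assistant portion are masked out (train_on_inputs: false).
text = tokenizer.apply_chat_template(row["conversations"], tokenize=False)
print(text)
```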

output/1.7B-Instruct-Tuned-New-Data

This model is a fine-tuned version of Qwen/Qwen3-1.7B on the sumuks/essential-web-v1.0-sample-100M-with-cleaned-responses-sft dataset. It achieves the following results on the evaluation set:

  • Loss: 0.3669
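
For quick experimentation, a minimal inference sketch is shown below. It assumes the checkpoint is published on the Hub as textcleanlm/1.7B-SFT and uses the standard transformers chat-template API; adjust the repo id, dtype, and generation settings to your setup.

```python
# Minimal inference sketch (not part of the original card). Repo id and
# generation settings are assumptions; adapt as needed.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "textcleanlm/1.7B-SFT"  # assumed Hub repo id for this checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="bfloat16", device_map="auto"
)

messages = [{"role": "user", "content": "Rewrite this sentence more clearly: ..."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```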

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 2
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 8 (see the arithmetic sketch after this list)
  • total_eval_batch_size: 2
  • optimizer: paged_adamw_8bit with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 164
  • training_steps: 3297
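
As a sanity check, the effective batch size and warmup length follow directly from the config values; the arithmetic below is illustrative, and the exact rounding of the warmup count is a trainer implementation detail.

```python
# Effective batch size = per-device batch × gradient accumulation × devices.
micro_batch_size = 1
gradient_accumulation_steps = 4
num_devices = 2
print(micro_batch_size * gradient_accumulation_steps * num_devices)  # 8

# Warmup steps ≈ warmup_ratio × total training steps.
training_steps = 3297
warmup_ratio = 0.05
print(int(training_steps * warmup_ratio))  # 164, matching the reported value
```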

Training results

| Training Loss | Epoch  | Step | Validation Loss |
|---------------|--------|------|-----------------|
| No log        | 0      | 0    | 0.8829          |
| 0.3689        | 0.1517 | 500  | 0.4088          |
| 0.3919        | 0.3033 | 1000 | 0.3952          |
| 0.386         | 0.4550 | 1500 | 0.3839          |
| 0.409         | 0.6066 | 2000 | 0.3755          |
| 0.3473        | 0.7583 | 2500 | 0.3694          |
| 0.3518        | 0.9099 | 3000 | 0.3669          |

Framework versions

  • Transformers 4.53.1
  • PyTorch 2.7.1+cu126
  • Datasets 3.6.0
  • Tokenizers 0.21.2