7b-claude-32k-20250419_144159-2ep

This model is a fine-tuned version of Qwen/Qwen2.5-7B-Instruct on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.2546
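
The card does not say how this loss is computed. Assuming it is the mean per-token cross-entropy in nats (the usual convention for causal language-model fine-tuning, but an assumption here), it can be read as a perplexity of roughly 1.29. A minimal sketch of that conversion:

```python
import math

# Evaluation loss reported in this card.
eval_loss = 0.2546

# Assumption: the loss is mean per-token cross-entropy in nats,
# so perplexity is exp(loss).
perplexity = math.exp(eval_loss)

print(f"perplexity ~ {perplexity:.2f}")
```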

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • total_train_batch_size: 8
  • total_eval_batch_size: 8
  • optimizer: adamw_torch with betas=(0.9, 0.95) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1.0
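
The batch-size and scheduler entries above can be checked with a little arithmetic. The sketch below is not the training code: it assumes no gradient accumulation (the card lists none), takes the total of 71 optimizer steps from the last row of the training results, and assumes the common "linear warmup, then half-cosine decay to zero" form of a cosine schedule with the warmup length rounded up. The function names are illustrative.

```python
import math

# Values taken from the hyperparameter list above.
PEAK_LR = 1e-05
WARMUP_RATIO = 0.05
TRAIN_BATCH_SIZE = 1  # per device
NUM_DEVICES = 8

# Assumptions (not stated in the card): no gradient accumulation, and
# 71 optimizer steps in total, read off the last row of the results table.
GRAD_ACCUM_STEPS = 1
TOTAL_STEPS = 71


def total_batch_size() -> int:
    """Effective number of sequences per optimizer step."""
    return TRAIN_BATCH_SIZE * NUM_DEVICES * GRAD_ACCUM_STEPS


def lr_at(step: int) -> float:
    """Learning rate after `step` optimizer steps (linear warmup, half-cosine decay)."""
    warmup_steps = math.ceil(TOTAL_STEPS * WARMUP_RATIO)  # 4 under these assumptions
    if step < warmup_steps:
        return PEAK_LR * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, TOTAL_STEPS - warmup_steps)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print("total_train_batch_size:", total_batch_size())
    for step in (0, 2, 4, 36, 71):
        print(step, f"{lr_at(step):.2e}")
```

Under these assumptions the effective batch size comes out to 8, matching total_train_batch_size, and the learning rate reaches its peak of 1e-05 after 4 steps before decaying to zero at step 71.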

Training results

Training Loss  Epoch   Step  Validation Loss
0.3222         0.0141  1     0.3511
0.4486         0.0282  2     0.3453
0.2803         0.0423  3     0.3267
0.4093         0.0563  4     0.3111
0.3198         0.0704  5     0.3250
0.2777         0.0845  6     0.3242
0.3484         0.0986  7     0.3306
0.3831         0.1127  8     0.3230
0.3736         0.1268  9     0.3133
0.2953         0.1408  10    0.3018
0.2026         0.1549  11    0.2956
0.34           0.1690  12    0.2912
0.3512         0.1831  13    0.2871
0.266          0.1972  14    0.2841
0.2418         0.2113  15    0.2817
0.341          0.2254  16    0.2793
0.2892         0.2394  17    0.2771
0.2588         0.2535  18    0.2754
0.2445         0.2676  19    0.2743
0.2154         0.2817  20    0.2734
0.3088         0.2958  21    0.2720
0.3071         0.3099  22    0.2706
0.2722         0.3239  23    0.2696
0.2445         0.3380  24    0.2688
0.3157         0.3521  25    0.2682
0.2631         0.3662  26    0.2677
0.2585         0.3803  27    0.2670
0.2227         0.3944  28    0.2662
0.3109         0.4085  29    0.2654
0.2332         0.4225  30    0.2647
0.2941         0.4366  31    0.2640
0.2865         0.4507  32    0.2635
0.2643         0.4648  33    0.2630
0.2841         0.4789  34    0.2626
0.2545         0.4930  35    0.2622
0.2545         0.5070  36    0.2616
0.2576         0.5211  37    0.2611
0.2972         0.5352  38    0.2606
0.2037         0.5493  39    0.2603
0.3232         0.5634  40    0.2600
0.3188         0.5775  41    0.2596
0.2772         0.5915  42    0.2592
0.2533         0.6056  43    0.2587
0.3034         0.6197  44    0.2582
0.2451         0.6338  45    0.2578
0.2246         0.6479  46    0.2574
0.2677         0.6620  47    0.2572
0.1886         0.6761  48    0.2568
0.2283         0.6901  49    0.2566
0.2043         0.7042  50    0.2564
0.2563         0.7183  51    0.2563
0.198          0.7324  52    0.2561
0.2197         0.7465  53    0.2560
0.2397         0.7606  54    0.2558
0.3545         0.7746  55    0.2557
0.2461         0.7887  56    0.2555
0.2237         0.8028  57    0.2554
0.2927         0.8169  58    0.2553
0.3508         0.8310  59    0.2551
0.2562         0.8451  60    0.2550
0.2408         0.8592  61    0.2549
0.2268         0.8732  62    0.2548
0.206          0.8873  63    0.2548
0.297          0.9014  64    0.2547
0.2448         0.9155  65    0.2546
0.2219         0.9296  66    0.2546
0.2715         0.9437  67    0.2546
0.3815         0.9577  68    0.2546
0.2862         0.9718  69    0.2546
0.2526         0.9859  70    0.2545
0.1974         1.0     71    0.2546
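
To summarize a log like the one above, the rows can be parsed and scanned for the lowest validation loss. The sketch below uses six rows copied verbatim from the table rather than the full log. The helper name `parse_rows` and the record keys are illustrative, not part of any training tooling.

```python
# A few rows copied verbatim from the table above, in the order:
# training loss, epoch, step, validation loss.
ROWS = """\
0.3222 0.0141 1 0.3511
0.4093 0.0563 4 0.3111
0.2953 0.1408 10 0.3018
0.2545 0.5070 36 0.2616
0.2526 0.9859 70 0.2545
0.1974 1.0 71 0.2546
"""


def parse_rows(text):
    """Turn whitespace-separated log rows into a list of dicts."""
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        train_loss, epoch, step, val_loss = line.split()
        records.append({
            "train_loss": float(train_loss),
            "epoch": float(epoch),
            "step": int(step),
            "val_loss": float(val_loss),
        })
    return records


records = parse_rows(ROWS)

# Step with the lowest validation loss among the parsed rows.
best = min(records, key=lambda r: r["val_loss"])

# Overall change in validation loss from the first to the last parsed row.
first, last = records[0], records[-1]
drop = first["val_loss"] - last["val_loss"]

print(f"best step: {best['step']} (validation loss {best['val_loss']:.4f})")
print(f"validation loss drop, step {first['step']} to {last['step']}: {drop:.4f}")
```

On these rows the lowest validation loss is 0.2545 at step 70, marginally below the final value of 0.2546 at step 71. The validation curve is essentially flat from about step 65 onward.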

Framework versions

  • Transformers 4.51.3
  • Pytorch 2.6.0+cu124
  • Datasets 3.5.0
  • Tokenizers 0.21.1
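
When reproducing or loading the model, it can help to compare the local environment against the versions listed above. The following is a small sketch using only the standard library. The function name `compare_versions` is illustrative, and the package names are the usual PyPI distribution names (`torch` for Pytorch), which the card itself does not spell out.

```python
from importlib import metadata

# Versions listed in this card, keyed by PyPI distribution name
# (assumed mapping: "torch" for Pytorch, lowercase names for the rest).
EXPECTED = {
    "transformers": "4.51.3",
    "torch": "2.6.0+cu124",
    "datasets": "3.5.0",
    "tokenizers": "0.21.1",
}


def compare_versions(expected):
    """Return {package: (expected, installed)} for every package that differs.

    `installed` is None when the package is not installed at all.
    """
    mismatches = {}
    for name, wanted in expected.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            found = None
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in compare_versions(EXPECTED).items():
        print(f"{name}: card lists {wanted}, environment has {found}")
```

A mismatch is not necessarily a problem. The check only flags where the environment differs from the one recorded in the card.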

Model size

  • Parameters: 7.62B
  • Tensor type: F32 (Safetensors)

Model tree for merty/7b-claude-32k-20250419_144159-2ep

Base model

Qwen/Qwen2.5-7B