qwen_new_mage_per_domain_balanced_moe

Files changed (16) hide show

README.md CHANGED Viewed

@@ -18,8 +18,8 @@ should probably proofread and complete it, then remove this comment. -->
 This model is a fine-tuned version of [Qwen/Qwen1.5-MoE-A2.7B](https://huggingface.co/Qwen/Qwen1.5-MoE-A2.7B) on the None dataset.
 It achieves the following results on the evaluation set:
-- Loss: nan
-- Accuracy: 0.5343
 ## Model description
@@ -50,10 +50,15 @@ The following hyperparameters were used during training:
 | Training Loss | Epoch  | Step | Validation Loss | Accuracy |
 |:-------------:|:------:|:----:|:---------------:|:--------:|
-| 0.2979        | 0.0006 | 100  | nan             | 0.5343   |
-| 0.0           | 0.0013 | 200  | nan             | 0.5343   |
-| 0.0           | 0.0019 | 300  | nan             | 0.5343   |
-| 0.0           | 0.0025 | 400  | nan             | 0.5343   |
 ### Framework versions

 This model is a fine-tuned version of [Qwen/Qwen1.5-MoE-A2.7B](https://huggingface.co/Qwen/Qwen1.5-MoE-A2.7B) on the None dataset.
 It achieves the following results on the evaluation set:
+- Loss: 2.8781
+- Accuracy: 0.5357
 ## Model description
 | Training Loss | Epoch  | Step | Validation Loss | Accuracy |
 |:-------------:|:------:|:----:|:---------------:|:--------:|
+| 5.4785        | 0.0006 | 100  | 6.6629          | 0.5689   |
+| 4.8644        | 0.0013 | 200  | 10.6619         | 0.5316   |
+| 4.5014        | 0.0019 | 300  | 3.0574          | 0.5299   |
+| 3.3262        | 0.0025 | 400  | 3.2657          | 0.4643   |
+| 2.7274        | 0.0032 | 500  | 2.0543          | 0.5314   |
+| 2.3305        | 0.0038 | 600  | 1.9673          | 0.4682   |
+| 2.4483        | 0.0044 | 700  | 2.7203          | 0.5357   |
+| 3.201         | 0.0051 | 800  | 3.5143          | 0.5357   |
+| 2.8675        | 0.0057 | 900  | 2.8781          | 0.5357   |
 ### Framework versions

config.json CHANGED Viewed

@@ -32,7 +32,7 @@
   "shared_expert_intermediate_size": 5632,
   "sliding_window": null,
   "tie_word_embeddings": false,
-  "torch_dtype": "float16",
   "transformers_version": "4.49.0",
   "use_cache": true,
   "use_sliding_window": false,

   "shared_expert_intermediate_size": 5632,
   "sliding_window": null,
   "tie_word_embeddings": false,
+  "torch_dtype": "float32",
   "transformers_version": "4.49.0",
   "use_cache": true,
   "use_sliding_window": false,

evaluation_results.json ADDED Viewed

+{
+    "eval_loss": NaN,
+    "eval_accuracy": 0.5342839036755387,
+    "eval_runtime": 5653.9393,
+    "eval_samples_per_second": 2.791,
+    "eval_steps_per_second": 2.791,
+    "epoch": 0.0025348703096977803
+}

model-00001-of-00012.safetensors ADDED Viewed

+version https://git-lfs.github.com/spec/v1
+oid sha256:35b3dc035b6760d5bcbed1067923f118d7b3577ac004a92dd2ab8c1b6d1db743
+size 4990221104

model-00002-of-00012.safetensors ADDED Viewed

+version https://git-lfs.github.com/spec/v1
+oid sha256:32aaaec990f4903d580f5a8bbbe8b324962b638f8f2b228f0686a882f0811833
+size 4991306528

model-00003-of-00012.safetensors ADDED Viewed

+version https://git-lfs.github.com/spec/v1
+oid sha256:6cfdce745903a9227dc466a6b06077f56a093ada6641e2ded20fd2c339059ac7
+size 4990298240

model-00004-of-00012.safetensors ADDED Viewed

+version https://git-lfs.github.com/spec/v1
+oid sha256:7d38121cb1d24e44e400cc5334268364e80cdbbc6c2b8eef83d43af81f4dd825
+size 4990757696

model-00005-of-00012.safetensors ADDED Viewed

+version https://git-lfs.github.com/spec/v1
+oid sha256:f3e346c8291370da473f259bb58ab14d6bbf998c802c3244f788180c749be13d
+size 4991306600

model-00006-of-00012.safetensors ADDED Viewed

+version https://git-lfs.github.com/spec/v1
+oid sha256:5c00370a395737b486018825d1b9d2f25c2a7c73490d2d8796ee79ef107e2705
+size 4991306936

model-00007-of-00012.safetensors ADDED Viewed

+version https://git-lfs.github.com/spec/v1
+oid sha256:c1c92cbd233968293e977f8bc8c5559d6c5759f3dc3d67bd76aea9e0bd5b643d
+size 4991306952

model-00008-of-00012.safetensors ADDED Viewed

+version https://git-lfs.github.com/spec/v1
+oid sha256:e639e4e349a22ee82aa3f38713b86845d3a2d091d6d36db284aac6a4bd54932d
+size 4968238032

model-00009-of-00012.safetensors ADDED Viewed

+version https://git-lfs.github.com/spec/v1
+oid sha256:84ae8046cd8b640ccc4574301daa4094b28c5f3dc4e9c2d6661d9be0f05087db
+size 4989749864

model-00010-of-00012.safetensors ADDED Viewed

+version https://git-lfs.github.com/spec/v1
+oid sha256:e5a0e70ebd04f8100fb2059cbb53b8997d6c1400aa1d6631f14119518d400e10
+size 4991306928

model-00011-of-00012.safetensors ADDED Viewed

+version https://git-lfs.github.com/spec/v1
+oid sha256:6f675aba552735b1fa296cfd57f9d3b7f3d6c3c485ab9b8b71d9c9a6fe229d89
+size 4991306936

model-00012-of-00012.safetensors ADDED Viewed

+version https://git-lfs.github.com/spec/v1
+oid sha256:74e9fbda0980b30b87af9b46ea7a1524baa5590d86a2e740ed446e75cdeb4860
+size 1141959976

model.safetensors.index.json CHANGED Viewed

The diff for this file is too large to render. See raw diff