PEFT
Safetensors
English
qwen2
Generated from Trainer
Files changed (1)
  1. README.md +95 -83
README.md CHANGED
@@ -1,84 +1,96 @@
- ---
- library_name: peft
- license: other
- base_model: Qwen/Qwen2.5-3B-Instruct
- tags:
- - generated_from_trainer
- model-index:
- - name: pancho-v1-qw25-3B-UNAMGS
-   results: []
- datasets:
- - Magpie-Align/Magpie-Pro-MT-300K-v0.1
- - Magpie-Align/Magpie-Llama-3.1-Pro-MT-300K-Filtered
- language:
- - en
- ---
-
- # pancho-v1-qw25-3B-UNAMGS
-
- This model is a fine-tuned version of [Qwen/Qwen2.5-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct).
- It achieves the following results on the evaluation set:
- - Loss: 0.6555
- ![pancho-v1-qw25-3B-UNAMGS](https://huggingface.co/fblgit/pancho-v1-qw25-3B-UNAMGS/resolve/main/pancho-v1-qw25-3B.png)
-
- [<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
-
- ## Model description
- Trained with Magpie:
- - Magpie-Align/Magpie-Llama-3.1-Pro-MT-300K-Filtered
- - Magpie-Align/Magpie-Pro-MT-300K-v0.1
-
- UNA applied to the MLPs of layers `4, 10, 16, 22, 28`.
-
- MGS on 3 scales.
-
- Following the findings of https://arxiv.org/abs/2410.21228.
-
- ## License & Derivatives
- Any derivative (SFT, merges, etc.) using **ANY** layer from this model **MUST** include `UNA`, `MGS`, or `PANCHO` in its model name in order to obtain a license for derivatives of this model.
-
- ### Training hyperparameters
-
- The following hyperparameters were used during training:
- - learning_rate: 2e-05
- - seed: 42
- - distributed_type: multi-GPU
- - num_devices: 8
- - total_train_batch_size: 256
- - total_eval_batch_size: 16
- - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- - num_epochs: 1
-
- ### Training results
-
- | Training Loss | Epoch | Step | Validation Loss |
- |:-------------:|:------:|:----:|:---------------:|
- | 1.2127 | 0.0015 | 1 | 0.8711 |
- | 0.9905 | 0.0509 | 35 | 0.7338 |
- | 0.9685 | 0.1019 | 70 | 0.7114 |
- | 0.9554 | 0.1528 | 105 | 0.6994 |
- | 0.9077 | 0.2037 | 140 | 0.6915 |
- | 0.9149 | 0.2547 | 175 | 0.6859 |
- | 0.9363 | 0.3056 | 210 | 0.6795 |
- | 0.8975 | 0.3566 | 245 | 0.6745 |
- | 0.9095 | 0.4075 | 280 | 0.6709 |
- | 0.9216 | 0.4584 | 315 | 0.6681 |
- | 0.9143 | 0.5094 | 350 | 0.6666 |
- | 0.8879 | 0.5603 | 385 | 0.6645 |
- | 0.9194 | 0.6112 | 420 | 0.6625 |
- | 0.9123 | 0.6622 | 455 | 0.6615 |
- | 0.9056 | 0.7131 | 490 | 0.6591 |
- | 0.9172 | 0.7641 | 525 | 0.6578 |
- | 0.886 | 0.8150 | 560 | 0.6566 |
- | 0.9155 | 0.8659 | 595 | 0.6568 |
- | 0.9029 | 0.9169 | 630 | 0.6560 |
- | 0.8942 | 0.9678 | 665 | 0.6555 |
-
-
- ### Framework versions
-
- - PEFT 0.13.2
- - Transformers 4.45.2
- - Pytorch 2.3.0+cu121
- - Datasets 3.0.1
+ ---
+ library_name: peft
+ license: other
+ base_model: Qwen/Qwen2.5-3B-Instruct
+ tags:
+ - generated_from_trainer
+ datasets:
+ - Magpie-Align/Magpie-Pro-MT-300K-v0.1
+ - Magpie-Align/Magpie-Llama-3.1-Pro-MT-300K-Filtered
+ language:
+ - zho
+ - eng
+ - fra
+ - spa
+ - por
+ - deu
+ - ita
+ - rus
+ - jpn
+ - kor
+ - vie
+ - tha
+ - ara
+ model-index:
+ - name: pancho-v1-qw25-3B-UNAMGS
+   results: []
+ ---
+
+ # pancho-v1-qw25-3B-UNAMGS
+
+ This model is a fine-tuned version of [Qwen/Qwen2.5-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct).
+ It achieves the following results on the evaluation set:
+ - Loss: 0.6555
+ ![pancho-v1-qw25-3B-UNAMGS](https://huggingface.co/fblgit/pancho-v1-qw25-3B-UNAMGS/resolve/main/pancho-v1-qw25-3B.png)
+
+ [<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
+
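+ A minimal loading sketch with Transformers and PEFT, assuming this repository (`fblgit/pancho-v1-qw25-3B-UNAMGS`, the id referenced by the image above) hosts a standard PEFT adapter for the base model:
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+ from peft import PeftModel
+
+ base_id = "Qwen/Qwen2.5-3B-Instruct"
+ adapter_id = "fblgit/pancho-v1-qw25-3B-UNAMGS"  # assumed adapter repo (this card)
+
+ tokenizer = AutoTokenizer.from_pretrained(base_id)
+ base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype="auto", device_map="auto")
+ model = PeftModel.from_pretrained(base, adapter_id)  # attach the adapter weights
+
+ messages = [{"role": "user", "content": "Hello!"}]
+ inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(base.device)
+ out = model.generate(inputs, max_new_tokens=128)
+ print(tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
+ ```
+
+ Calling `merge_and_unload()` on the returned `PeftModel` would fold the adapter into the base weights if a standalone checkpoint is preferred.
+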
+ ## Model description
+ Trained with Magpie:
+ - Magpie-Align/Magpie-Llama-3.1-Pro-MT-300K-Filtered
+ - Magpie-Align/Magpie-Pro-MT-300K-v0.1
+
+ UNA applied to the MLPs of layers `4, 10, 16, 22, 28` (see the sketch at the end of this section).
+
+ MGS on 3 scales.
+
+ Following the findings of https://arxiv.org/abs/2410.21228.
+
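+ The UNA procedure itself is not spelled out in this card. Purely as an illustration, the sketch below restricts adapter updates to the MLPs of layers 4, 10, 16, 22 and 28 with PEFT; the module names are the standard Qwen2 MLP projections, and the rank/alpha values are placeholder assumptions rather than the settings used for this model:
+
+ ```python
+ from transformers import AutoModelForCausalLM
+ from peft import LoraConfig, get_peft_model
+
+ base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-3B-Instruct")
+
+ # Illustration only: adapt just the MLP projections of the listed decoder layers.
+ lora_cfg = LoraConfig(
+     r=16,                                                  # placeholder rank
+     lora_alpha=32,                                         # placeholder scaling
+     target_modules=["gate_proj", "up_proj", "down_proj"],  # Qwen2 MLP projections
+     layers_to_transform=[4, 10, 16, 22, 28],               # the layers named above
+     task_type="CAUSAL_LM",
+ )
+ peft_model = get_peft_model(base, lora_cfg)
+ peft_model.print_trainable_parameters()
+ ```
+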
+ ## License & Derivatives
+ Any derivative (SFT, merges, etc.) using **ANY** layer from this model **MUST** include `UNA`, `MGS`, or `PANCHO` in its model name in order to obtain a license for derivatives of this model.
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training (an illustrative `TrainingArguments` sketch follows the list):
+ - learning_rate: 2e-05
+ - seed: 42
+ - distributed_type: multi-GPU
+ - num_devices: 8
+ - total_train_batch_size: 256
+ - total_eval_batch_size: 16
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - num_epochs: 1
+
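+ As a rough, non-authoritative sketch, these values map onto `transformers.TrainingArguments` roughly as follows; the per-device/accumulation split is an assumption chosen only so that 8 devices reach the reported totals (256 train / 16 eval), and the actual Axolotl configuration may differ:
+
+ ```python
+ from transformers import TrainingArguments
+
+ # Sketch of an equivalent Trainer configuration (not the original Axolotl config).
+ # 8 GPUs x 8 per-device x 4 accumulation steps = 256 total train batch (assumed split).
+ args = TrainingArguments(
+     output_dir="pancho-v1-qw25-3B-UNAMGS",
+     learning_rate=2e-5,
+     seed=42,
+     num_train_epochs=1,
+     per_device_train_batch_size=8,   # assumed
+     gradient_accumulation_steps=4,   # assumed
+     per_device_eval_batch_size=2,    # 8 devices x 2 = 16 total eval batch
+     adam_beta1=0.9,
+     adam_beta2=0.999,
+     adam_epsilon=1e-8,
+ )
+ ```
+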
+ ### Training results
+
+ | Training Loss | Epoch | Step | Validation Loss |
+ |:-------------:|:------:|:----:|:---------------:|
+ | 1.2127 | 0.0015 | 1 | 0.8711 |
+ | 0.9905 | 0.0509 | 35 | 0.7338 |
+ | 0.9685 | 0.1019 | 70 | 0.7114 |
+ | 0.9554 | 0.1528 | 105 | 0.6994 |
+ | 0.9077 | 0.2037 | 140 | 0.6915 |
+ | 0.9149 | 0.2547 | 175 | 0.6859 |
+ | 0.9363 | 0.3056 | 210 | 0.6795 |
+ | 0.8975 | 0.3566 | 245 | 0.6745 |
+ | 0.9095 | 0.4075 | 280 | 0.6709 |
+ | 0.9216 | 0.4584 | 315 | 0.6681 |
+ | 0.9143 | 0.5094 | 350 | 0.6666 |
+ | 0.8879 | 0.5603 | 385 | 0.6645 |
+ | 0.9194 | 0.6112 | 420 | 0.6625 |
+ | 0.9123 | 0.6622 | 455 | 0.6615 |
+ | 0.9056 | 0.7131 | 490 | 0.6591 |
+ | 0.9172 | 0.7641 | 525 | 0.6578 |
+ | 0.886 | 0.8150 | 560 | 0.6566 |
+ | 0.9155 | 0.8659 | 595 | 0.6568 |
+ | 0.9029 | 0.9169 | 630 | 0.6560 |
+ | 0.8942 | 0.9678 | 665 | 0.6555 |
+
+
+ ### Framework versions
+
+ - PEFT 0.13.2
+ - Transformers 4.45.2
+ - Pytorch 2.3.0+cu121
+ - Datasets 3.0.1
  - Tokenizers 0.20.1