ydeng9 committed
Commit d457f58
1 Parent(s): e905063

Update README.md

Files changed (1)
  1. README.md (+46, -6)
README.md CHANGED
@@ -6,15 +6,55 @@ language:
 - en
 pipeline_tag: text-generation
 ---
- # Model Card for Test
+ Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models (https://arxiv.org/abs/2401.01335)
 
- This modelcard aims to be a quick sanity check and will be updated later.
+ # zephyr-7b-sft-full-spin-iter0
+
+ This model is a self-play fine-tuned model at iteration 0 from [alignment-handbook/zephyr-7b-sft-full](https://huggingface.co/alignment-handbook/zephyr-7b-sft-full) using synthetic data based on the [HuggingFaceH4/ultrachat_200k](https://huggingface.co/datasets/HuggingFaceH4/ultrachat_200k) dataset.
 
 ## Model Details
 
 ### Model Description
 
- - **Model type:** A 7B parameter GPT-like model fine-tuned on synthetic datasets.
- - **Language(s) (NLP):** Primarily English
- - **License:** MIT
- - **Finetuned from model [optional]:** mistralai/Mistral-7B-v0.1
+ - Model type: A 7B parameter GPT-like model fine-tuned on synthetic datasets.
+ - Language(s) (NLP): Primarily English
+ - License: MIT
+ - Finetuned from model: alignment-handbook/zephyr-7b-sft-full (based on mistralai/Mistral-7B-v0.1)
+
+ ### Training hyperparameters
+ The following hyperparameters were used during training:
+
+ - learning_rate: 5e-07
+ - train_batch_size: 8
+ - seed: 42
+ - distributed_type: multi-GPU
+ - num_devices: 8
+ - total_train_batch_size: 64
+ - optimizer: RMSProp
+ - lr_scheduler_type: linear
+ - lr_scheduler_warmup_ratio: 0.1
+ - num_epochs: 2.0
+
+ ## [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_UCLA-AGI__test0)
+ | Metric               | Value |
+ |----------------------|-------|
+ | Avg.                 | 62.37 |
+ | ARC (25-shot)        | 63.65 |
+ | HellaSwag (10-shot)  | 84.44 |
+ | MMLU (5-shot)        | 61.01 |
+ | TruthfulQA (0-shot)  | 50.48 |
+ | Winogrande (5-shot)  | 77.98 |
+ | GSM8K (5-shot)       | 36.69 |
+
+ ## Citation
+ ```
+ @misc{chen2024selfplay,
+       title={Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models},
+       author={Zixiang Chen and Yihe Deng and Huizhuo Yuan and Kaixuan Ji and Quanquan Gu},
+       year={2024},
+       eprint={2401.01335},
+       archivePrefix={arXiv},
+       primaryClass={cs.LG}
+ }
+ ```
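For quick testing of the model described in the updated card, a minimal text-generation sketch is shown below. The repository id is illustrative (inferred from the model name above; adjust it to the actual Hub repo), and it assumes the tokenizer carries the Zephyr chat template inherited from alignment-handbook/zephyr-7b-sft-full.

```python
# Minimal usage sketch (assumptions: the repo id is illustrative, and the tokenizer
# is assumed to ship the Zephyr chat template from the base SFT model).
import torch
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="UCLA-AGI/zephyr-7b-sft-full-spin-iter0",  # hypothetical repo id
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [{"role": "user", "content": "Explain self-play fine-tuning in two sentences."}]
prompt = generator.tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
outputs = generator(prompt, max_new_tokens=256, do_sample=False)
print(outputs[0]["generated_text"])
```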
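The training hyperparameters listed in the card map onto a generic PyTorch/Transformers setup roughly as sketched below. This is only an illustration of how the listed values relate (e.g. total batch size = per-device batch size × number of devices), not the authors' SPIN training script, and `steps_per_epoch` is a placeholder.

```python
# Sketch only: wiring the listed hyperparameters into a generic PyTorch setup.
# NOT the authors' SPIN training code; steps_per_epoch is a placeholder value.
import torch
from transformers import AutoModelForCausalLM, get_linear_schedule_with_warmup

learning_rate = 5e-07
per_device_train_batch_size = 8
num_devices = 8
total_train_batch_size = per_device_train_batch_size * num_devices  # 64, as listed
num_epochs = 2.0
warmup_ratio = 0.1
torch.manual_seed(42)  # seed: 42

# Base model the self-play fine-tuning starts from (iteration 0).
model = AutoModelForCausalLM.from_pretrained(
    "alignment-handbook/zephyr-7b-sft-full", torch_dtype=torch.bfloat16
)

# optimizer: RMSProp, lr_scheduler_type: linear with 10% warmup
optimizer = torch.optim.RMSprop(model.parameters(), lr=learning_rate)

steps_per_epoch = 1000  # placeholder; depends on dataset size and total batch size
num_training_steps = int(steps_per_epoch * num_epochs)
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=int(warmup_ratio * num_training_steps),
    num_training_steps=num_training_steps,
)
```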