Parkerlambert123 committed on
Commit bdc60a1 · verified · 1 Parent(s): aa1c5ef

Update README.md

Files changed (1): README.md (+23 −22)
README.md CHANGED
@@ -15,11 +15,12 @@ tags:
 - qwen2
 library_name: transformers
 ---
-# Zhi-writing-dsr1-14b
+
+# Zhi-Create-DSR1-14B
 
 ## 1. Introduction
 
-Zhi-writing-dsr1-14b is a fine-tuned model based on DeepSeek-R1-Distill-Qwen-14B, specifically optimized for enhanced creative writing capabilities. Several benchmark evaluations indicate the model's improved creative writing performance.
+Zhi-Create-DSR1-14B is a fine-tuned model based on DeepSeek-R1-Distill-Qwen-14B, specifically optimized for enhanced creative writing capabilities. Several benchmark evaluations indicate the model's improved creative writing performance.
 
 In the [LLM Creative Story-Writing Benchmark](https://github.com/lechmazur/writing), the model achieved a score of **8.33**, compared to its base model's **7.8**. In the [WritingBench](https://github.com/X-PLUG/WritingBench) evaluation framework, it scored **8.46**, improving over DeepSeek-R1-Distill-Qwen-14B's **7.93**. The model was also evaluated with GPT-4o on the AlpacaEval dataset, achieving an **82.6%** win rate over the base model.
@@ -28,7 +29,7 @@ The figure below shows the performance comparison across different domains in WritingBench
 ![writingbench](./images/writingbench_score.png)
 
 <figcaption style="text-align:center; font-size:0.9em; color:#666">
-Figure 1: WritingBench performance of Zhi-writing-dsr1-14b and DeepSeek-R1-Distill-Qwen-14B across 6 domains and 3 writing requirements, evaluated with the WritingBench critic model (scale: 1-10). The six domains are: (D1) Academic & Engineering, (D2) Finance & Business, (D3) Politics & Law, (D4) Literature & Art, (D5) Education, and (D6) Advertising & Marketing. The three writing requirements assessed are: (R1) Style, (R2) Format, and (R3) Length. Here, "C" indicates category-specific scores.
+Figure 1: WritingBench performance of Zhi-Create-DSR1-14B and DeepSeek-R1-Distill-Qwen-14B across 6 domains and 3 writing requirements, evaluated with the WritingBench critic model (scale: 1-10). The six domains are: (D1) Academic & Engineering, (D2) Finance & Business, (D3) Politics & Law, (D4) Literature & Art, (D5) Education, and (D6) Advertising & Marketing. The three writing requirements assessed are: (R1) Style, (R2) Format, and (R3) Length. Here, "C" indicates category-specific scores.
 </figcaption>
 
 ## 2. Training Process
@@ -46,9 +47,9 @@ To achieve optimal domain coverage, we meticulously balanced the distribution of
 
 ## 3. Evaluation Results
 
-Our evaluation results suggest promising improvements in the model's creative writing capabilities. In the LLM Creative Story-Writing Benchmark evaluation, the model achieved a score of **8.33**, up from the base model's **7.87**. When assessed on WritingBench, a comprehensive framework for evaluating large language model writing abilities, the model attained a score of **8.46**, placing it close to DeepSeek-R1's performance and ahead of DeepSeek-R1-Distill-Qwen-14B's score of 7.93.
+Our evaluation results suggest promising improvements in the model's creative writing capabilities. In the LLM Creative Story-Writing Benchmark evaluation, the model achieved a score of **8.33**, up from the base model's **7.87**. When assessed on WritingBench, a comprehensive framework for evaluating large language model writing abilities, the model attained a score of **8.46**, placing it close to DeepSeek-R1's performance and ahead of DeepSeek-R1-Distill-Qwen-14B's score of **7.93**.
 
-With respect to general capabilities, evaluations indicate modest improvements of **2%–5% in knowledge and reasoning tasks (CMMLU, MMLU-Pro)**, alongside encouraging progress in mathematical reasoning as measured by benchmarks such as **AIME-2024, AIME-2025, and GSM8K**. The results suggest that the model maintains a balanced performance profile, with improvements observed across creative writing, knowledge/reasoning, and mathematical tasks compared to DeepSeek-R1-Distill-Qwen-14B, which potentially makes it suitable for a range of general-purpose applications.
+With respect to general capabilities, evaluations indicate modest improvements of **2%–5% in knowledge and reasoning tasks (CMMLU, MMLU-Pro)**, alongside encouraging progress in mathematical reasoning as measured by benchmarks such as **AIME-2024, AIME-2025, and GSM8K**. The results suggest that the model maintains a balanced performance profile, with improvements observed across creative writing, knowledge/reasoning, and mathematical tasks compared to DeepSeek-R1-Distill-Qwen-14B, which potentially makes it suitable for a range of general-purpose applications. On the instruction-following IFEval benchmark, the model's score improved from **71.43** to **74.71**.
 
 ![general](./images/general_score.png)
@@ -58,7 +59,7 @@ Figure 2: When evaluating model performance, it is recommended to conduct multiple
 
 ## 4. How to Run Locally
 
-Zhi-writing-dsr1-14b can be deployed on various hardware configurations, including a single 80GB GPU (H20/A800/H800) or dual RTX 4090s. In addition, the INT4-quantized version Zhi-writing-dsr1-14b-gptq-int4 can be deployed on a single RTX 4090.
+Zhi-Create-DSR1-14B can be deployed on various hardware configurations, including a single 80GB GPU (H20/A800/H800) or dual RTX 4090s. In addition, the INT4-quantized version Zhi-Create-DSR1-14B-GPTQ-INT4 can be deployed on a single RTX 4090.
 
 ### Transformers
@@ -66,7 +67,7 @@ Zhi-writing-dsr1-14b can be deployed on various hardware configurations, including
 from transformers import AutoModelForCausalLM, AutoTokenizer
 from transformers.generation import GenerationConfig
 
-MODEL_NAME = "Zhihu-ai/Zhi-writing-dsr1-14b"
+MODEL_NAME = "Zhihu-ai/Zhi-Create-DSR1-14B"
 tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, trust_remote_code=True)
 
 # use bf16
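The hunk cuts the Transformers example off at the `# use bf16` comment, and the diff context elides the rest of the README's snippet. As a minimal sketch of how the flow typically continues from the lines above (assumed code, not the elided original; sampling settings follow the recommendations in Section 5):

```python
# Sketch of the elided continuation: load the weights in bf16 and generate.
import torch

model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

# "In the voice of Lu Xun, write an essay introducing West Lake vinegar fish"
messages = [{"role": "user", "content": "请你以鲁迅的口吻,写一篇介绍西湖醋鱼的文章"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(
    input_ids,
    max_new_tokens=4096,
    do_sample=True,
    temperature=0.6,  # recommended range: 0.5-0.7
)
response = tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True)
print(response)
```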
@@ -121,12 +122,12 @@ print(response)
 You can easily start a service using [ZhiLight](https://github.com/zhihu/ZhiLight):
 
 ```bash
-docker run -it --net=host --gpus='"device=0"' -v /path/to/model:/mnt/models --entrypoint="" ghcr.io/zhihu/zhilight/zhilight:0.4.17-cu124 python -m zhilight.server.openai.entrypoints.api_server --model-path /mnt/models --port 8000 --enable-reasoning --reasoning-parser deepseek-r1 --served-model-name Zhi-writing-dsr1-14b
+docker run -it --net=host --gpus='"device=0"' -v /path/to/model:/mnt/models --entrypoint="" ghcr.io/zhihu/zhilight/zhilight:0.4.17-cu124 python -m zhilight.server.openai.entrypoints.api_server --model-path /mnt/models --port 8000 --enable-reasoning --reasoning-parser deepseek-r1 --served-model-name Zhi-Create-DSR1-14B
 
 curl http://localhost:8000/v1/completions \
   -H "Content-Type: application/json" \
   -d '{
-    "model": "Zhi-writing-dsr1-14b",
+    "model": "Zhi-Create-DSR1-14B",
     "prompt": "请你以鲁迅的口吻,写一篇介绍西湖醋鱼的文章",
     "max_tokens": 4096,
     "temperature": 0.6,
@@ -143,15 +144,15 @@ For instance, you can easily start a service using [vLLM](https://github.com/vllm-project/vllm):
 pip install "vllm>=0.6.4.post1"
 
 # huggingface model id
-vllm serve Zhihu-ai/Zhi-writing-dsr1-14b --served-model-name Zhi-writing-dsr1-14b --port 8000
+vllm serve Zhihu-ai/Zhi-Create-DSR1-14B --served-model-name Zhi-Create-DSR1-14B --port 8000
 
 # local path
-vllm serve /path/to/model --served-model-name Zhi-writing-dsr1-14b --port 8000
+vllm serve /path/to/model --served-model-name Zhi-Create-DSR1-14B --port 8000
 
 curl http://localhost:8000/v1/completions \
   -H "Content-Type: application/json" \
   -d '{
-    "model": "Zhi-writing-dsr1-14b",
+    "model": "Zhi-Create-DSR1-14B",
     "prompt": "请你以鲁迅的口吻,写一篇介绍西湖醋鱼的文章",
     "max_tokens": 4096,
     "temperature": 0.6,
@@ -169,16 +170,16 @@ You can also easily start a service using [SGLang](https://github.com/sgl-project/sglang):
 pip install "sglang[all]>=0.4.5" --find-links https://flashinfer.ai/whl/cu124/torch2.5/flashinfer-python
 
 # huggingface model id
-python -m sglang.launch_server --model-path Zhihu-ai/Zhi-writing-dsr1-14b --served-model-name Zhi-writing-dsr1-14b --port 8000
+python -m sglang.launch_server --model-path Zhihu-ai/Zhi-Create-DSR1-14B --served-model-name Zhi-Create-DSR1-14B --port 8000
 
 # local path
-python -m sglang.launch_server --model-path /path/to/model --served-model-name Zhi-writing-dsr1-14b --port 8000
+python -m sglang.launch_server --model-path /path/to/model --served-model-name Zhi-Create-DSR1-14B --port 8000
 
 # send request
 curl http://localhost:8000/v1/completions \
   -H "Content-Type: application/json" \
   -d '{
-    "model": "Zhi-writing-dsr1-14b",
+    "model": "Zhi-Create-DSR1-14B",
     "prompt": "请你以鲁迅的口吻,写一篇介绍西湖醋鱼的文章",
     "max_tokens": 4096,
     "temperature": 0.6,
@@ -193,18 +194,18 @@ You can download ollama using [this](https://ollama.com/download/)
 * quantization: Q4_K_M
 
 ```bash
-ollama run zhihu/zhi-writing-dsr1-14b
+ollama run zhihu/zhi-create-dsr1-14b
 ```
 
 * bf16
 
 ```bash
-ollama run zhihu/zhi-writing-dsr1-14b:bf16
+ollama run zhihu/zhi-create-dsr1-14b:bf16
 ```
 
 ## 5. Usage Recommendations
 
-We recommend adhering to the following configurations when using Zhi-writing-dsr1-14b, including for benchmarking, to achieve the expected performance:
+We recommend adhering to the following configurations when using Zhi-Create-DSR1-14B, including for benchmarking, to achieve the expected performance:
 
 * Set the temperature within the range of 0.5-0.7 (0.6 is recommended) to prevent endless repetitions or incoherent outputs.
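For local Transformers inference, the same settings can be pinned once in a `GenerationConfig` (a minimal sketch; `top_p=0.95` is an assumed value, not stated on this page):

```python
# Sketch: the recommended sampling settings as a reusable GenerationConfig.
from transformers.generation import GenerationConfig

gen_config = GenerationConfig(
    do_sample=True,    # temperature only takes effect when sampling
    temperature=0.6,   # recommended range: 0.5-0.7
    top_p=0.95,        # assumption, not from this page
    max_new_tokens=4096,
)
# model.generate(input_ids, generation_config=gen_config)
```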
@@ -215,16 +216,16 @@ We recommend adhering to the following configurations when using Zhi-writing-dsr1-14b
 ## 6. Citation
 
 ```text
-@misc{Zhi-writing-dsr1-14b,
-    title={Zhi-writing-dsr1-14b: Curriculum Reinforcement and Direct Preference Optimization for Robust Creative Writing in LLMs},
+@misc{Zhi-Create-DSR1-14B,
+    title={Zhi-Create-DSR1-14B: Curriculum Reinforcement and Direct Preference Optimization for Robust Creative Writing in LLMs},
     author={Jiewu Wang and Xu Chen and Wenyuan Su and Chao Huang and Hongkui Gao and Lin Feng and Shan Wang and Lu Xu and Penghe Liu and Zebin Ou},
     year={2025},
     eprint={},
     archivePrefix={},
-    url={https://huggingface.co/Zhihu-ai/Zhi-writing-dsr1-14b},
+    url={https://huggingface.co/Zhihu-ai/Zhi-Create-DSR1-14B},
 }
 ```
 
 ## 7. Contact
 
-If you have any questions, please raise an issue or contact us at [[email protected]](mailto:[email protected]).
+If you have any questions, please raise an issue or contact us at [[email protected]](mailto:[email protected]).
 