Commit 69f0aca by jukofyork (verified) · 1 parent: 40951bb

Update README.md

Files changed (1): README.md (+4 -0)
README.md CHANGED
@@ -26,6 +26,10 @@ language:
 
 ![image.webp](https://cdn-uploads.huggingface.co/production/uploads/65995c45539c808e84c38bf1/KL97x9lVuhmIPXbbKgvyY.webp)
 
+***NOTE***: *This is just a slightly improved version that I trained using `"max_position_embeddings": 65536` + `"rope_scaling": {"factor": 2.0, ...` rather than setting the `rope_scaling` after training...*
+
+---
+
 A `0.6B` parameter draft (speculative decoding) model for use with [deepseek-ai/DeepSeek-V3-0324](https://huggingface.co/deepseek-ai/DeepSeek-V3-0324) and [deepseek-ai/DeepSeek-V3](https://huggingface.co/deepseek-ai/DeepSeek-V3).
 
 **NOTES**: