rwmasood committed
Commit b1cd694 · verified · 1 Parent(s): ac30751

Update README.md

Files changed (1): README.md (+1, -7)
README.md CHANGED
@@ -25,16 +25,10 @@ base_model:

 ## Main Message

- We present our initial results validating depth up-scaling—a method that combines depthwise scaling with continued pretraining. Unlike other LLM up-scaling approaches that rely on mixture-of-experts, DUS requires no complex modifications for efficient training and inference, making it a simple yet effective strategy for scaling high-performance LLMs from smaller models.
-
- In our approach, we carefully selected the dense layers from Qwen2.5-0.5B to construct our model. Notably, while Qwen2.5-0.5B was trained on *18 trillion* tokens, our model was trained on only *5 billion* tokens—over three orders of magnitude fewer—yet it achieves comparable performance.
-
- **Note**: Please note that this model has not yet been instruction-tuned; instruction-tuning is an area of ongoing development.
+ This is the instruction-tuned version of the pretrained **Kiwi-1.0-0.7B** model. As can be seen in the table below, its results are on par with the SOTA Qwen2.5-0.5B.

 ## Evaluation Results

-
-
 ### Harness Evaluation

 - The performance evaluation is based on the tasks evaluated on the [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard).
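The "Main Message" paragraph removed above describes depth up-scaling (DUS): deepening a dense model by duplicating decoder layers from a smaller base, then recovering quality with continued pretraining. As a rough illustration of the layer-surgery step, here is a minimal sketch assuming a SOLAR-style scheme that duplicates a contiguous slice of Qwen2.5-0.5B's 24 decoder blocks; the commit does not say which layers Kiwi-1.0-0.7B actually keeps, so the indices below are hypothetical.

```python
# Minimal sketch of DUS-style depthwise scaling. The layer indices are
# hypothetical: the actual selection used for Kiwi-1.0-0.7B is not stated.
import copy

import torch.nn as nn
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B")
layers = base.model.layers  # ModuleList of 24 decoder blocks

# Illustrative choice: duplicate the middle 8 blocks, deepening 24 -> 32.
extra = [copy.deepcopy(block) for block in layers[8:16]]
base.model.layers = nn.ModuleList(list(layers[:16]) + extra + list(layers[16:]))
base.config.num_hidden_layers = len(base.model.layers)

# Re-index attention blocks so KV-cache bookkeeping stays consistent.
for i, block in enumerate(base.model.layers):
    block.self_attn.layer_idx = i

# The surgically deepened model is then continued-pretrained (per the
# removed paragraph, ~5B tokens here) to restore coherence.
```

As the removed text notes, the appeal over mixture-of-experts scaling is that the result stays a plain dense model, so no training or inference machinery needs to change.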
 
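For context on the "Harness Evaluation" heading: Open LLM Leaderboard numbers of this kind are typically produced with EleutherAI's lm-evaluation-harness. A minimal sketch of such a run follows, assuming the harness v0.4 Python API, the v1 leaderboard task set with its customary few-shot counts, and Qwen2.5-0.5B as a stand-in model id (the Kiwi model's Hub id is not shown in this commit).

```python
# Minimal sketch of an Open LLM Leaderboard-style evaluation with
# EleutherAI's lm-evaluation-harness (v0.4 API assumed). The exact task
# list and few-shot settings behind the README's table are not stated;
# the v1 leaderboard conventions below are an assumption.
from lm_eval import simple_evaluate

MODEL_ID = "Qwen/Qwen2.5-0.5B"  # stand-in; swap in the Kiwi model's Hub id

# v1 leaderboard tasks with their customary few-shot counts.
TASK_SHOTS = {
    "arc_challenge": 25,
    "hellaswag": 10,
    "mmlu": 5,
    "truthfulqa_mc2": 0,
    "winogrande": 5,
    "gsm8k": 5,
}

for task, shots in TASK_SHOTS.items():
    out = simple_evaluate(
        model="hf",
        model_args=f"pretrained={MODEL_ID}",
        tasks=[task],
        num_fewshot=shots,
        batch_size=8,
    )
    print(task, out["results"])  # per-task metrics (acc, acc_norm, ...)
```

The same runs can be launched from the command line via the `lm_eval` entry point with `--model hf`, `--tasks`, and `--num_fewshot`.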