rwmasood committed
Commit b1cd694 · verified · 1 Parent(s): ac30751

Update README.md

Files changed (1): README.md (+1, -7)
README.md CHANGED
@@ -25,16 +25,10 @@ base_model:

 ## Main Message

- We present our initial results validating depth up-scaling—a method that combines depthwise scaling with continued pretraining. Unlike other LLM up-scaling approaches that rely on mixture-of-experts, DUS requires no complex modifications for efficient training and inference, making it a simple yet effective strategy for scaling high-performance LLMs from smaller models.
-
- In our approach, we carefully selected the dense layers from Qwen2.5-0.5B to construct our model. Notably, while Qwen2.5-0.5B was trained on *18 trillion* tokens, our model was trained on only *5 billion* tokens—over three orders of magnitude fewer—yet it achieves comparable performance.
-
- **Note**: Please note that this model has not yet been instruction-tuned; instruction-tuning is an area of ongoing development.
+ This is the instruction-tuned version of the pretrained **Kiwi-1.0-0.7B** model. As can be seen in the table below, its results are on par with the SOTA Qwen2.5-0.5B.

 ## Evaluation Results

-
-
 ### Harness Evaluation

 - The performance evaluation is based on the tasks evaluated on the [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard).
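The "Main Message" paragraph removed above describes depth up-scaling (DUS): deepening a dense model by duplicating decoder layers from a smaller base, then recovering quality with continued pretraining. As a rough illustration of the layer-surgery step, here is a minimal sketch assuming a SOLAR-style scheme that duplicates a contiguous slice of Qwen2.5-0.5B's 24 decoder blocks; the commit does not say which layers Kiwi-1.0-0.7B actually keeps, so the indices below are hypothetical.

```python
# Minimal sketch of DUS-style depthwise scaling. The layer indices are
# hypothetical: the actual selection used for Kiwi-1.0-0.7B is not stated.
import copy

import torch.nn as nn
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B")
layers = base.model.layers  # ModuleList of 24 decoder blocks

# Illustrative choice: duplicate the middle 8 blocks, deepening 24 -> 32.
extra = [copy.deepcopy(block) for block in layers[8:16]]
base.model.layers = nn.ModuleList(list(layers[:16]) + extra + list(layers[16:]))
base.config.num_hidden_layers = len(base.model.layers)

# Re-index attention blocks so KV-cache bookkeeping stays consistent.
for i, block in enumerate(base.model.layers):
    block.self_attn.layer_idx = i

# The surgically deepened model is then continued-pretrained (per the
# removed paragraph, ~5B tokens here) to restore coherence.
```

As the removed text notes, the appeal over mixture-of-experts scaling is that the result stays a plain dense model, so no training or inference machinery needs to change.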
 
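For context on the "Harness Evaluation" heading: Open LLM Leaderboard numbers of this kind are typically produced with EleutherAI's lm-evaluation-harness. A minimal sketch of such a run follows, assuming the harness v0.4 Python API, the v1 leaderboard task set with its customary few-shot counts, and Qwen2.5-0.5B as a stand-in model id (the Kiwi model's Hub id is not shown in this commit).

```python
# Minimal sketch of an Open LLM Leaderboard-style evaluation with
# EleutherAI's lm-evaluation-harness (v0.4 API assumed). The exact task
# list and few-shot settings behind the README's table are not stated;
# the v1 leaderboard conventions below are an assumption.
from lm_eval import simple_evaluate

MODEL_ID = "Qwen/Qwen2.5-0.5B"  # stand-in; swap in the Kiwi model's Hub id

# v1 leaderboard tasks with their customary few-shot counts.
TASK_SHOTS = {
    "arc_challenge": 25,
    "hellaswag": 10,
    "mmlu": 5,
    "truthfulqa_mc2": 0,
    "winogrande": 5,
    "gsm8k": 5,
}

for task, shots in TASK_SHOTS.items():
    out = simple_evaluate(
        model="hf",
        model_args=f"pretrained={MODEL_ID}",
        tasks=[task],
        num_fewshot=shots,
        batch_size=8,
    )
    print(task, out["results"])  # per-task metrics (acc, acc_norm, ...)
```

The same runs can be launched from the command line via the `lm_eval` entry point with `--model hf`, `--tasks`, and `--num_fewshot`.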