Update README.md
---

# Kiwi-1.0-0.7B-32k-Instruct

## Instruction-Tuned Model

* **Developed by**: [EmpirischTech](https://empirischtech.at)/[ChaperoneAI](https://chaperoneai.net)
* **Backbone Model**: [Kiwi-1.0-0.7B-32k](https://huggingface.co/empirischtech/Kiwi-1.0-0.7B-32k)
* **Parameters**: 700M
* **Context Window**: 32k
* **Language(s)**: English
* **Library**: [HuggingFace Transformers](https://github.com/huggingface/transformers) (a minimal loading sketch follows this list)
* **License**: Creative Commons Attribution 4.0 (CC BY 4.0)
* **Contact**: For questions and comments about the model, please reach out via [contact-us](https://chaperoneai.net/contact)
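Since the card lists HuggingFace Transformers as the library, the snippet below is a minimal loading-and-generation sketch. The repository id `empirischtech/Kiwi-1.0-0.7B-32k-Instruct` and the availability of a chat template are assumptions inferred from the model name, not confirmed by this card.

```python
# Minimal sketch (assumptions: the repo id and the presence of a chat template
# are inferred from the model name, not confirmed by this card).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "empirischtech/Kiwi-1.0-0.7B-32k-Instruct"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # small model; bf16 keeps memory modest
    device_map="auto",
)

# Build a prompt with the tokenizer's chat template (assumed to be defined).
messages = [{"role": "user", "content": "Summarize depth up-scaling in two sentences."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```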
## Main Message

We present our initial results validating depth up-scaling (DUS), a method that combines depthwise scaling with continued pretraining. Unlike other LLM up-scaling approaches that rely on mixture-of-experts, DUS requires no complex modifications for efficient training and inference, making it a simple yet effective strategy for scaling high-performance LLMs from smaller models.

In our approach, we carefully selected the dense layers from Qwen2.5-0.5B to construct our model; a hypothetical sketch of this kind of depthwise scaling is shown below. Notably, while Qwen2.5-0.5B was trained on *18 trillion* tokens, our model was trained on only *5 billion* tokens (over three orders of magnitude fewer), yet it achieves comparable performance.
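To make the depthwise-scaling step concrete, the snippet below is an illustrative sketch in the DUS style: duplicate the base network, drop a few layers from the top of one copy and the bottom of the other, and stack the remainder. The overlap `m` and the layer selection are hypothetical; this card does not disclose which Qwen2.5-0.5B layers were actually chosen for Kiwi.

```python
# Illustrative depthwise-scaling sketch (hypothetical layer selection; the
# layers actually chosen for Kiwi-1.0-0.7B-32k are not specified in this card).
import copy
import torch.nn as nn
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B")
layers = base.model.layers            # decoder blocks of the base model
n, m = len(layers), 4                 # m = hypothetical overlap to drop

# DUS-style stacking: the first (n - m) layers from one copy of the network,
# followed by the last (n - m) layers from a second copy.
front = [copy.deepcopy(layer) for layer in layers[: n - m]]
back = [copy.deepcopy(layer) for layer in layers[m:]]

base.model.layers = nn.ModuleList(front + back)
base.config.num_hidden_layers = len(base.model.layers)
# (bookkeeping such as per-layer cache indices is omitted in this sketch)

# The deeper model is then continued-pretrained (reportedly on ~5B tokens for
# Kiwi) so the newly stacked layers learn to work together.
print(f"depth: {n} -> {len(base.model.layers)} layers")
```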
**Note**: This model has not yet been instruction-tuned; instruction tuning is an area of ongoing development.

## Evaluation Results
### Harness Evaluation

- The evaluation is based on the tasks used by the [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard).
- The model is evaluated on five benchmark datasets: `ARC-Challenge`, `HellaSwag`, `MMLU-PRO`, `IFEval` and `GPQA`.
- The evaluations are run with the [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) library; an example invocation is sketched below.
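As a rough illustration of how such a run can be reproduced, the snippet below uses the harness's Python API. The task names, batch size, and repository id are assumptions: they are stock `lm-evaluation-harness` task configs and may differ from the exact settings behind the numbers reported here.

```python
# Sketch of an lm-evaluation-harness run (task names, batch size and the repo
# id are assumptions and may differ from the settings used for this card).
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=empirischtech/Kiwi-1.0-0.7B-32k-Instruct,dtype=bfloat16",
    tasks=[
        "arc_challenge",             # ARC
        "hellaswag",                 # HellaSwag
        "mmlu_pro",                  # MMLU-PRO
        "ifeval",                    # IFEval
        "gpqa_diamond_cot_zeroshot", # GPQA (Diamond), zero-shot CoT
    ],
    batch_size=8,
)
# Note: the GPQA dataset on the Hugging Face Hub is gated; request access first.

for task, metrics in results["results"].items():
    print(task, metrics)
```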
### Main Results
| Metric | **Qwen2.5-0.5B-Instruct** | **Kiwi-1.0-0.7B-32k-Instruct** |
|-----------------|:-------------------:|:--------------------------:|
| ARC | 33.45 | 32.34 |
| HellaSwag | 52.37 | 48.59 |
| MMLU-PRO | 14.03 | 12.89 |
| IFEval | 37.53 | 27.10 |
| GPQA (Diamond)<br>(Zero-shot CoT) | 12.27 | 17.17 |
| **Average** | **29.93** | **27.27** |