rwmasood committed · verified
Commit ac30751 · 1 Parent(s): 3b41343

Update README.md

Files changed (1): README.md +26 -11

README.md CHANGED
@@ -12,30 +12,45 @@ base_model:
 ---
 
 # Kiwi-1.0-0.7B-32k-Instruct
+ ## Instruction-Tuned Model
+ 
+ * **Developed by**: [EmpirischTech](https://empirischtech.at)/[ChaperoneAI](https://chaperoneai.net)
+ * **Backbone Model**: [Kiwi-1.0-0.7B-32k](https://huggingface.co/empirischtech/Kiwi-1.0-0.7B-32k)
+ * **Parameters**: 700M
+ * **Context Window**: 32k tokens
+ * **Language(s)**: English
+ * **Library**: [HuggingFace Transformers](https://github.com/huggingface/transformers) (see the usage sketch below)
+ * **License**: Creative Commons Attribution 4.0 (CC BY 4.0)
+ * **Contact**: For questions and comments about the model, please reach us via the [contact page](https://chaperoneai.net/contact)
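Below is a minimal usage sketch with Hugging Face Transformers. The hub id `empirischtech/Kiwi-1.0-0.7B-32k-Instruct` is assumed from the model name above and may differ; since the model is not yet instruction-tuned (see the note below), it uses a plain completion prompt rather than a chat template.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed hub id, inferred from the model card; adjust if the repo differs.
model_id = "empirischtech/Kiwi-1.0-0.7B-32k-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Plain completion; the card notes instruction tuning is still in progress.
prompt = "Depth up-scaling is a technique that"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```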
 
+ 
+ ## Main Message
+ 
+ We present our initial results validating depth up-scaling (DUS), a method that combines depthwise scaling with continued pretraining. Unlike other LLM up-scaling approaches that rely on mixture-of-experts, DUS requires no complex modifications for efficient training and inference, making it a simple yet effective strategy for scaling high-performance LLMs from smaller models.
+ 
+ In our approach, we carefully selected the dense layers from Qwen2.5-0.5B to construct our model. Notably, while Qwen2.5-0.5B was trained on *18 trillion* tokens, our model was trained on only *5 billion* tokens (roughly 3,600 times fewer, i.e. over three orders of magnitude), yet it achieves comparable performance.
+ 
+ **Note**: This model has not yet been instruction-tuned; instruction tuning is an area of ongoing development.
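To make the construction concrete, here is a sketch of the depthwise-scaling step under stated assumptions: it overlaps two copies of the Qwen2.5-0.5B decoder stack, dropping the top `k` blocks of one copy and the bottom `k` of the other before concatenating. The overlap `k = 6` is illustrative, not our actual layer selection, so treat this as a conceptual sketch rather than our exact recipe.

```python
import copy
import torch
from transformers import AutoModelForCausalLM

# Start from the dense Qwen2.5-0.5B backbone (24 decoder blocks).
base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-0.5B", torch_dtype=torch.bfloat16
)
layers = base.model.layers  # ModuleList of decoder blocks

# Depthwise scaling: keep blocks [0, n-k) from one copy and [k, n) from a
# duplicate, then concatenate. k = 6 is an illustrative overlap, not ours.
k = 6
scaled = list(layers[: len(layers) - k]) + [copy.deepcopy(b) for b in layers[k:]]

base.model.layers = torch.nn.ModuleList(scaled)
base.config.num_hidden_layers = len(scaled)

# Re-index self-attention so the KV cache tracks the new depth.
for idx, block in enumerate(base.model.layers):
    block.self_attn.layer_idx = idx

# The deeper network is then continued-pretrained (5B tokens in our case).
```

With 24 source blocks and `k = 6` this yields 36 blocks, which lands in the roughly 0.7B-parameter range once embeddings are counted.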
+ 
+ ## Evaluation Results
+ 
 
 ### Harness Evaluation
 
 - The performance evaluation is based on the tasks used by the [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard).
- The model is evaluated on three benchmark datasets, which include `ARC-Challenge`, `HellaSwag`, `MMLU` and `IFEval`.
+ The model is evaluated on five benchmarks: `ARC-Challenge`, `HellaSwag`, `MMLU-PRO`, `IFEval` and `GPQA`.
 The library used is the [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) repository.
 
 
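As a reproduction aid, here is a sketch using the harness's Python API. The hub id is assumed as above, and the task names are common lm-evaluation-harness spellings (the leaderboard variants are prefixed `leaderboard_`); verify both against your installed release before running.

```python
# pip install lm-eval
import lm_eval

# Task names follow lm-evaluation-harness v0.4-style spellings and may
# differ between releases; the hub id is an assumption from the card.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=empirischtech/Kiwi-1.0-0.7B-32k-Instruct,dtype=bfloat16",
    tasks=[
        "arc_challenge",
        "hellaswag",
        "leaderboard_mmlu_pro",
        "leaderboard_ifeval",
        "leaderboard_gpqa",
    ],
    batch_size=8,
)

for task, metrics in results["results"].items():
    print(task, metrics)
```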
- #### Main Results
- | Model | ARC | HellaSwag | MMLU-PRO | IFEval | GPQA (Diamond) |
- |------------------------|----------|--------|------|--------|--------|
- | **Qwen2.5-0.5B** | **33.45** | **52.37** | **14.03** | **37.53** | **12.27** (Zero-shot CoT) |
- | **Kiwi-1.0-0.7B-32k-Instruct** | **32.34** | **48.59** | **12.89** | **27.1** | **17.17** (Zero-shot CoT) |
- 
- 
- #### Main Results
+ ### Main Results
 | Metric | **Qwen2.5-0.5B-Instruct** | **Kiwi-1.0-0.7B-32k-Instruct** |
 |-----------------|:-------------------:|:--------------------------:|
 | ARC | 33.45 | 32.34 |
 | HellaSwag | 52.37 | 48.59 |
 | MMLU-PRO | 14.03 | 12.89 |
 | IFEval | 37.53 | 27.10 |
- | GPQA (Diamond) (Zero-shot CoT) | 12.27 | 17.17 |
- | Average | 29,93 | 27,27 |
+ | GPQA (Diamond)<br>(Zero-shot CoT) | 12.27 | 17.17 |
+ | **Average** | **29.93** | **27.62** |
 
40
 
41
 
 
12
  ---
13
 
14
  # Kiwi-1.0-0.7B-32k-Instruct
15
+ ## Instruction-Tuned Model
16
+
17
+ * **Developed by**: [EmpirischTech](https://empirischtech.at)/[ChaperoneAI](https://chaperoneai.net)
18
+ * **Backbone Model**: [Kiwi-1.0-0.7B-32k](https://huggingface.co/empirischtech/Kiwi-1.0-0.7B-32k)
19
+ * **Parameters**: 700m
20
+ * **Context Window**: 32k
21
+ * **Language(s)**: English
22
+ * **Library**: [HuggingFace Transformers](https://github.com/huggingface/transformers)
23
+ * **License**: Creative Common Attribute 4.0 (CCA-4.0)
24
+ * **Contact**: For questions and comments about the model, please email [contact-us](https://chaperoneai.net/contact)
25
+
26
+ ## Main Message
27
+
28
+ We present our initial results validating depth up-scaling—a method that combines depthwise scaling with continued pretraining. Unlike other LLM up-scaling approaches that rely on mixture-of-experts, DUS requires no complex modifications for efficient training and inference, making it a simple yet effective strategy for scaling high-performance LLMs from smaller models.
29
+
30
+ In our approach, we carefully selected the dense layers from Qwen2.5-0.5B to construct our model. Notably, while Qwen2.5-0.5B was trained on *18 trillion* tokens, our model was trained on only *5 billion* tokens—over three orders of magnitude fewer—yet it achieves comparable performance.
31
+
32
+ **Note**: Please note that this model has not yet been instruction-tuned; instruction-tuning is an area of ongoing development.
33
+
34
+ ## Evaluation Results
35
+
36
+
37
 
38
  ### Harness Evaluation
39
 
40
  - The performance evaluation is based on the tasks being evaluated on the [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard).
41
+ The model is evaluated on three benchmark datasets, which include `ARC-Challenge`, `HellaSwag`, `MMLU`, `IFEval` and `GPQA`.
42
  The library used is [lm-evaluation-harness repository](https://github.com/EleutherAI/lm-evaluation-harness)
43
 
44
 
45
+ ### Main Results
 
 
 
 
 
 
 
46
  | Metric | **Qwen2.5-05B-Instruct** | **Kiwi-1.0-0.7B-32k-Instruct** |
47
  |-----------------|:-------------------:|:--------------------------:|
48
  | ARC | 33.45 | 32.34 |
49
  | HellaSwag | 52.37 | 48.59 |
50
  | MMLU-PRO | 14.03 | 12.89 |
51
  | IFEval | 37.53 | 27.1 |
52
+ | GPQA (Diamond)<br>(Zero-shot CoT) | 12.27 | 17.17 |
53
+ | **Average** | **29,93** | **27,27** |
54
 
55
 
56