Update README.md
README.md
CHANGED
@@ -17,6 +17,8 @@ tags:
 - code
 ---
 
+
+
 # Crux-Qwen3\_OpenThinking-4B
 
 > **Crux-Qwen3\_OpenThinking-4B** is fine-tuned on the **Qwen3-4B** architecture, optimized for advanced **open thinking**, **mathematical reasoning**, and **logical problem solving**. This model is trained on the traces of **sk1.1**, which include 1,000 entries from the **Gemini thinking trajectory**, combined with fine-tuning on 100k tokens of **open math reasoning** data. This makes it highly effective for nuanced reasoning, educational tasks, and complex problem-solving requiring clear thought processes.
@@ -106,4 +108,4 @@ print(response)
 
 ## References
 
-1. [YaRN: Efficient Context Window Extension of Large Language Models](https://arxiv.org/pdf/2309.00071)
+1. [YaRN: Efficient Context Window Extension of Large Language Models](https://arxiv.org/pdf/2309.00071)