---
base_model: unsloth/magistral-small-2506
tags:
- text-generation-inference
- transformers
license: apache-2.0
language:
- en
library_name: transformers
---

### Highly experimental model; it might not work as expected.

# 🧠 Daemontatox/mini-overthinker

**A highly experimental attempt to fine-tune [Magistral (Mistral)](https://huggingface.co/unsloth/magistral-small-2506) for enhanced staged reasoning with self-reflective thinking patterns.**

---

## 📌 Summary

* **Base Model**: [`unsloth/magistral-small-2506`](https://huggingface.co/unsloth/magistral-small-2506)
* **Fine-tuned by**: `Daemontatox`
* **Model Name**: `Daemontatox/mini-overthinker`
* **License**: Apache 2.0
* **Language**: English
* **Status**: 🔬 Experimental (*not intended for production use*)

---

## ⚠️ Disclaimer

> This model is **not designed for production**. It is an **experimental prototype** that explores cognitive-loop-style reasoning with reflection. It may behave unpredictably, hallucinate, or fail to follow standard instruction formats. Use it only for research and prototyping.

---

## 🧠 Motivation

This model was fine-tuned to:

* Think in **staged batches**.
* Insert **intermediate reasoning steps**.
* Pause to **self-reflect** on its own outputs.
* Encourage **Theory-of-Mind-like behavior** via structured thinking templates.

Inspired by the *SUPERTHINKER* design used in [`HelpingAI/Dhanishtha-2.0-SUPERTHINKER`](https://huggingface.co/datasets/HelpingAI/Dhanishtha-2.0-SUPERTHINKER), this model attempts a similar multi-phase thought process in a lightweight setup.

> **Special thanks** to the creators of [`HelpingAI/Dhanishtha-2.0-SUPERTHINKER`](https://huggingface.co/datasets/HelpingAI/Dhanishtha-2.0-SUPERTHINKER) for the dataset structure and the inspiration behind this staged reasoning approach.

---

## 🧪 Example Prompt Structure

```text
Q: What are the downsides of AI regulation?

Think Step 1:
<|THINK|> Regulation might slow innovation. It could also centralize power in large companies.

Answer Attempt 1:
<|ANSWER|> Slower innovation and reduced competition.

Reflection:
<|REFLECT|> The points are valid, but lack mention of potential misalignment with global norms.

Final Answer:
<|FINAL|> The main downsides are slower innovation, centralized control, and difficulty in harmonizing global frameworks.
```
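The staged format above can also be assembled programmatically. A minimal sketch; the `build_staged_prompt` helper and its argument names are hypothetical illustrations, not part of the model's tooling:

```python
# Hypothetical helper that assembles the staged reasoning prompt
# shown above; the marker strings come from the example format.
def build_staged_prompt(question, think, answer, reflect, final=None):
    parts = [
        f"Q: {question}",
        f"Think Step 1:\n<|THINK|> {think}",
        f"Answer Attempt 1:\n<|ANSWER|> {answer}",
        f"Reflection:\n<|REFLECT|> {reflect}",
    ]
    # Leave the final slot open so the model completes it,
    # or fill it in when building few-shot demonstrations.
    if final is None:
        parts.append("Final Answer:\n<|FINAL|>")
    else:
        parts.append(f"Final Answer:\n<|FINAL|> {final}")
    return "\n\n".join(parts)

prompt = build_staged_prompt(
    "What are the downsides of AI regulation?",
    "Regulation might slow innovation.",
    "Slower innovation and reduced competition.",
    "The points are valid, but lack mention of global norms.",
)
```

Leaving `final` unset ends the prompt at an open `<|FINAL|>` marker, which is the slot the model is expected to complete.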

---

## 🔧 Inference Code (Transformers)

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, TextStreamer
import torch

model_id = "Daemontatox/mini-overthinker"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Stream tokens to stdout as they are generated.
streamer = TextStreamer(tokenizer)

prompt = """Q: What is intelligence?

Think Step 1:
<|THINK|> Intelligence involves pattern recognition, abstraction, and reasoning.

Answer Attempt 1:
<|ANSWER|> The ability to reason, learn, and adapt.

Reflection:
<|REFLECT|> Lacks mention of creativity and problem-solving aspects.

Final Answer:
<|FINAL|> Intelligence is the ability to reason, learn, adapt, and solve problems creatively.
"""

# Use model.device so this also works when device_map places
# the weights somewhere other than cuda:0.
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200, streamer=streamer)
```
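Because the model emits every stage, downstream code usually wants only the `<|FINAL|>` segment. One possible way to pull it out, assuming the markers appear exactly as in the format above (the `extract_final` helper is a hypothetical sketch, not shipped with the model):

```python
import re

def extract_final(text):
    """Return the text after the last <|FINAL|> marker, or None."""
    matches = re.findall(r"<\|FINAL\|>\s*(.*?)(?=<\|\w+\|>|$)", text, re.DOTALL)
    return matches[-1].strip() if matches else None

sample = (
    "<|THINK|> Reasoning here.\n"
    "<|ANSWER|> Draft answer.\n"
    "<|REFLECT|> Critique.\n"
    "<|FINAL|> Intelligence is the ability to reason, learn, and adapt."
)
print(extract_final(sample))
```

Taking the *last* match matters because an experimental model like this may restart the cycle and emit `<|FINAL|>` more than once.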

---

## 🚫 Limitations

* Requires **explicit token triggers** (`<|THINK|>`, `<|REFLECT|>`, etc.).
* May **hallucinate** or get stuck in loops.
* Behavior can degrade in **zero-shot** usage.
* Not benchmarked; **no alignment or safety tuning** applied.
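Since the model can get stuck re-entering the think/reflect cycle, a cheap guard on marker counts can flag runaway generations. A minimal sketch; the `looks_stuck` helper and the cap of 4 stages are arbitrary illustrative choices, not tuned values:

```python
def looks_stuck(text, max_stages=4):
    """Heuristic loop detector: flag generations that revisit
    the thinking/reflection stages more often than expected."""
    for marker in ("<|THINK|>", "<|REFLECT|>"):
        if text.count(marker) > max_stages:
            return True
    return False

ok = "<|THINK|> a <|ANSWER|> b <|REFLECT|> c <|FINAL|> d"
bad = "<|THINK|> x " * 10
print(looks_stuck(ok), looks_stuck(bad))
```

A caller could retry with a shorter `max_new_tokens` or a different seed when the guard fires.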

---

## ✅ Intended For

* Research in **cognitive loops**
* LLM **agent architecture prototyping**
* Simulating **multi-phase reasoning**

---

## ❌ Not Recommended For

* Real-world deployment
* Safety-critical tasks
* Answer-quality evaluation without verification

---

## 📖 Citation

```bibtex
@misc{mini-overthinker2025,
  author       = {Daemontatox},
  title        = {Mini-Overthinker: Experimental Staged Reasoning Model},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/Daemontatox/mini-overthinker}},
  note         = {Fine-tuned from unsloth/magistral-small-2506 using ideas from HelpingAI/Dhanishtha-2.0-SUPERTHINKER}
}
```

---