Daemontatox committed on
Commit 06ede14 · verified · 1 Parent(s): ee4dabb

Update README.md

Files changed (1)
  1. README.md +130 -7
README.md CHANGED
@@ -1,5 +1,5 @@
  ---
- base_model: unsloth/magistral-small-2506-unsloth-bnb-4bit
  tags:
  - text-generation-inference
  - transformers
@@ -8,14 +8,137 @@ tags:
  license: apache-2.0
  language:
  - en
  ---

- # Uploaded finetuned model

- - **Developed by:** Daemontatox
- - **License:** apache-2.0
- - **Finetuned from model :** unsloth/magistral-small-2506-unsloth-bnb-4bit

- This mistral model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.

- [<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)

  ---
+ base_model: unsloth/magistral-small-2506
  tags:
  - text-generation-inference
  - transformers
  license: apache-2.0
  language:
  - en
+ library_name: transformers
  ---

+ ### Highly experimental model; it may not work as expected.
+ # 🧠 Daemontatox/mini-overthinker

+ **A highly experimental attempt to fine-tune [Magistral (Mistral)](https://huggingface.co/unsloth/magistral-small-2506) for enhanced staged reasoning with self-reflective thinking patterns.**

+ ---
+
+ ## 📌 Summary
+
+ * **Base Model**: [`unsloth/magistral-small-2506`](https://huggingface.co/unsloth/magistral-small-2506)
+ * **Fine-tuned by**: `Daemontatox`
+ * **Model Name**: `Daemontatox/mini-overthinker`
+ * **License**: Apache 2.0
+ * **Language**: English
+ * **Status**: 🔬 Experimental – *Not intended for production use.*
+
+ ---
+
+ ## ⚠️ Disclaimer
+
+ > This model is **not designed for production**. It is an **experimental prototype** to explore cognitive-loop-style reasoning with reflection. It may behave unpredictably, hallucinate, or fail to follow standard instruction formats. Use only for research and prototyping.
+
+ ---
+
+ ## 🧠 Motivation
+
+ This model was fine-tuned to:
+
+ * Think in **staged batches**.
+ * Insert **intermediate reasoning steps**.
+ * Pause to **self-reflect** on its own outputs.
+ * Encourage **Theory-of-Mind-like behavior** via structured thinking templates.
+
+ Inspired by the *SUPERTHINKER* design used in [`HelpingAI/Dhanishtha-2.0-SUPERTHINKER`](https://huggingface.co/datasets/HelpingAI/Dhanishtha-2.0-SUPERTHINKER), this model attempts a similar multi-phase thought process in a lightweight setup.
+
+ > **Special thanks** to the creators of [`HelpingAI/Dhanishtha-2.0-SUPERTHINKER`](https://huggingface.co/datasets/HelpingAI/Dhanishtha-2.0-SUPERTHINKER) for the dataset structure and inspiration behind this staged reasoning approach.
+
+ ---
+
+ ## 🧪 Example Prompt Structure
+
+ ```text
+ Q: What are the downsides of AI regulation?
+
+ Think Step 1:
+ <|THINK|> Regulation might slow innovation. It could also centralize power in large companies.
+
+ Answer Attempt 1:
+ <|ANSWER|> Slower innovation and reduced competition.
+
+ Reflection:
+ <|REFLECT|> The points are valid, but lack mention of potential misalignment with global norms.
+
+ Final Answer:
+ <|FINAL|> The main downsides are slower innovation, centralized control, and difficulty in harmonizing global frameworks.
+ ```
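+
+ A small post-processing helper can make this staged format easier to consume. The sketch below is illustrative and not part of the original card: it assumes the model emits the literal tags shown above and extracts the text after the last `<|FINAL|>` tag.
+
+ ```python
+ import re
+
+ def extract_final_answer(generation: str):
+     """Return the text after the last <|FINAL|> tag, or None if the tag is absent."""
+     # Capture lazily up to the next "<|" tag opener or the end of the string.
+     matches = re.findall(r"<\|FINAL\|>(.*?)(?=<\||\Z)", generation, flags=re.DOTALL)
+     return matches[-1].strip() if matches else None
+
+ sample = "<|THINK|> draft thoughts <|ANSWER|> draft <|REFLECT|> critique <|FINAL|> The refined answer."
+ print(extract_final_answer(sample))  # -> The refined answer.
+ ```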
+
+ ---
+
+ ## 🔧 Inference Code (Transformers)
+
+ ```python
+ from transformers import AutoTokenizer, AutoModelForCausalLM, TextStreamer
+ import torch
+
+ model_id = "Daemontatox/mini-overthinker"
+
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")
+
+ # Stream tokens to stdout as they are generated.
+ streamer = TextStreamer(tokenizer)
+
+ prompt = """Q: What is intelligence?
+
+ Think Step 1:
+ <|THINK|> Intelligence involves pattern recognition, abstraction, and reasoning.
+
+ Answer Attempt 1:
+ <|ANSWER|> The ability to reason, learn, and adapt.
+
+ Reflection:
+ <|REFLECT|> Lacks mention of creativity and problem-solving aspects.
+
+ Final Answer:
+ <|FINAL|> Intelligence is the ability to reason, learn, adapt, and solve problems creatively.
+ """
+
+ # Move inputs to whichever device device_map placed the model on.
+ inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
+ outputs = model.generate(**inputs, max_new_tokens=200, streamer=streamer)
+ ```
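+
+ To inspect just the model's continuation, you can slice off the prompt tokens before decoding. This short follow-up is a sketch that continues from the block above and reuses the hypothetical `extract_final_answer` helper defined earlier:
+
+ ```python
+ # Keep only the newly generated tokens, then decode them.
+ new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
+ completion = tokenizer.decode(new_tokens, skip_special_tokens=False)
+ print(extract_final_answer(completion))
+ ```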
+
+ ---
+
+ ## 🚫 Limitations
+
+ * Requires **explicit token triggers** (`<|THINK|>`, `<|REFLECT|>`, etc.).
+ * May **hallucinate** or get stuck in loops; see the stopping-criteria sketch below.
+ * Behavior can degrade in **zero-shot** usage.
+ * Not benchmarked; **no alignment or safety tuning** applied.
+
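+ One pragmatic guard against runaway reasoning loops is to cut generation off shortly after the `<|FINAL|>` tag appears. The following is a minimal sketch using the `StoppingCriteria` API from `transformers`; the class name, tag, and character budget are assumptions for illustration, not part of the original card:
+
+ ```python
+ from transformers import StoppingCriteria, StoppingCriteriaList
+
+ class StopAfterFinal(StoppingCriteria):
+     """Stop once <|FINAL|> has appeared and been followed by enough text."""
+     def __init__(self, tokenizer, tag="<|FINAL|>", chars_after=300):
+         self.tokenizer = tokenizer
+         self.tag = tag
+         self.chars_after = chars_after
+
+     def __call__(self, input_ids, scores, **kwargs):
+         # Decode the running sequence and check how much text follows the tag.
+         text = self.tokenizer.decode(input_ids[0], skip_special_tokens=False)
+         idx = text.rfind(self.tag)
+         return idx != -1 and len(text) - (idx + len(self.tag)) >= self.chars_after
+
+ # Usage with the inference code above:
+ # outputs = model.generate(**inputs, max_new_tokens=512,
+ #                          stopping_criteria=StoppingCriteriaList([StopAfterFinal(tokenizer)]))
+ ```
+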
+ ---
+
+ ## ✅ Intended For
+
+ * Research in **cognitive loops**
+ * LLM **agent architecture prototyping**
+ * Simulating **multi-phase reasoning**
+
+ ---
+
+ ## ❌ Not Recommended For
+
+ * Real-world deployment
+ * Safety-critical tasks
+ * Answer quality evaluation without verification
+
+ ---
+
+ ## 📎 Citation
+
+ ```bibtex
+ @misc{mini-overthinker2025,
+   author       = {Daemontatox},
+   title        = {Mini-Overthinker: Experimental Staged Reasoning Model},
+   year         = {2025},
+   howpublished = {\url{https://huggingface.co/Daemontatox/mini-overthinker}},
+   note         = {Fine-tuned from unsloth/magistral-small-2506 using ideas from HelpingAI/Dhanishtha-2.0-SUPERTHINKER}
+ }
+ ```
+
+ ---