|
--- |
|
base_model: unsloth/magistral-small-2506 |
|
tags: |
|
- text-generation-inference |
|
- transformers |
|
- unsloth |
|
- mistral |
|
license: apache-2.0 |
|
language: |
|
- en |
|
library_name: transformers |
|
--- |
|
|
|
### Highly experimental model; it may not work as expected
|
# 🧠 Daemontatox/mini-overthinker
|
|
|
**A highly experimental attempt to fine-tune [Magistral (Mistral)](https://huggingface.co/unsloth/magistral-small-2506) for enhanced staged reasoning with self-reflective thinking patterns.** |
|
|
|
--- |
|
|
|
## 📌 Summary
|
|
|
* **Base Model**: [`unsloth/magistral-small-2506`](https://huggingface.co/unsloth/magistral-small-2506) |
|
* **Fine-tuned by**: `Daemontatox` |
|
* **Model Name**: `Daemontatox/mini-overthinker` |
|
* **License**: Apache 2.0 |
|
* **Language**: English |
|
* **Status**: 🔬 Experimental – *Not intended for production use.*
|
|
|
--- |
|
|
|
## ⚠️ Disclaimer
|
|
|
> This model is **not designed for production**. It is an **experimental prototype** to explore cognitive-loop-style reasoning with reflection. It may behave unpredictably, hallucinate, or fail to follow standard instruction formats. Use only for research and prototyping. |
|
|
|
--- |
|
|
|
## 🧠 Motivation
|
|
|
This model was fine-tuned to: |
|
|
|
* Think in **staged batches**. |
|
* Insert **intermediate reasoning steps**. |
|
* Pause to **self-reflect** on its own outputs. |
|
* Encourage **Theory-of-Mind-like behavior** via structured thinking templates. |
|
|
|
Inspired by the *SUPERTHINKER* design used in [`HelpingAI/Dhanishtha-2.0-SUPERTHINKER`](https://huggingface.co/datasets/HelpingAI/Dhanishtha-2.0-SUPERTHINKER), this model attempts a similar multi-phase thought process in a lightweight setup. |
|
|
|
> **Special thanks** to the creators of [`HelpingAI/Dhanishtha-2.0-SUPERTHINKER`](https://huggingface.co/datasets/HelpingAI/Dhanishtha-2.0-SUPERTHINKER) for the dataset structure and inspiration behind this staged reasoning approach. |
|
|
|
--- |
|
|
|
## 🧪 Example Prompt Structure
|
|
|
```text |
|
Q: What are the downsides of AI regulation? |
|
|
|
Think Step 1: |
|
<|THINK|> Regulation might slow innovation. It could also centralize power in large companies. |
|
|
|
Answer Attempt 1: |
|
<|ANSWER|> Slower innovation and reduced competition. |
|
|
|
Reflection: |
|
<|REFLECT|> The points are valid, but lack mention of potential misalignment with global norms. |
|
|
|
Final Answer: |
|
<|FINAL|> The main downsides are slower innovation, centralized control, and difficulty in harmonizing global frameworks. |
|
``` |
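Because the staging is purely textual, prompts in this format can be assembled programmatically. The sketch below is a hypothetical helper (not part of the model's tooling): `build_staged_prompt` and its stage arguments are illustrative names, and the exact tag layout is taken from the example above. The prompt ends at `<|FINAL|> ` so the model completes the final answer.

```python
def build_staged_prompt(question: str, think: str, answer: str, reflect: str) -> str:
    """Assemble a staged-reasoning prompt matching the template above.

    The trailing "<|FINAL|> " leaves the final answer open for the model
    to complete.
    """
    return (
        f"Q: {question}\n\n"
        f"Think Step 1:\n<|THINK|> {think}\n\n"
        f"Answer Attempt 1:\n<|ANSWER|> {answer}\n\n"
        f"Reflection:\n<|REFLECT|> {reflect}\n\n"
        f"Final Answer:\n<|FINAL|> "
    )

# Example usage with the question from the template above.
prompt = build_staged_prompt(
    "What are the downsides of AI regulation?",
    "Regulation might slow innovation. It could also centralize power.",
    "Slower innovation and reduced competition.",
    "The points are valid, but lack mention of global norms.",
)
print(prompt)
```

Earlier stages can be filled in by the model itself in a loop (generate up to the next tag, append, continue), which is the cognitive-loop usage this card describes.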
|
|
|
--- |
|
|
|
## 🔧 Inference Code (Transformers)
|
|
|
```python |
|
from transformers import AutoTokenizer, AutoModelForCausalLM, TextStreamer |
|
import torch |
|
|
|
model_id = "Daemontatox/mini-overthinker" |
|
|
|
tokenizer = AutoTokenizer.from_pretrained(model_id) |
|
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto") |
|
|
|
streamer = TextStreamer(tokenizer) |
|
|
|
prompt = """Q: What is intelligence? |
|
|
|
Think Step 1: |
|
<|THINK|> Intelligence involves pattern recognition, abstraction, and reasoning. |
|
|
|
Answer Attempt 1: |
|
<|ANSWER|> The ability to reason, learn, and adapt. |
|
|
|
Reflection: |
|
<|REFLECT|> Lacks mention of creativity and problem-solving aspects. |
|
|
|
Final Answer: |
|
<|FINAL|> Intelligence is the ability to reason, learn, adapt, and solve problems creatively. |
|
""" |
|
|
|
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)  # match the device chosen by device_map

outputs = model.generate(**inputs, max_new_tokens=200, streamer=streamer)
|
``` |
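When the generation follows the staged format, the usable answer is whatever follows the last `<|FINAL|>` tag. A minimal post-processing sketch, assuming the model actually emits that tag (`extract_final` is a hypothetical helper, not provided by the model or library):

```python
import re

def extract_final(text: str):
    """Return the text after the last <|FINAL|> tag, or None if absent."""
    # "." does not match newlines, so this captures the rest of the tag's line.
    matches = re.findall(r"<\|FINAL\|>\s*(.*)", text)
    return matches[-1].strip() if matches else None

# Example on a truncated model output.
sample = "Final Answer:\n<|FINAL|> Intelligence is the ability to reason and adapt."
print(extract_final(sample))
```

If the tag is missing (a known failure mode of this experimental model), the helper returns `None` rather than guessing, so callers can retry or fall back.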
|
|
|
--- |
|
|
|
## 🚫 Limitations
|
|
|
* Requires **explicit token triggers** (`<|THINK|>`, `<|REFLECT|>`, etc.).

* May **hallucinate** or get stuck in loops.

* Behavior can degrade in **zero-shot** usage.

* Not benchmarked; **no alignment or safety tuning** applied.
|
|
|
--- |
|
|
|
## ✅ Intended For
|
|
|
* Research in **cognitive loops** |
|
* LLM **agent architecture prototyping** |
|
* Simulating **multi-phase reasoning** |
|
|
|
--- |
|
|
|
## ❌ Not Recommended For
|
|
|
* Real-world deployment |
|
* Safety-critical tasks |
|
* Answer quality evaluation without verification |
|
|
|
--- |
|
|
|
## 📜 Citation
|
|
|
```bibtex
|
@misc{mini-overthinker2025, |
|
author = {Daemontatox}, |
|
title = {Mini-Overthinker: Experimental Staged Reasoning Model}, |
|
year = {2025}, |
|
howpublished = {\url{https://huggingface.co/Daemontatox/mini-overthinker}}, |
|
note = {Fine-tuned from unsloth/magistral-small-2506 using ideas from HelpingAI/Dhanishtha-2.0-SUPERTHINKER} |
|
} |
|
``` |
|
|
|
--- |
|
|