---
base_model: unsloth/magistral-small-2506
tags:
- text-generation-inference
- transformers
- unsloth
- mistral
license: apache-2.0
language:
- en
library_name: transformers
---

### Highly experimental model, might not work as expected

# 🧠 Daemontatox/mini-overthinker

**A highly experimental attempt to fine-tune [Magistral (Mistral)](https://huggingface.co/unsloth/magistral-small-2506) for enhanced staged reasoning with self-reflective thinking patterns.**

---

## 📌 Summary

* **Base Model**: [`unsloth/magistral-small-2506`](https://huggingface.co/unsloth/magistral-small-2506)
* **Fine-tuned by**: `Daemontatox`
* **Model Name**: `Daemontatox/mini-overthinker`
* **License**: Apache 2.0
* **Language**: English
* **Status**: 🔬 Experimental – *Not intended for production use.*

---

## ⚠️ Disclaimer

> This model is **not designed for production**. It is an **experimental prototype** to explore cognitive-loop-style reasoning with reflection. It may behave unpredictably, hallucinate, or fail to follow standard instruction formats. Use only for research and prototyping.

---

## 🧠 Motivation

This model was fine-tuned to:

* Think in **staged batches**.
* Insert **intermediate reasoning steps**.
* Pause to **self-reflect** on its own outputs.
* Encourage **Theory-of-Mind-like behavior** via structured thinking templates.

Inspired by the *SUPERTHINKER* design used in [`HelpingAI/Dhanishtha-2.0-SUPERTHINKER`](https://huggingface.co/datasets/HelpingAI/Dhanishtha-2.0-SUPERTHINKER), this model attempts a similar multi-phase thought process in a lightweight setup.

> **Special thanks** to the creators of [`HelpingAI/Dhanishtha-2.0-SUPERTHINKER`](https://huggingface.co/datasets/HelpingAI/Dhanishtha-2.0-SUPERTHINKER) for the dataset structure and inspiration behind this staged reasoning approach.

---

## 🧪 Example Prompt Structure

```text
Q: What are the downsides of AI regulation?
Think Step 1:
<|THINK|> Regulation might slow innovation.
It could also centralize power in large companies.
Answer Attempt 1:
<|ANSWER|> Slower innovation and reduced competition.
Reflection:
<|REFLECT|> The points are valid, but lack mention of potential misalignment with global norms.
Final Answer:
<|FINAL|> The main downsides are slower innovation, centralized control, and difficulty in harmonizing global frameworks.
```

---

## 🔧 Inference Code (Transformers)

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, TextStreamer
import torch

model_id = "Daemontatox/mini-overthinker"

# Load the tokenizer and model; device_map="auto" lets Accelerate place the weights.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Print tokens to stdout as they are generated.
streamer = TextStreamer(tokenizer)

prompt = """Q: What is intelligence?
Think Step 1:
<|THINK|> Intelligence involves pattern recognition, abstraction, and reasoning.
Answer Attempt 1:
<|ANSWER|> The ability to reason, learn, and adapt.
Reflection:
<|REFLECT|> Lacks mention of creativity and problem-solving aspects.
Final Answer:
<|FINAL|> Intelligence is the ability to reason, learn, adapt, and solve problems creatively.
"""

# Move inputs to the device the model was placed on instead of hard-coding "cuda".
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200, streamer=streamer)
```

---

## 🚫 Limitations

* Requires **explicit token triggers** (`<|THINK|>`, `<|REFLECT|>`, etc.).
* May **hallucinate** or get stuck in loops.
* Behavior can degrade in **zero-shot** usage.
* Not benchmarked; **no alignment or safety tuning** has been applied.
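
Because the model relies on explicit token triggers and may loop, it can help to post-process the generated text before using it. The following is a minimal sketch, not part of the released model: `split_stages`, `final_answer`, and `has_repeated_stage` are hypothetical helper names, and the code assumes the markers appear literally in the decoded text exactly as in the example prompt above (if the tokenizer treats them as special tokens and they are stripped during decoding, this will not find them).

```python
import re

# Stage markers as written in the example prompt on this card.
_TOKEN_RE = re.compile(r"<\|(THINK|ANSWER|REFLECT|FINAL)\|>")

# Plain-text labels that precede the next marker in the example prompt;
# they are trimmed from the end of the previous stage's content.
_LABEL_RE = re.compile(
    r"\n?(Think Step \d+|Answer Attempt \d+|Reflection|Final Answer):\s*$"
)


def split_stages(text):
    """Return (stage, content) pairs in the order the markers appear."""
    matches = list(_TOKEN_RE.finditer(text))
    stages = []
    for i, match in enumerate(matches):
        # Content runs until the next marker, or to the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        content = text[match.end():end].rstrip()
        content = _LABEL_RE.sub("", content).strip()
        stages.append((match.group(1), content))
    return stages


def final_answer(text):
    """Return the content of the last <|FINAL|> stage, or None if absent."""
    finals = [content for stage, content in split_stages(text) if stage == "FINAL"]
    return finals[-1] if finals else None


def has_repeated_stage(text):
    """Crude loop check: True if an identical (stage, content) pair repeats."""
    stages = split_stages(text)
    return len(set(stages)) < len(stages)


if __name__ == "__main__":
    sample = (
        "Q: What are the downsides of AI regulation?\n"
        "Think Step 1:\n<|THINK|> Regulation might slow innovation.\n"
        "Answer Attempt 1:\n<|ANSWER|> Slower innovation and reduced competition.\n"
        "Reflection:\n<|REFLECT|> The points are valid.\n"
        "Final Answer:\n<|FINAL|> Slower innovation and centralized control.\n"
    )
    print(final_answer(sample))
    print(has_repeated_stage(sample))
```

The repeated-stage check only catches verbatim repetition; near-duplicate loops would need a fuzzier comparison.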

---

## ✅ Intended For

* Research in **cognitive loops**
* LLM **agent architecture prototyping**
* Simulating **multi-phase reasoning**

---

## ❌ Not Recommended For

* Real-world deployment
* Safety-critical tasks
* Answer quality evaluation without verification

---

## 📎 Citation

```
@misc{mini-overthinker2025,
  author       = {Daemontatox},
  title        = {Mini-Overthinker: Experimental Staged Reasoning Model},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/Daemontatox/mini-overthinker}},
  note         = {Fine-tuned from unsloth/magistral-small-2506 using ideas from HelpingAI/Dhanishtha-2.0-SUPERTHINKER}
}
```

---