DeepThink Plugin: Bringing Gemini 2.5's Parallel Reasoning to Open Models
Just released an open-source plugin that implements Google's "Deep Think" reasoning approach for models like DeepSeek R1, Qwen3, and other open models.
Google's recent Gemini 2.5 report introduced Deep Think, a technique where the model generates multiple hypotheses in parallel and critiques them before settling on a final answer. It achieves SOTA results on math olympiad and competitive coding benchmarks.
Our implementation works by modifying the inference pipeline to explore multiple solution paths simultaneously, then synthesizing the best approach. Instead of single-pass generation, models run an internal debate before responding.
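As a rough sketch of the idea (not the plugin's actual code), the generate-critique-select loop can be pictured like this. The `generate` and `score` callables here are toy stand-ins for a real model and critic, purely for illustration:

```python
def generate_hypotheses(generate, prompt, n=3):
    # Sample n candidate solutions independently (the parallel paths)
    return [generate(f"{prompt}\nApproach {i + 1}:") for i in range(n)]

def critique_and_select(score, candidates):
    # Score each candidate and keep the best one (the "internal debate")
    scored = [(score(c), c) for c in candidates]
    return max(scored)[1]

def deep_think(generate, score, prompt, n=3):
    candidates = generate_hypotheses(generate, prompt, n)
    return critique_and_select(score, candidates)

# Toy stand-ins: a fake "model" returning canned answers, and a
# self-consistency critic that votes for the most common answer.
answers = {"Approach 1": "42", "Approach 2": "41", "Approach 3": "42"}
gen = lambda p: answers[p.splitlines()[-1].rstrip(":")]
score = lambda ans: sum(a == ans for a in answers.values())

print(deep_think(gen, score, "What is 6*7?"))  # selects the majority answer "42"
```

In the real plugin the candidates come from the model itself and the critique step is also model-driven; the sketch just shows the sample-then-select shape of the pipeline.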
Key features:
- Works with any model that supports structured reasoning patterns
- Implements parallel thinking during response generation
- Particularly effective for complex reasoning tasks, math, and coding problems
- Increases inference time but significantly improves answer quality
The plugin won the Cerebras & OpenRouter Qwen 3 Hackathon, validating that this approach works well beyond Google's proprietary implementation.
GitHub: https://github.com/codelion/optillm/tree/main/optillm/plugins/deepthink
Demo: https://www.youtube.com/watch?v=b06kD1oWBA4
The goal is to democratize advanced reasoning capabilities that were previously locked behind proprietary APIs. It's well suited for researchers and practitioners running local deployments who want enhanced reasoning without depending on closed services.
Performance notes: Currently about 2-3x slower inference but much better results on complex problems. Working on adaptive triggering to only activate when problems benefit from parallel reasoning.
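A minimal sketch of what adaptive triggering could look like. The `should_deep_think` heuristic below, along with its keywords and length threshold, is a hypothetical illustration of the idea, not the plugin's actual logic:

```python
def should_deep_think(prompt, keywords=("prove", "derive", "optimize", "algorithm"),
                      min_len=200):
    # Illustrative heuristic: only pay the 2-3x inference cost for prompts
    # that are long or contain reasoning-heavy keywords; everything else
    # takes the normal single-pass fast path.
    p = prompt.lower()
    return len(prompt) >= min_len or any(k in p for k in keywords)
```

A learned classifier or a cheap first-pass confidence check could replace this kind of keyword heuristic, but the routing shape is the same: decide per request whether parallel reasoning is worth the extra latency.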
Would love feedback from the HF community and collaborations on optimizing the approach further. Open to PRs and always interested in making open models more capable.