Group Think: Multiple Concurrent Reasoning Agents Collaborating at Token Level Granularity
Abstract
A concurrent reasoning paradigm using a single LLM, called Group Think, enhances reasoning quality and reduces latency by allowing multiple reasoning threads to dynamically collaborate at the token level.
Recent advances in large language models (LLMs) have demonstrated the power of reasoning through self-generated chains of thought. Multiple reasoning agents can collaborate to raise joint reasoning quality above individual outcomes. However, such agents typically interact in a turn-based manner, trading increased latency for improved quality. In this paper, we propose Group Think, a single LLM that acts as multiple concurrent reasoning agents, or thinkers. With shared visibility into each other's partial generation progress, Group Think introduces a new concurrent-reasoning paradigm in which multiple reasoning trajectories adapt dynamically to one another at the token level. For example, a reasoning thread may shift its generation mid-sentence upon detecting that another thread is better positioned to continue. This fine-grained, token-level collaboration enables Group Think to reduce redundant reasoning and improve quality while achieving significantly lower latency. Moreover, its concurrent nature allows for efficient utilization of idle computational resources, making it especially suitable for edge inference, where very small batch sizes often underutilize local GPUs. We give a simple and generalizable modification that enables any existing LLM to perform Group Think on a local GPU. We also present an evaluation strategy to benchmark reasoning latency and empirically demonstrate latency improvements using open-source LLMs that were not explicitly trained for Group Think. We hope this work paves the way for future LLMs to exhibit more sophisticated and more efficient collaborative behavior for higher quality generation.
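The abstract leaves the mechanics implicit, so below is a minimal sketch of the core idea in plain Python: several "thinker" threads of one causal LM take turns emitting a token, and each turn's prompt exposes every thinker's partial trace, so a thinker can adapt mid-generation to what the others have produced. The model name, prompt layout, greedy decoding, and round-robin loop are illustrative assumptions, not the paper's method; this is a didactic approximation, not the proposed implementation.

```python
# Minimal sketch of Group Think-style concurrent reasoning with an off-the-shelf LM.
# Assumptions (not from the paper): the prompt layout, the model choice, greedy
# decoding, and a round-robin loop that emulates token-level concurrency by giving
# each thinker one token per round while exposing all partial traces to everyone.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "Qwen/Qwen2.5-0.5B-Instruct"  # illustrative; any small instruct model works
N_THINKERS = 2
MAX_ROUNDS = 64

tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL)
model.eval()

question = "List three distinct prime numbers greater than 50."
traces = ["" for _ in range(N_THINKERS)]  # each thinker's partial chain of thought


def build_prompt(i: int) -> str:
    """Thinker i sees the question plus every thinker's partial trace so far."""
    shared = "\n".join(f"[Thinker {j + 1} so far] {traces[j]}" for j in range(N_THINKERS))
    return f"Question: {question}\n{shared}\n[Thinker {i + 1} continues]"


with torch.no_grad():
    for _ in range(MAX_ROUNDS):
        for i in range(N_THINKERS):  # round-robin stands in for true parallel decoding
            ids = tok(build_prompt(i), return_tensors="pt").input_ids
            next_id = model(input_ids=ids).logits[0, -1].argmax().item()
            traces[i] += tok.decode([next_id])

for i, t in enumerate(traces, 1):
    print(f"Thinker {i}: {t}")
```

A real implementation would presumably decode all thinkers in a single batched forward pass with shared visibility in the cache rather than rebuilding each prompt per token; that is what lets the extra thinkers absorb otherwise idle GPU capacity and makes the latency claim plausible for small-batch edge inference.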
Community
We propose Group Think: a single LLM simulates multiple concurrent reasoning agents that collaborate at the token level by dynamically adapting to each other’s progress, leading to improved performance.
This is an automated message from the Librarian Bot: the following similar papers were recommended by the Semantic Scholar API.
- Hogwild! Inference: Parallel LLM Generation via Concurrent Attention (2025)
- Two Heads are Better Than One: Test-time Scaling of Multi-agent Collaborative Reasoning (2025)
- Learning Adaptive Parallel Reasoning with Language Models (2025)
- A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond (2025)
- Hawkeye: Efficient Reasoning with Model Collaboration (2025)
- Efficient Reasoning Models: A Survey (2025)
- A Survey of Scaling in Large Language Model Reasoning (2025)