Abstract
Large reasoning models (LRMs) achieve strong performance on mathematical reasoning tasks, often attributed to their capability to generate explicit chain-of-thought (CoT) explanations. However, recent work shows that LRMs often arrive at the correct answer before completing these textual reasoning steps, indicating the presence of latent reasoning -- internal, non-verbal computation encoded in hidden states. While this phenomenon has been explored in English, its multilingual behavior remains largely unknown. In this paper, we conduct a systematic investigation of multilingual latent reasoning in LRMs across 11 languages. Using a truncation-based strategy, we examine how the correct answer emerges as the model is given only partial reasoning traces, allowing us to measure stepwise latent prediction formation. Our results reveal clear evidence of multilingual latent reasoning, though unevenly: strong in resource-rich languages, weaker in low-resource ones, and broadly less observable on harder benchmarks. To understand whether these differences reflect distinct internal mechanisms, we further perform representational analyses. Despite surface-level disparities, we find that the internal evolution of predictions is highly consistent across languages and broadly aligns with English -- a pattern suggesting an English-centered latent reasoning pathway.
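The truncation-based strategy described above can be sketched as follows. This is a minimal, illustrative mock, not the paper's implementation: the `predict_answer` function stands in for querying an LRM with a truncated reasoning trace (here replaced by a toy rule), and `earliest_correct_step` measures how early the correct answer emerges along the prefix of the trace.

```python
def predict_answer(question, partial_steps):
    """Stand-in for an LRM forced to answer from a truncated trace.
    Toy rule: the 'model' locks onto an answer once a decisive
    computation step (any step containing '=') has been seen."""
    for step in partial_steps:
        if "=" in step:
            return step.split("=")[-1].strip().rstrip(".")
    return None  # no latent prediction has formed yet


def earliest_correct_step(question, steps, gold):
    """Return the smallest k such that the answer predicted from
    steps[:k] is already correct, or None if it never is.
    A small k relative to len(steps) indicates the answer emerged
    before the textual reasoning was complete (latent reasoning)."""
    for k in range(len(steps) + 1):
        if predict_answer(question, steps[:k]) == gold:
            return k
    return None


trace = [
    "We need to compute 12 * 7.",
    "12 * 7 = 84",            # decisive step
    "So the answer is 84.",
]
k = earliest_correct_step("What is 12 * 7?", trace, "84")
print(k)  # → 2: the answer is correct one step before the trace ends
```

In the paper's actual setup, the probe is applied per language and per benchmark, and the emergence curve (accuracy as a function of truncation point k) is what differs between resource-rich and low-resource languages.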
Community
This is an automated message from the Librarian Bot. The following similar papers were recommended by the Semantic Scholar API:
- Unlocking Multilingual Reasoning Capability of LLMs and LVLMs through Representation Engineering (2025)
- Beg to Differ: Understanding Reasoning-Answer Misalignment Across Languages (2025)
- Do Latent Tokens Think? A Causal and Adversarial Analysis of Chain-of-Continuous-Thought (2025)
- What Really Counts? Examining Step and Token Level Attribution in Multilingual CoT Reasoning (2025)
- Understanding and Steering the Cognitive Behaviors of Reasoning Models at Test-Time (2025)
- Fantastic Reasoning Behaviors and Where to Find Them: Unsupervised Discovery of the Reasoning Process (2025)
- Reasoning Relay: Evaluating Stability and Interchangeability of Large Language Models in Mathematical Reasoning (2025)