PaTH Attention: Position Encoding via Accumulating Householder Transformations Paper • 2505.16381 • Published May 22
Decoder-Hybrid-Decoder Architecture for Efficient Reasoning with Long Generation Paper • 2507.06607 • Published Jul 9 • 10
Decoder-Hybrid-Decoder Architecture for Efficient Reasoning with Long Generation Paper • 2507.06607 • Published Jul 9 • 10
Decoder-Hybrid-Decoder Architecture for Efficient Reasoning with Long Generation Paper • 2507.06607 • Published Jul 9 • 10 • 1
Multiverse: Your Language Models Secretly Decide How to Parallelize and Merge Generation Paper • 2506.09991 • Published Jun 11 • 56
Phi-4-Mini-Reasoning: Exploring the Limits of Small Reasoning Language Models in Math Paper • 2504.21233 • Published Apr 30 • 48
Phi-4-Mini-Reasoning: Exploring the Limits of Small Reasoning Language Models in Math Paper • 2504.21233 • Published Apr 30 • 48