Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens Paper • 2506.17218 • Published 9 days ago • 19
Article Gemma 3n fully available in the open-source ecosystem! By ariG23498 and 7 others • 4 days ago • 87
From Bytes to Ideas: Language Modeling with Autoregressive U-Nets Paper • 2506.14761 • Published 12 days ago • 13
Beyond Homogeneous Attention: Memory-Efficient LLMs via Fourier-Approximated KV Cache Paper • 2506.11886 • Published 17 days ago • 20
BitVLA: 1-bit Vision-Language-Action Models for Robotics Manipulation Paper • 2506.07530 • Published 21 days ago • 18
BitVLA Collection 1-bit Vision-Language-Action Models for Robotics Manipulation • 4 items • Updated 20 days ago • 2
LoHoVLA: A Unified Vision-Language-Action Model for Long-Horizon Embodied Tasks Paper • 2506.00411 • Published 30 days ago • 30
ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models Paper • 2505.24864 • Published about 1 month ago • 132
Large Language Models are Locally Linear Mappings Paper • 2505.24293 • Published about 1 month ago • 15
RLVR-World: Training World Models with Reinforcement Learning Paper • 2505.13934 • Published May 20 • 14
Simple Semi-supervised Knowledge Distillation from Vision-Language Models via Dual-Head Optimization Paper • 2505.07675 • Published May 12 • 19
Beyond 'Aha!': Toward Systematic Meta-Abilities Alignment in Large Reasoning Models Paper • 2505.10554 • Published May 15 • 119
MiMo: Unlocking the Reasoning Potential of Language Model -- From Pretraining to Posttraining Paper • 2505.07608 • Published May 12 • 80