Towards Storage-Efficient Visual Document Retrieval: An Empirical Study on Reducing Patch-Level Embeddings Paper • 2506.04997 • Published 4 days ago
Video World Models with Long-term Spatial Memory Paper • 2506.05284 • Published 3 days ago • 42
VideoRoPE: What Makes for Good Video Rotary Position Embeddi Collection A storage repo for VideoRoPE. • 5 items • Updated Apr 24 • 3
Visionary-R1: Mitigating Shortcuts in Visual Reasoning with Reinforcement Learning Paper • 2505.14677 • Published 19 days ago • 15
Article Introducing Pivotal Token Search (PTS): Targeting Critical Decision Points in LLM Training By codelion • 23 days ago • 5
Self-play with Execution Feedback: Improving Instruction-following Capabilities of Large Language Models Paper • 2406.13542 • Published Jun 19, 2024 • 17
Unified Multimodal Chain-of-Thought Reward Model through Reinforcement Fine-Tuning Paper • 2505.03318 • Published May 6 • 92