DRT-o1: Optimized Deep Reasoning Translation via Long Chain-of-Thought • Paper • 2412.17498 • Published 3 days ago • 15
The Open Source Advantage in Large Language Models (LLMs) • Paper • 2412.12004 • Published 10 days ago • 9
Apollo: An Exploration of Video Understanding in Large Multimodal Models • Paper • 2412.10360 • Published 12 days ago • 131
AgentTrek: Agent Trajectory Synthesis via Guiding Replay with Web Tutorials • Paper • 2412.09605 • Published 13 days ago • 25
POINTS1.5: Building a Vision-Language Model towards Real World Applications • Paper • 2412.08443 • Published 15 days ago • 38
Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling • Paper • 2412.05271 • Published 19 days ago • 121
VisionZip: Longer is Better but Not Necessary in Vision Language Models • Paper • 2412.04467 • Published 20 days ago • 104