Paper2Poster: Towards Multimodal Poster Automation from Scientific Papers Paper • 2505.21497 • Published 6 days ago • 90
HoliTom: Holistic Token Merging for Fast Video Large Language Models Paper • 2505.21334 • Published 6 days ago • 18
PiTe: Pixel-Temporal Alignment for Large Video-Language Model Paper • 2409.07239 • Published Sep 11, 2024 • 15
SSR: Enhancing Depth Perception in Vision-Language Models via Rationale-Guided Spatial Reasoning Paper • 2505.12448 • Published 15 days ago • 10
SSR Collection Enhancing Depth Perception in Vision-Language Models via Rationale-Guided Spatial Reasoning • 5 items • Updated 9 days ago • 1
MiMo: Unlocking the Reasoning Potential of Language Model -- From Pretraining to Posttraining Paper • 2505.07608 • Published 21 days ago • 77
🪐 SmolLM Collection A series of smol LLMs: 135M, 360M and 1.7B. We release base and Instruct models as well as the training corpus and some WebGPU demos • 12 items • Updated 28 days ago • 227