HoliTom: Holistic Token Merging for Fast Video Large Language Models Paper • 2505.21334 • Published 30 days ago • 19
BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset Paper • 2505.09568 • Published May 14 • 94
DyMU: Dynamic Merging and Virtual Unmerging for Efficient VLMs Paper • 2504.17040 • Published Apr 23 • 13
Plug-and-Play 1.x-Bit KV Cache Quantization for Video Large Language Models Paper • 2503.16257 • Published Mar 20 • 24
Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models Paper • 2503.16419 • Published Mar 20 • 75
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach Paper • 2502.05171 • Published Feb 7 • 142
VideoJAM: Joint Appearance-Motion Representations for Enhanced Motion Generation in Video Models Paper • 2502.02492 • Published Feb 4 • 65
Unifying Specialized Visual Encoders for Video Language Models Paper • 2501.01426 • Published Jan 2 • 21
Summary of a Haystack: A Challenge to Long-Context LLMs and RAG Systems Paper • 2407.01370 • Published Jul 1, 2024 • 90