Video-RTS: Rethinking Reinforcement Learning and Test-Time Scaling for Efficient and Enhanced Video Reasoning Paper • 2507.06485 • Published Jul 9 • 4
MEXA: Towards General Multimodal Reasoning with Dynamic Multi-Expert Aggregation Paper • 2506.17113 • Published Jun 20 • 4
Frame Guidance: Training-Free Guidance for Frame-Level Control in Video Diffusion Models Paper • 2506.07177 • Published Jun 8 • 22
Video-Skill-CoT: Skill-based Chain-of-Thoughts for Domain-Adaptive Video Reasoning Paper • 2506.03525 • Published Jun 4 • 6
EPiC: Efficient Video Camera Control Learning with Precise Anchor-Video Guidance Paper • 2505.21876 • Published May 28 • 9
Distilling LLM Agent into Small Models with Retrieval and Code Tools Paper • 2505.17612 • Published May 23 • 81
RSQ: Learning from Important Tokens Leads to Better Quantized LLMs Paper • 2503.01820 • Published Mar 3 • 2
UPCORE: Utility-Preserving Coreset Selection for Balanced Unlearning Paper • 2502.15082 • Published Feb 20 • 1
Mol-LLaMA: Towards General Understanding of Molecules in Large Molecular Language Model Paper • 2502.13449 • Published Feb 19 • 46
SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features Paper • 2502.14786 • Published Feb 20 • 146
On the Trustworthiness of Generative Foundation Models: Guideline, Assessment, and Perspective Paper • 2502.14296 • Published Feb 20 • 46
DreamRunner: Fine-Grained Storytelling Video Generation with Retrieval-Augmented Motion Adaptation Paper • 2411.16657 • Published Nov 25, 2024 • 20
VideoRepair: Improving Text-to-Video Generation via Misalignment Evaluation and Localized Refinement Paper • 2411.15115 • Published Nov 22, 2024 • 9
RACCooN: Remove, Add, and Change Video Content with Auto-Generated Narratives Paper • 2405.18406 • Published May 28, 2024 • 1