LaViDa: A Large Diffusion Language Model for Multimodal Understanding Paper • 2505.16839 • Published 2 days ago • 10
Training-Free Efficient Video Generation via Dynamic Token Carving Paper • 2505.16864 • Published 2 days ago • 13
LLaDA-V: Large Language Diffusion Models with Visual Instruction Tuning Paper • 2505.16933 • Published 2 days ago • 23
QuickVideo: Real-Time Long Video Understanding with System Algorithm Co-Design Paper • 2505.16175 • Published 3 days ago • 34
X-Fusion: Introducing New Modality to Frozen Large Language Models Paper • 2504.20996 • Published 25 days ago • 12
BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset Paper • 2505.09568 • Published 10 days ago • 82
Faster Video Diffusion with Trainable Sparse Attention Paper • 2505.13389 • Published 5 days ago • 34
Delta Attention: Fast and Accurate Sparse Attention Inference by Delta Correction Paper • 2505.11254 • Published 9 days ago • 47
SEAP: Training-free Sparse Expert Activation Pruning Unlock the Brainpower of Large Language Models Paper • 2503.07605 • Published Mar 10 • 69
Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention Paper • 2502.11089 • Published Feb 16 • 159