Qwen2.5-VL Collection Vision-language model series based on Qwen2.5 • 11 items • Updated 26 days ago • 448
Phi-4 Collection Phi-4 family of small language and multi-modal models. • 9 items • Updated 9 days ago • 117
Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention Paper • 2502.11089 • Published Feb 16, 2025 • 155
Video2Game: Real-time, Interactive, Realistic and Browser-Compatible Environment from a Single Video Paper • 2404.09833 • Published Apr 15, 2024 • 31
Masked Audio Generation using a Single Non-Autoregressive Transformer Paper • 2401.04577 • Published Jan 9, 2024 • 44
DocLLM: A layout-aware generative language model for multimodal document understanding Paper • 2401.00908 • Published Dec 31, 2023 • 184
DL3DV-10K: A Large-Scale Scene Dataset for Deep Learning-based 3D Vision Paper • 2312.16256 • Published Dec 26, 2023 • 17
PlatoNeRF: 3D Reconstruction in Plato's Cave via Single-View Two-Bounce Lidar Paper • 2312.14239 • Published Dec 21, 2023 • 12
Amphion: An Open-Source Audio, Music and Speech Generation Toolkit Paper • 2312.09911 • Published Dec 15, 2023 • 55