DropletVideo: A Dataset and Approach to Explore Integral Spatio-Temporal Consistent Video Generation Paper • 2503.06053 • Published Mar 8 • 138
Wan: Open and Advanced Large-Scale Video Generative Models Paper • 2503.20314 • Published Mar 26 • 54
KodCode: A Diverse, Challenging, and Verifiable Synthetic Dataset for Coding Paper • 2503.02951 • Published Mar 4 • 32
4D LangSplat: 4D Language Gaussian Splatting via Multimodal Large Language Models Paper • 2503.10437 • Published Mar 13 • 33
Forgetting Transformer: Softmax Attention with a Forget Gate Paper • 2503.02130 • Published Mar 3 • 32
φ-Decoding: Adaptive Foresight Sampling for Balanced Inference-Time Exploration and Exploitation Paper • 2503.13288 • Published Mar 17 • 52
GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing Paper • 2503.10639 • Published Mar 13 • 52
Feature-Level Insights into Artificial Text Detection with Sparse Autoencoders Paper • 2503.03601 • Published Mar 5 • 233
MatAnyone: Stable Video Matting with Consistent Memory Propagation Paper • 2501.14677 • Published Jan 24 • 35
FLAG-Trader: Fusion LLM-Agent with Gradient-based Reinforcement Learning for Financial Trading Paper • 2502.11433 • Published Feb 17 • 36
PhotoDoodle: Learning Artistic Image Editing from Few-Shot Pairwise Data Paper • 2502.14397 • Published Feb 20 • 42
SongGen: A Single Stage Auto-regressive Transformer for Text-to-Song Generation Paper • 2502.13128 • Published Feb 18 • 42
Skrr: Skip and Re-use Text Encoder Layers for Memory Efficient Text-to-Image Generation Paper • 2502.08690 • Published Feb 12 • 44
Phantom: Subject-consistent video generation via cross-modal alignment Paper • 2502.11079 • Published Feb 16 • 60
The Stochastic Parrot on LLM's Shoulder: A Summative Assessment of Physical Concept Understanding Paper • 2502.08946 • Published Feb 13 • 195
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model Paper • 2502.02737 • Published Feb 4 • 241