Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model Paper • 2408.11039 • Published Aug 20 • 56
MeshAnything V2: Artist-Created Mesh Generation With Adjacent Mesh Tokenization Paper • 2408.02555 • Published Aug 5 • 28
Diffusion Augmented Agents: A Framework for Efficient Exploration and Transfer Learning Paper • 2407.20798 • Published Jul 30 • 23
Longhorn: State Space Models are Amortized Online Learners Paper • 2407.14207 • Published Jul 19 • 16
CLAY: A Controllable Large-scale Generative Model for Creating High-quality 3D Assets Paper • 2406.13897 • Published May 30 • 12
Speech Slytherin: Examining the Performance and Efficiency of Mamba for Speech Separation, Recognition, and Synthesis Paper • 2407.09732 • Published Jul 13 • 8
ANOLE: An Open, Autoregressive, Native Large Multimodal Models for Interleaved Image-Text Generation Paper • 2407.06135 • Published Jul 8 • 20
Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion Paper • 2407.01392 • Published Jul 1 • 39
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output Paper • 2407.03320 • Published Jul 3 • 92
FoleyCrafter: Bring Silent Videos to Life with Lifelike and Synchronized Sounds Paper • 2407.01494 • Published Jul 1 • 13
T-MAC: CPU Renaissance via Table Lookup for Low-Bit LLM Deployment on Edge Paper • 2407.00088 • Published Jun 25 • 10
RealTalk: Real-time and Realistic Audio-driven Face Generation with 3D Facial Prior-guided Identity Alignment Network Paper • 2406.18284 • Published Jun 26 • 19
E2 TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS Paper • 2406.18009 • Published Jun 26 • 19
Wavelets Are All You Need for Autoregressive Image Generation Paper • 2406.19997 • Published Jun 28 • 29
T-FREE: Tokenizer-Free Generative LLMs via Sparse Representations for Memory-Efficient Embeddings Paper • 2406.19223 • Published Jun 27 • 8