SANA-Sprint Collection 🏃 SANA-Sprint: One-Step Diffusion with Continuous-Time Consistency Distillation • 6 items • Updated 8 days ago • 32
SmolVLM: Redefining small and efficient multimodal models Paper • 2504.05299 • Published 6 days ago • 155
PLADIS: Pushing the Limits of Attention in Diffusion Models at Inference Time by Leveraging Sparsity Paper • 2503.07677 • Published Mar 10 • 82
ReCamMaster: Camera-Controlled Generative Rendering from A Single Video Paper • 2503.11647 • Published about 1 month ago • 132
TheoremExplainAgent: Towards Multimodal Explanations for LLM Theorem Understanding Paper • 2502.19400 • Published Feb 26 • 48
DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding Paper • 2412.10302 • Published Dec 13, 2024 • 18
DreamWaltz-G: Expressive 3D Gaussian Avatars from Skeleton-Guided 2D Diffusion Paper • 2409.17145 • Published Sep 25, 2024 • 15
MedTrinity-25M: A Large-scale Multimodal Dataset with Multigranular Annotations for Medicine Paper • 2408.02900 • Published Aug 6, 2024 • 29
An Object is Worth 64x64 Pixels: Generating 3D Object via Image Diffusion Paper • 2408.03178 • Published Aug 6, 2024 • 41
Jack of All Trades, Master of Some, a Multi-Purpose Transformer Agent Article • Apr 22, 2024 • 80
StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation Paper • 2405.01434 • Published May 2, 2024 • 57
LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report Paper • 2405.00732 • Published Apr 29, 2024 • 122