Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision Paper • 2407.06189 • Published Jul 8 • 25
Byte Latent Transformer: Patches Scale Better Than Tokens Paper • 2412.09871 • Published 13 days ago • 75
Apollo: An Exploration of Video Understanding in Large Multimodal Models Paper • 2412.10360 • Published 13 days ago • 131
AnimateLCM: Accelerating the Animation of Personalized Diffusion Models and Adapters with Decoupled Consistency Learning Paper • 2402.00769 • Published Feb 1 • 22
SOLAMI: Social Vision-Language-Action Modeling for Immersive Interaction with 3D Autonomous Characters Paper • 2412.00174 • Published 27 days ago • 22
VEnhancer: Generative Space-Time Enhancement for Video Generation Paper • 2407.07667 • Published Jul 10 • 14
VBench++: Comprehensive and Versatile Benchmark Suite for Video Generative Models Paper • 2411.13503 • Published Nov 20 • 30
Edify Image: High-Quality Image Generation with Pixel Space Laplacian Diffusion Models Paper • 2411.07126 • Published Nov 11 • 28
LLM2CLIP: Powerful Language Model Unlock Richer Visual Representation Paper • 2411.04997 • Published Nov 7 • 37
How Far is Video Generation from World Model: A Physical Law Perspective Paper • 2411.02385 • Published Nov 4 • 33
FasterCache: Training-Free Video Diffusion Model Acceleration with High Quality Paper • 2410.19355 • Published Oct 25 • 23
SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformers Paper • 2410.10629 • Published Oct 14 • 8
Depth Pro: Sharp Monocular Metric Depth in Less Than a Second Paper • 2410.02073 • Published Oct 2 • 41
Colorful Diffuse Intrinsic Image Decomposition in the Wild Paper • 2409.13690 • Published Sep 20 • 12
Shot2Story20K: A New Benchmark for Comprehensive Understanding of Multi-shot Videos Paper • 2312.10300 • Published Dec 16, 2023 • 1
Scaling Rectified Flow Transformers for High-Resolution Image Synthesis Paper • 2403.03206 • Published Mar 5 • 60
Optimum-NVIDIA - Unlock blazingly fast LLM inference in just 1 line of code Article • Published Dec 5, 2023 • 4
IPAdapter-Instruct: Resolving Ambiguity in Image-based Conditioning using Instruct Prompts Paper • 2408.03209 • Published Aug 6 • 21