-
EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
Paper • 2402.04252 • Published • 29
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
Paper • 2402.03749 • Published • 13
ScreenAI: A Vision-Language Model for UI and Infographics Understanding
Paper • 2402.04615 • Published • 44
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
Paper • 2402.05008 • Published • 23
Collections including paper arxiv:2409.11367
-
SelfEval: Leveraging the discriminative nature of generative models for evaluation
Paper • 2311.10708 • Published • 17
OmniGen: Unified Image Generation
Paper • 2409.11340 • Published • 116
NVLM: Open Frontier-Class Multimodal LLMs
Paper • 2409.11402 • Published • 75
Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think
Paper • 2409.11355 • Published • 32
-
Hierarchical Spatio-temporal Decoupling for Text-to-Video Generation
Paper • 2312.04483 • Published • 7
AnimateZero: Video Diffusion Models are Zero-Shot Image Animators
Paper • 2312.03793 • Published • 18
Photorealistic Video Generation with Diffusion Models
Paper • 2312.06662 • Published • 24
PEEKABOO: Interactive Video Generation via Masked-Diffusion
Paper • 2312.07509 • Published • 12
-
Controllable Text Generation for Large Language Models: A Survey
Paper • 2408.12599 • Published • 66
xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed Representations
Paper • 2408.12590 • Published • 37
Real-Time Video Generation with Pyramid Attention Broadcast
Paper • 2408.12588 • Published • 17
Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model
Paper • 2408.11039 • Published • 62
-
WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens
Paper • 2401.09985 • Published • 18
CustomVideo: Customizing Text-to-Video Generation with Multiple Subjects
Paper • 2401.09962 • Published • 9
Inflation with Diffusion: Efficient Temporal Adaptation for Text-to-Video Super-Resolution
Paper • 2401.10404 • Published • 11
ActAnywhere: Subject-Aware Video Background Generation
Paper • 2401.10822 • Published • 13