Models
Datasets
Spaces
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2409.11367

about 2 hours ago

EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters

Paper • 2402.04252 • Published Feb 6, 2024 • 29
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models

Paper • 2402.03749 • Published Feb 6, 2024 • 13
ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Paper • 2402.04615 • Published Feb 7, 2024 • 44
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss

Paper • 2402.05008 • Published Feb 7, 2024 • 23

Generative Model

SelfEval: Leveraging the discriminative nature of generative models for evaluation

Paper • 2311.10708 • Published Nov 17, 2023 • 17
OmniGen: Unified Image Generation

Paper • 2409.11340 • Published Sep 17, 2024 • 116
NVLM: Open Frontier-Class Multimodal LLMs

Paper • 2409.11402 • Published Sep 17, 2024 • 75
Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think

Paper • 2409.11355 • Published Sep 17, 2024 • 32

video generation

Hierarchical Spatio-temporal Decoupling for Text-to-Video Generation

Paper • 2312.04483 • Published Dec 7, 2023 • 7
AnimateZero: Video Diffusion Models are Zero-Shot Image Animators

Paper • 2312.03793 • Published Dec 6, 2023 • 18
Photorealistic Video Generation with Diffusion Models

Paper • 2312.06662 • Published Dec 11, 2023 • 24
PEEKABOO: Interactive Video Generation via Masked-Diffusion

Paper • 2312.07509 • Published Dec 12, 2023 • 12

AI Math: Diffusion

Controllable Text Generation for Large Language Models: A Survey

Paper • 2408.12599 • Published Aug 22, 2024 • 66
xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed Representations

Paper • 2408.12590 • Published Aug 22, 2024 • 37
Real-Time Video Generation with Pyramid Attention Broadcast

Paper • 2408.12588 • Published Aug 22, 2024 • 17
Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model

Paper • 2408.11039 • Published Aug 20, 2024 • 62

WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens

Paper • 2401.09985 • Published Jan 18, 2024 • 18
CustomVideo: Customizing Text-to-Video Generation with Multiple Subjects

Paper • 2401.09962 • Published Jan 18, 2024 • 9
Inflation with Diffusion: Efficient Temporal Adaptation for Text-to-Video Super-Resolution

Paper • 2401.10404 • Published Jan 18, 2024 • 11
ActAnywhere: Subject-Aware Video Background Generation

Paper • 2401.10822 • Published Jan 19, 2024 • 13

about 2 hours ago

EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters

Paper • 2402.04252 • Published Feb 6, 2024 • 29
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models

Paper • 2402.03749 • Published Feb 6, 2024 • 13
ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Paper • 2402.04615 • Published Feb 7, 2024 • 44
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss

Paper • 2402.05008 • Published Feb 7, 2024 • 23

AI Math: Diffusion

Controllable Text Generation for Large Language Models: A Survey

Paper • 2408.12599 • Published Aug 22, 2024 • 66
xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed Representations

Paper • 2408.12590 • Published Aug 22, 2024 • 37
Real-Time Video Generation with Pyramid Attention Broadcast

Paper • 2408.12588 • Published Aug 22, 2024 • 17
Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model

Paper • 2408.11039 • Published Aug 20, 2024 • 62

Generative Model

SelfEval: Leveraging the discriminative nature of generative models for evaluation

Paper • 2311.10708 • Published Nov 17, 2023 • 17
OmniGen: Unified Image Generation

Paper • 2409.11340 • Published Sep 17, 2024 • 116
NVLM: Open Frontier-Class Multimodal LLMs

Paper • 2409.11402 • Published Sep 17, 2024 • 75
Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think

Paper • 2409.11355 • Published Sep 17, 2024 • 32

WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens

Paper • 2401.09985 • Published Jan 18, 2024 • 18
CustomVideo: Customizing Text-to-Video Generation with Multiple Subjects

Paper • 2401.09962 • Published Jan 18, 2024 • 9
Inflation with Diffusion: Efficient Temporal Adaptation for Text-to-Video Super-Resolution

Paper • 2401.10404 • Published Jan 18, 2024 • 11
ActAnywhere: Subject-Aware Video Background Generation

Paper • 2401.10822 • Published Jan 19, 2024 • 13

video generation

Hierarchical Spatio-temporal Decoupling for Text-to-Video Generation

Paper • 2312.04483 • Published Dec 7, 2023 • 7
AnimateZero: Video Diffusion Models are Zero-Shot Image Animators

Paper • 2312.03793 • Published Dec 6, 2023 • 18
Photorealistic Video Generation with Diffusion Models

Paper • 2312.06662 • Published Dec 11, 2023 • 24
PEEKABOO: Interactive Video Generation via Masked-Diffusion

Paper • 2312.07509 • Published Dec 12, 2023 • 12

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs