- EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
  Paper • 2402.04252 • Published • 28
- Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
  Paper • 2402.03749 • Published • 13
- ScreenAI: A Vision-Language Model for UI and Infographics Understanding
  Paper • 2402.04615 • Published • 44
- EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
  Paper • 2402.05008 • Published • 23
Collections
Collections including paper arxiv:2504.07960

- CoLLM: A Large Language Model for Composed Image Retrieval
  Paper • 2503.19910 • Published • 11
- LOCATEdit: Graph Laplacian Optimized Cross Attention for Localized Text-Guided Image Editing
  Paper • 2503.21541 • Published • 1
- HumanDreamer-X: Photorealistic Single-image Human Avatars Reconstruction via Gaussian Restoration
  Paper • 2504.03536 • Published • 10
- FantasyTalking: Realistic Talking Portrait Generation via Coherent Motion Synthesis
  Paper • 2504.04842 • Published • 28

- Latent Space Super-Resolution for Higher-Resolution Image Generation with Diffusion Models
  Paper • 2503.18446 • Published • 9
- Unconditional Priors Matter! Improving Conditional Generation of Fine-Tuned Diffusion Models
  Paper • 2503.20240 • Published • 21
- BizGen: Advancing Article-level Visual Text Rendering for Infographics Generation
  Paper • 2503.20672 • Published • 13
- Beyond Words: Advancing Long-Text Image Generation via Multimodal Autoregressive Models
  Paper • 2503.20198 • Published • 4

- ReCamMaster: Camera-Controlled Generative Rendering from A Single Video
  Paper • 2503.11647 • Published • 132
- DreamRenderer: Taming Multi-Instance Attribute Control in Large-Scale Text-to-Image Models
  Paper • 2503.12885 • Published • 43
- TaoAvatar: Real-Time Lifelike Full-Body Talking Avatars for Augmented Reality via 3D Gaussian Splatting
  Paper • 2503.17032 • Published • 24
- Single Image Iterative Subject-driven Generation and Editing
  Paper • 2503.16025 • Published • 13

- LoRACLR: Contrastive Adaptation for Customization of Diffusion Models
  Paper • 2412.09622 • Published • 8
- AnyDressing: Customizable Multi-Garment Virtual Dressing via Latent Diffusion Models
  Paper • 2412.04146 • Published • 23
- Learning Flow Fields in Attention for Controllable Person Image Generation
  Paper • 2412.08486 • Published • 37
- LoRA.rar: Learning to Merge LoRAs via Hypernetworks for Subject-Style Conditioned Image Generation
  Paper • 2412.05148 • Published • 12

- Causal Diffusion Transformers for Generative Modeling
  Paper • 2412.12095 • Published • 23
- SnapGen: Taming High-Resolution Text-to-Image Models for Mobile Devices with Efficient Architectures and Training
  Paper • 2412.09619 • Published • 28
- DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation
  Paper • 2412.07589 • Published • 49
- Flowing from Words to Pixels: A Framework for Cross-Modality Evolution
  Paper • 2412.15213 • Published • 29

- M3DocRAG: Multi-modal Retrieval is What You Need for Multi-page Multi-document Understanding
  Paper • 2411.04952 • Published • 30
- Diff-2-in-1: Bridging Generation and Dense Perception with Diffusion Models
  Paper • 2411.05005 • Published • 13
- M3SciQA: A Multi-Modal Multi-Document Scientific QA Benchmark for Evaluating Foundation Models
  Paper • 2411.04075 • Published • 17
- Self-Consistency Preference Optimization
  Paper • 2411.04109 • Published • 19

- Compose and Conquer: Diffusion-Based 3D Depth Aware Composable Image Synthesis
  Paper • 2401.09048 • Published • 10
- Improving fine-grained understanding in image-text pre-training
  Paper • 2401.09865 • Published • 17
- Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data
  Paper • 2401.10891 • Published • 61
- Scaling Up to Excellence: Practicing Model Scaling for Photo-Realistic Image Restoration In the Wild
  Paper • 2401.13627 • Published • 75