-
EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
Paper • 2402.04252 • Published • 27 -
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
Paper • 2402.03749 • Published • 13 -
ScreenAI: A Vision-Language Model for UI and Infographics Understanding
Paper • 2402.04615 • Published • 43 -
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
Paper • 2402.05008 • Published • 22
Collections
Discover the best community collections!
Collections including paper arxiv:2503.23461
-
Can Large Language Models Understand Context?
Paper • 2402.00858 • Published • 23 -
OLMo: Accelerating the Science of Language Models
Paper • 2402.00838 • Published • 83 -
Self-Rewarding Language Models
Paper • 2401.10020 • Published • 147 -
SemScore: Automated Evaluation of Instruction-Tuned LLMs based on Semantic Textual Similarity
Paper • 2401.17072 • Published • 25
-
Latent Space Super-Resolution for Higher-Resolution Image Generation with Diffusion Models
Paper • 2503.18446 • Published • 9 -
Unconditional Priors Matter! Improving Conditional Generation of Fine-Tuned Diffusion Models
Paper • 2503.20240 • Published • 21 -
BizGen: Advancing Article-level Visual Text Rendering for Infographics Generation
Paper • 2503.20672 • Published • 13 -
Beyond Words: Advancing Long-Text Image Generation via Multimodal Autoregressive Models
Paper • 2503.20198 • Published • 4
-
K-LoRA: Unlocking Training-Free Fusion of Any Subject and Style LoRAs
Paper • 2502.18461 • Published • 15 -
Semantic Image Inversion and Editing using Rectified Stochastic Differential Equations
Paper • 2410.10792 • Published • 30 -
Rewards Are Enough for Fast Photo-Realistic Text-to-image Generation
Paper • 2503.13070 • Published • 9 -
DreamRenderer: Taming Multi-Instance Attribute Control in Large-Scale Text-to-Image Models
Paper • 2503.12885 • Published • 42
-
MotionShop: Zero-Shot Motion Transfer in Video Diffusion Models with Mixture of Score Guidance
Paper • 2412.05355 • Published • 9 -
SwiftEdit: Lightning Fast Text-Guided Image Editing via One-Step Diffusion
Paper • 2412.04301 • Published • 38 -
PanoDreamer: 3D Panorama Synthesis from a Single Image
Paper • 2412.04827 • Published • 11 -
Around the World in 80 Timesteps: A Generative Approach to Global Visual Geolocation
Paper • 2412.06781 • Published • 21
-
Compose and Conquer: Diffusion-Based 3D Depth Aware Composable Image Synthesis
Paper • 2401.09048 • Published • 10 -
Improving fine-grained understanding in image-text pre-training
Paper • 2401.09865 • Published • 17 -
Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data
Paper • 2401.10891 • Published • 60 -
Scaling Up to Excellence: Practicing Model Scaling for Photo-Realistic Image Restoration In the Wild
Paper • 2401.13627 • Published • 74