Ianna Chan
moegi161
AI & ML interests
Image synthesis, vision and language
Organizations
None yet
Vision and language
- No "Zero-Shot" Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance
  Paper • 2404.04125 • Published • 30
- CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching
  Paper • 2404.03653 • Published • 37
- Cross-Attention Makes Inference Cumbersome in Text-to-Image Diffusion Models
  Paper • 2404.02747 • Published • 13
- 3D Congealing: 3D-Aware Image Alignment in the Wild
  Paper • 2404.02125 • Published • 10
3D
models
0
None public yet
datasets
0
None public yet