Generate audio for a video using captions and descriptions
LightGlue demo
Extreme Super-Resolution via Scale Autoregression
Scalable and Versatile 3D Generation from images
Generate images from text prompts
Visual question answering (VQA)
Generate realistic audio from text