Generate depth maps from images
Generate captions and perform a variety of other image-analysis tasks
Segment and caption objects in images and videos