Chat with a model about images
Transcribe audio or YouTube videos into text
Generate a cartoon video from two images