Wan: Open and Advanced Large-Scale Video Generative Models
Select coordinates on an image based on instructions
Generate click coordinates from image and instruction
Convert images of screens to structured elements
F5-TTS & E2-TTS: Zero-Shot Voice Cloning (Unofficial Demo)
Engage in multi-modal conversations with images and videos
Generate clickable coordinates on a screenshot