F5-TTS & E2-TTS: Zero-Shot Voice Cloning (Unofficial Demo)
A unified multimodal understanding and generation model
Next-generation reasoning model that runs locally in-browser
Scalable and Versatile 3D Generation from images
Visual Quality Control for DocVQA
A tiny vision-language model