---
license: apache-2.0
language:
- en
base_model:
- Wan-AI/Wan2.1-T2V-1.3B
pipeline_tag: text-to-video
---

# EchoShot: Multi-Shot Portrait Video Generation

Jiahao Wang1 · Hualian Sheng2 · Sijia Cai2,† · Weizhan Zhang1,*
Caixia Yan1 · Yachuang Feng2 · Bing Deng2 · Jieping Ye2

1Xi'an Jiaotong University      2Alibaba Cloud

[Paper PDF](https://arxiv.org/pdf/2506.15838) · [Project Page](https://johnneywang.github.io/EchoShot-webpage/) · GitHub Page

## 📝 Intro

This is the official model of EchoShot, which allows users to generate **multiple video shots showing the same person, controlled by customized prompts**. Currently, it supports text-to-multishot portrait video generation. We hope you have fun with this demo!
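The checkpoint is hosted on the Hugging Face Hub in the `JonneyWang/EchoShot` repository linked in the News section. As a minimal sketch, assuming the `huggingface_hub` package is installed, the files could be fetched as shown here. The destination folder `./models/EchoShot` and the helper name `download_echoshot` are hypothetical choices for illustration, not part of the official instructions.

```python
from pathlib import Path

# Repository ID taken from the Hugging Face link in the News section.
REPO_ID = "JonneyWang/EchoShot"


def download_echoshot(local_dir: str = "./models/EchoShot") -> Path:
    """Download all files of the model repository and return the local folder.

    `local_dir` is a hypothetical destination chosen for this sketch.
    """
    # Imported lazily so this module loads even if the dependency is missing.
    from huggingface_hub import snapshot_download

    path = snapshot_download(repo_id=REPO_ID, local_dir=local_dir)
    return Path(path)


if __name__ == "__main__":
    # Requires network access and `pip install huggingface_hub`.
    print(download_echoshot())
```

This only retrieves the weights. Running inference still requires the released inference code.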
## 🔔 News

- July 15, 2025: 🔥 EchoShot-1.3B-preview is now available on [HuggingFace](https://huggingface.co/JonneyWang/EchoShot)!
- July 15, 2025: 🎉 Released the inference and training code.
- May 25, 2025: We propose [EchoShot](https://johnneywang.github.io/EchoShot-webpage/), a multi-shot portrait video generation model.

## 📖 Citation

If you are inspired by our work, please cite our paper.

```bibtex
@article{wang2025echoshot,
  title={EchoShot: Multi-Shot Portrait Video Generation},
  author={Wang, Jiahao and Sheng, Hualian and Cai, Sijia and Zhang, Weizhan and Yan, Caixia and Feng, Yachuang and Deng, Bing and Ye, Jieping},
  journal={arXiv preprint arXiv:2506.15838},
  year={2025}
}
```