EchoShot: Multi-Shot Portrait Video Generation
Jiahao Wang1 · Hualian Sheng2 · Sijia Cai2,† · Weizhan Zhang1,*
Caixia Yan1 · Yachuang Feng2 · Bing Deng2 · Jieping Ye2
1Xi'an Jiaotong University
2Alibaba Cloud
Intro
This is the official model release of EchoShot, which generates multiple video shots of the same person, with each shot controlled by its own customized prompt. It currently supports text-to-multishot portrait video generation. We hope you have fun with this demo!

News
- July 15, 2025: EchoShot-1.3B-preview is now available on HuggingFace!
- July 15, 2025: Released the inference and training code.
- May 25, 2025: We propose EchoShot, a multi-shot portrait video generation model.
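
Since the preview checkpoint is hosted on HuggingFace, one way to fetch it locally is with `snapshot_download` from the `huggingface_hub` package. The following is a minimal sketch, not the project's official setup procedure: the repository id is the one shown in this page's model tree, while the `download_checkpoint` helper and the `./models/EchoShot` target directory are assumptions made for illustration.

```python
from pathlib import Path

# Repository id as shown in this page's model tree.
REPO_ID = "JonneyWang/EchoShot"


def download_checkpoint(local_dir: str = "./models/EchoShot") -> Path:
    """Download the model repository and return its local path.

    `local_dir` is a hypothetical location chosen for this sketch;
    the project's own scripts may expect a different layout.
    """
    # Imported lazily so this module loads even without the dependency.
    from huggingface_hub import snapshot_download

    path = snapshot_download(repo_id=REPO_ID, local_dir=local_dir)
    return Path(path)


# Example usage (requires network access and `pip install huggingface_hub`):
# checkpoint_dir = download_checkpoint()
```

The base model listed on this page (Wan-AI/Wan2.1-T2V-1.3B) could be fetched the same way by passing its repository id, if the inference code turns out to need it.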
Citation
If you are inspired by our work, please cite our paper.
@article{wang2025echoshot,
  title={EchoShot: Multi-Shot Portrait Video Generation},
  author={Wang, Jiahao and Sheng, Hualian and Cai, Sijia and Zhang, Weizhan and Yan, Caixia and Feng, Yachuang and Deng, Bing and Ye, Jieping},
  journal={arXiv preprint arXiv:2506.15838},
  year={2025}
}
Model tree for JonneyWang/EchoShot
Base model
Wan-AI/Wan2.1-T2V-1.3B