---
license: apache-2.0
language:
- en
base_model:
- Wan-AI/Wan2.1-T2V-1.3B
pipeline_tag: text-to-video
---
# EchoShot: Multi-Shot Portrait Video Generation
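Jiahao Wang<sup>1</sup> · Hualian Sheng<sup>2</sup> · Sijia Cai<sup>2,†</sup> · Weizhan Zhang<sup>1,*</sup> · Caixia Yan<sup>1</sup> · Yachuang Feng<sup>2</sup> · Bing Deng<sup>2</sup> · Jieping Ye<sup>2</sup>

<sup>1</sup>Xi'an Jiaotong University · <sup>2</sup>Alibaba Cloud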
## Intro
This is the official model release of EchoShot, which generates **multiple video shots of the same person, each controlled by a customized prompt**. It currently supports text-to-multi-shot portrait video generation. We hope you have fun with this demo!
## News
- July 15, 2025: EchoShot-1.3B-preview is now available at [HuggingFace](https://huggingface.co/JonneyWang/EchoShot)!
- July 15, 2025: Released the inference and training code.
- May 25, 2025: We propose [EchoShot](https://johnneywang.github.io/EchoShot-webpage/), a multi-shot portrait video generation model.
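
The checkpoint listed in the news above can be fetched with the `huggingface_hub` client. The snippet below is a minimal sketch, not the project's official setup script: the repository id comes from the HuggingFace link above, while the helper name `download_echoshot` and the target directory `./models/EchoShot` are assumptions chosen for illustration.

```python
from pathlib import Path

# Repository id taken from the HuggingFace link in the news item above.
REPO_ID = "JonneyWang/EchoShot"


def download_echoshot(local_dir: str = "./models/EchoShot") -> Path:
    """Download all files of the model repository and return the local path.

    `local_dir` is a hypothetical location; point it wherever your
    inference setup expects the weights.
    """
    # Imported lazily so this module loads even if huggingface_hub is absent.
    from huggingface_hub import snapshot_download

    path = snapshot_download(repo_id=REPO_ID, local_dir=local_dir)
    return Path(path)
```

Calling `download_echoshot()` requires `pip install huggingface_hub` and network access; it returns the directory containing the downloaded files.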
## Citation
If you are inspired by our work, please cite our paper.
```bibtex
@article{wang2025echoshot,
title={EchoShot: Multi-Shot Portrait Video Generation},
author={Wang, Jiahao and Sheng, Hualian and Cai, Sijia and Zhang, Weizhan and Yan, Caixia and Feng, Yachuang and Deng, Bing and Ye, Jieping},
journal={arXiv preprint arXiv:2506.15838},
year={2025}
}
```