EchoShot: Multi-Shot Portrait Video Generation

Jiahao Wang¹ · Hualian Sheng² · Sijia Cai²,† · Weizhan Zhang¹,*
Caixia Yan¹ · Yachuang Feng² · Bing Deng² · Jieping Ye²

¹Xi'an Jiaotong University ²Alibaba Cloud

📝 Intro

This is the official model release of EchoShot, which generates multiple video shots of the same person, with each shot controlled by its own customized prompt. It currently supports text-to-multi-shot portrait video generation. We hope you have fun with this demo!

🔔 News

July 15, 2025: 🔥 EchoShot-1.3B-preview is now available on Hugging Face!
July 15, 2025: 🎉 Released the inference and training code.
May 25, 2025: We proposed EchoShot, a multi-shot portrait video generation model.

📖 Citation

If you are inspired by our work, please cite our paper.

@article{wang2025echoshot,
  title={EchoShot: Multi-Shot Portrait Video Generation},
  author={Wang, Jiahao and Sheng, Hualian and Cai, Sijia and Zhang, Weizhan and Yan, Caixia and Feng, Yachuang and Deng, Bing and Ye, Jieping},
  journal={arXiv preprint arXiv:2506.15838},
  year={2025}
}
