EchoShot: Multi-Shot Portrait Video Generation

Jiahao Wang1 · Hualian Sheng2 · Sijia Cai2,† · Weizhan Zhang1,*
Caixia Yan1 · Yachuang Feng2 · Bing Deng2 · Jieping Ye2

1Xi'an Jiaotong University      2Alibaba Cloud

Paper PDF · Project Page · GitHub Page

πŸ“ Intro

This is the official model release of EchoShot, which generates multiple video shots of the same person, with each shot controlled by a customized prompt. It currently supports text-to-multi-shot portrait video generation. We hope you have fun with this demo!

🔔 News

  • July 15, 2025: 🔥 EchoShot-1.3B-preview is now available on Hugging Face!
  • July 15, 2025: 🎉 Inference and training code released.
  • May 25, 2025: We propose EchoShot, a multi-shot portrait video generation model.

📖 Citation

If you are inspired by our work, please cite our paper.

@article{wang2025echoshot,
  title={EchoShot: Multi-Shot Portrait Video Generation},
  author={Wang, Jiahao and Sheng, Hualian and Cai, Sijia and Zhang, Weizhan and Yan, Caixia and Feng, Yachuang and Deng, Bing and Ye, Jieping},
  journal={arXiv preprint arXiv:2506.15838},
  year={2025}
}