ZipVoice

Here we share ZipVoice models trained at our department on public Czech speech datasets. We followed the recipes of the original ZipVoice model:

ZipVoice⚡: Fast and High-Quality Zero-Shot Text-to-Speech with Flow Matching | paper | HF repo

For instructions on using the models, see the original GitHub repository ZipVoice or our Google Colab DEMO.

Models

1. zipvoice_cs_ParlaSpeech

model type: ZipVoice
training data: ParlaSpeech-CZ.v1.0 (1100 hours of parliamentary proceedings available in the Czech part of the ParlaMint corpus, automatically aligned with transcripts)
trained from scratch
the final model is a checkpoint averaged over the epoch range from 50 (exclusive) to 60 (inclusive)
▶️ Google Colab DEMO
📜 License CC-BY-NC-SA-4.0 (Non-commercial research use only).
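
The checkpoint averaging mentioned above can be illustrated with a minimal sketch. It assumes that the range "from 50 (excluded) to 60" corresponds to epochs 51 through 60 and that the average is a plain uniform mean of parameters; the actual training recipe may compute the average differently (for example from running averages kept during training). The names `average_checkpoints` and `epochs_in_range`, and the toy parameter dictionaries, are hypothetical and used only for illustration.

```python
from typing import Dict, List

# A toy "checkpoint": parameter name -> flat list of float weights.
Params = Dict[str, List[float]]


def epochs_in_range(start_exclusive: int, end_inclusive: int) -> List[int]:
    """Return the epochs in the half-open range (start_exclusive, end_inclusive]."""
    return list(range(start_exclusive + 1, end_inclusive + 1))


def average_checkpoints(checkpoints: List[Params]) -> Params:
    """Element-wise uniform average of several parameter dictionaries."""
    if not checkpoints:
        raise ValueError("need at least one checkpoint")
    n = len(checkpoints)
    averaged: Params = {}
    for name, values in checkpoints[0].items():
        sums = [0.0] * len(values)
        for ckpt in checkpoints:
            for i, v in enumerate(ckpt[name]):
                sums[i] += v
        averaged[name] = [s / n for s in sums]
    return averaged


# Toy stand-ins for real checkpoints: one tiny weight vector per epoch.
epochs = epochs_in_range(50, 60)
toy_checkpoints = [{"layer.weight": [float(e), 2.0 * e]} for e in epochs]
averaged = average_checkpoints(toy_checkpoints)
print(epochs)
print(averaged)
```

With real models the same idea would be applied to each tensor in the saved state dictionaries rather than to Python lists.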

Disclaimer

By using these models, you agree to inform listeners that the speech samples are synthesized by the models, unless you have permission to use the voice you synthesize. That is, you agree to use only voices whose speakers have granted permission to have their voice cloned, either directly or by license, before making synthesized voices public; if you do not have such permission, you must publicly announce that the voices are synthesized.


Paper for fav-kky/ZipVoice

ZipVoice: Fast and High-Quality Zero-Shot Text-to-Speech with Flow Matching

Paper • 2506.13053 • Published Jun 16, 2025