Parler-TTS

Parler-TTS is a lightweight text-to-speech (TTS) model that can generate high-quality, natural-sounding speech in the style of a given speaker (gender, pitch, speaking style, etc.). It is a reproduction of work from the paper "Natural language guidance of high-fidelity text-to-speech with synthetic annotations" by Dan Lyth and Simon King, from Stability AI and the University of Edinburgh, respectively.

Unlike other TTS models, Parler-TTS is a fully open-source release. All of the datasets, pre-processing code, training code, and weights are released publicly under a permissive license, enabling the community to build on our work and develop their own powerful TTS models. The release consists of:

The Parler-TTS library for using and training high-quality TTS models.
The Data-Speech repository, for annotating speech characteristics in a large-scale setting.
This organization, which contains the released datasets and weights.
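To make the idea of steering the voice with a plain-text speaker description more concrete, here is a minimal sketch of how the library might be used. It rests on several assumptions not stated in this card: `build_description` is a hypothetical helper written for this example (the model accepts free-form descriptions, so this template is only one possible wording), the repository id `parler-tts/parler-tts-mini-v1` is assumed to be one of the checkpoints hosted in this organization, and the `parler_tts`, `transformers`, `torch`, and `soundfile` packages are assumed to be installed.

```python
def build_description(gender: str, pitch: str, pace: str, style: str) -> str:
    """Hypothetical helper: turn speaker attributes into a free-text description."""
    return (
        f"A {gender} speaker with a {pitch}-pitched voice speaks at a {pace} pace "
        f"in a {style} style. The recording is very clear, with almost no background noise."
    )


def synthesize(
    text: str,
    description: str,
    repo_id: str = "parler-tts/parler-tts-mini-v1",  # assumed checkpoint id
    out_path: str = "parler_tts_out.wav",
) -> str:
    """Generate speech for `text` in the voice described by `description`."""
    # Third-party imports are kept local so the helper above works without them.
    import soundfile as sf
    import torch
    from parler_tts import ParlerTTSForConditionalGeneration
    from transformers import AutoTokenizer

    device = "cuda:0" if torch.cuda.is_available() else "cpu"
    model = ParlerTTSForConditionalGeneration.from_pretrained(repo_id).to(device)
    tokenizer = AutoTokenizer.from_pretrained(repo_id)

    # The description conditions the voice; the prompt is the text to be spoken.
    input_ids = tokenizer(description, return_tensors="pt").input_ids.to(device)
    prompt_input_ids = tokenizer(text, return_tensors="pt").input_ids.to(device)

    generation = model.generate(input_ids=input_ids, prompt_input_ids=prompt_input_ids)
    audio = generation.cpu().numpy().squeeze()
    sf.write(out_path, audio, model.config.sampling_rate)
    return out_path


if __name__ == "__main__":
    description = build_description("female", "moderate", "moderate", "slightly expressive")
    print(synthesize("Hey, how are you doing today?", description))
```

Running the script downloads the checkpoint on first use and writes a WAV file to the working directory. Newer checkpoints may expect a separate tokenizer for the description, so check the model card of the checkpoint you pick before relying on this sketch.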

🚨 Two new checkpoints, Parler-TTS Mini v1.1 and Large v1, are out! 🚨 Trained on 45k hours of narrated audio, they're better and faster than previous versions, and introduce speaker consistency across generations. Try them out here 🤗!