Pipe1213/vits_fr · Hugging Face

This model was trained on part of the SIWIS dataset and on the CSS10 dataset, in French only, with two voices (one female, one male). It is intended as a base for fine-tuning on languages similar to French. Both the generator and the discriminator are included in the model, so fine-tuning is possible.

SIWIS dataset: https://datashare.ed.ac.uk/handle/10283/2353

CSS10 dataset: https://github.com/Kyubyong/css10

The female voice comes from the first three parts of the SIWIS dataset, which contain the previously segmented female recordings. The male voice comes from the CSS10 dataset, which was used in full. In both cases the audio was converted to a sampling rate of 22050 Hz and to 16-bit PCM WAV encoding. The transcriptions were also reformatted to match the format of the LJSpeech dataset.
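If you prepare your own data for fine-tuning, the same two constraints can be checked with a short script. The following is a minimal sketch, not the script used to build this model: the helper names `check_wav` and `ljspeech_line` are hypothetical, and it assumes the LJSpeech metadata layout of one `id|transcription|normalized transcription` line per clip.

```python
import wave

TARGET_RATE = 22050        # sampling rate in Hz, as stated above
TARGET_SAMPLE_WIDTH = 2    # bytes per sample, i.e. 16-bit PCM


def check_wav(path):
    """Return a list of problems found in a WAV file (empty if it conforms).

    Hypothetical helper: it only inspects the header, it does not resample.
    """
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != TARGET_RATE:
            problems.append(f"sampling rate is {wav.getframerate()} Hz")
        if wav.getsampwidth() != TARGET_SAMPLE_WIDTH:
            problems.append(f"sample width is {8 * wav.getsampwidth()} bits")
    return problems


def ljspeech_line(clip_id, text, normalized_text=None):
    """Build one LJSpeech-style metadata line: id|text|normalized text.

    Assumption: fields must not contain the '|' delimiter or line breaks.
    """
    normalized = text if normalized_text is None else normalized_text
    for field in (clip_id, text, normalized):
        if "|" in field or "\n" in field:
            raise ValueError(f"field contains a delimiter: {field!r}")
    return f"{clip_id}|{text}|{normalized}"
```

For example, `ljspeech_line("clip_0001", "Bonjour.")` produces `clip_0001|Bonjour.|Bonjour.`, and `check_wav` returns an empty list for a file that is already at 22050 Hz and 16 bits.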

The alphabet used is:

!"&'()+,-./0123456789:;?ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz| «°»ÀÂÇÈÉÊÎÏÔÙÛàâæçèéêëîïôöùúûüœ–’“”…
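Text that contains characters outside this alphabet may not be handled as expected, so it can help to check input before synthesis or fine-tuning. The sketch below is an illustration only: `unsupported_chars` is a hypothetical helper, and it assumes that the space after `|` in the list above is part of the alphabet and that accented letters are stored in precomposed (NFC) form.

```python
import unicodedata

# Alphabet copied from the model card; the space after '|' is assumed
# to be an allowed character.
ALPHABET = (
    "!\"&'()+,-./0123456789:;?"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ_"
    "abcdefghijklmnopqrstuvwxyz| "
    "«°»ÀÂÇÈÉÊÎÏÔÙÛ"
    "àâæçèéêëîïôöùúûüœ"
    "–’“”…"
)
ALLOWED = frozenset(ALPHABET)


def unsupported_chars(text):
    """Return the characters of `text` that are not in the alphabet.

    Characters are listed once each, in order of first appearance.
    The text is normalized to NFC first so that 'e' + combining accent
    is compared as the single character 'é'.
    """
    normalized = unicodedata.normalize("NFC", text)
    missing = []
    for ch in normalized:
        if ch not in ALLOWED and ch not in missing:
            missing.append(ch)
    return missing
```

An empty result means every character is covered; otherwise the returned characters can be replaced or spelled out before the text is passed to the model.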