Zhenhong committed on
Commit 26ff55d
1 Parent(s): a46d973

Updated description

Files changed (5)
  1. .gitattributes +1 -1
  2. .gitignore +1 -1
  3. README.md +1 -1
  4. app.py +2 -7
  5. requirements.txt +3 -3
.gitattributes CHANGED
@@ -1,4 +1,3 @@
-*.7z filter=lfs diff=lfs merge=lfs -text
 *.arrow filter=lfs diff=lfs merge=lfs -text
 *.bin filter=lfs diff=lfs merge=lfs -text
 *.bz2 filter=lfs diff=lfs merge=lfs -text
@@ -32,3 +31,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+*.7z filter=lfs diff=lfs merge=lfs -text
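Moving `*.7z` from the top of `.gitattributes` to the bottom should not change which files Git LFS handles: when several patterns match a path, later lines override earlier ones attribute by attribute, and every rule here sets the same attributes. The sketch below checks this under stated assumptions. It approximates Git's pattern matching with Python's `fnmatchcase` on the basename (adequate only for slash-free patterns like these), it includes only the patterns visible in the two hunks, and `LFS_ATTRS`, `attributes_for`, and the sample paths are hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical representation of "filter=lfs diff=lfs merge=lfs -text".
LFS_ATTRS = {"filter": "lfs", "diff": "lfs", "merge": "lfs", "text": "unset"}

# Only the patterns visible in the two hunks, in file order.
OLD_RULES = ["*.7z", "*.arrow", "*.bin", "*.bz2", "*.zip", "*.zst", "*tfevents*"]
NEW_RULES = ["*.arrow", "*.bin", "*.bz2", "*.zip", "*.zst", "*tfevents*", "*.7z"]


def attributes_for(path: str, patterns: list[str]) -> dict[str, str]:
    """Apply rules top to bottom; a later match overrides earlier values."""
    basename = path.rsplit("/", 1)[-1]  # slash-free patterns match the basename
    attrs: dict[str, str] = {}
    for pattern in patterns:
        if fnmatchcase(basename, pattern):
            attrs.update(LFS_ATTRS)
    return attrs


for sample in ["weights/model.7z", "data/train.arrow",
               "runs/events.out.tfevents.1", "app.py"]:
    old = attributes_for(sample, OLD_RULES)
    new = attributes_for(sample, NEW_RULES)
    assert old == new  # reordering identical rules changes nothing
    print(sample, "->", "LFS" if new else "regular Git")
```

Order would only start to matter if two matching lines assigned different values to the same attribute.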
.gitignore CHANGED
@@ -1,4 +1,4 @@
+.DS_Store
 *.pyc
 __pycache__/
-.DS_Store
 
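The `.gitignore` hunk only reorders lines. Git evaluates ignore patterns in order and the last matching pattern decides, so order matters only when a later pattern negates an earlier one with `!`, and none of these three patterns do. A minimal sketch of that rule follows; `is_ignored` is hypothetical, matches basenames with `fnmatchcase`, and skips details such as anchored patterns and the rule that a file inside an excluded directory cannot be re-included.

```python
from fnmatch import fnmatchcase

OLD_IGNORE = ["*.pyc", "__pycache__/", ".DS_Store"]
NEW_IGNORE = [".DS_Store", "*.pyc", "__pycache__/"]


def is_ignored(path: str, is_dir: bool, patterns: list[str]) -> bool:
    """Simplified gitignore check: the last matching pattern wins."""
    name = path.rstrip("/").rsplit("/", 1)[-1]
    ignored = False
    for raw in patterns:
        negate = raw.startswith("!")
        pattern = raw[1:] if negate else raw
        dir_only = pattern.endswith("/")  # trailing slash: directories only
        pattern = pattern.rstrip("/")
        if dir_only and not is_dir:
            continue
        if fnmatchcase(name, pattern):
            ignored = not negate
    return ignored


entries = [("src/mod.pyc", False), ("src/__pycache__", True),
           (".DS_Store", False), ("app.py", False)]
for path, is_dir in entries:
    assert is_ignored(path, is_dir, OLD_IGNORE) == is_ignored(path, is_dir, NEW_IGNORE)

# Order does matter once negation is involved:
print(is_ignored("keep.pyc", False, ["*.pyc", "!keep.pyc"]))  # False
print(is_ignored("keep.pyc", False, ["!keep.pyc", "*.pyc"]))  # True
```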
 
README.md CHANGED
@@ -1,5 +1,5 @@
 ---
-title: SpeechT5 Speech Synthesis Demo
+title: Text-to-Speech Demo
 emoji: 👩‍🎤
 colorFrom: yellow
 colorTo: blue
app.py CHANGED
@@ -57,18 +57,13 @@ def predict(text, speaker):
     return (16000, speech)
 
 
-title = "SpeechT5: Speech Synthesis"
+title = "Text-to-Speech based on SpeechT5"
 
 description = """
 The <b>SpeechT5</b> model is pre-trained on text as well as speech inputs, with targets that are also a mix of text and speech.
 By pre-training on text and speech at the same time, it learns unified representations for both, resulting in improved modeling capabilities.
 
-SpeechT5 can be fine-tuned for different speech tasks. This space demonstrates the <b>text-to-speech</b> (TTS) checkpoint for the English language.
-
-See also the <a href="https://huggingface.co/spaces/Matthijs/speecht5-asr-demo">speech recognition (ASR) demo</a>
-and the <a href="https://huggingface.co/spaces/Matthijs/speecht5-vc-demo">voice conversion demo</a>.
-
-Refer to <a href="https://colab.research.google.com/drive/1i7I5pzBcU3WDFarDnzweIj4-sVVoIUFJ">this Colab notebook</a> to learn how to fine-tune the SpeechT5 TTS model on your own dataset or language.
+This space demonstrates the <b>text-to-speech</b> (TTS) checkpoint for the English language.
 
 <b>How to use:</b> Enter some English text and choose a speaker. The output is a mel spectrogram, which is converted to a mono 16 kHz waveform by the
 HiFi-GAN vocoder. Because the model always applies random dropout, each attempt will give slightly different results.
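For context on the unchanged `return (16000, speech)` line: the description says the HiFi-GAN vocoder produces a mono 16 kHz waveform, and `predict()` returns that waveform together with its sample rate as a tuple, a form that Gradio's `Audio` output accepts. The sketch below is a hypothetical stand-in, `placeholder_predict`, that returns the same shape of result with a 440 Hz sine tone in place of SpeechT5 and the vocoder; the tone, its amplitude, and the int16 conversion are illustrative assumptions, not the Space's code.

```python
import numpy as np

SAMPLE_RATE = 16000  # the rate returned by predict() in this hunk


def placeholder_predict(text: str, seconds: float = 0.5) -> tuple[int, np.ndarray]:
    """Hypothetical stand-in for predict(text, speaker).

    Returns a (sample_rate, waveform) tuple like the real function, but
    synthesizes a 440 Hz tone instead of running SpeechT5 and HiFi-GAN.
    """
    n_samples = int(SAMPLE_RATE * seconds)
    t = np.arange(n_samples) / SAMPLE_RATE
    speech = (0.1 * np.sin(2 * np.pi * 440.0 * t)).astype(np.float32)
    # Scale float samples in [-1, 1] to 16-bit PCM, a common audio dtype.
    speech_int16 = (speech * 32767).astype(np.int16)
    return (SAMPLE_RATE, speech_int16)


rate, wave = placeholder_predict("Hello world")
# Mono audio is a 1-D array; duration is samples / rate.
print(rate, wave.ndim, len(wave) / rate)
```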
requirements.txt CHANGED
@@ -1,8 +1,8 @@
 git+https://github.com/huggingface/transformers.git
 torch
 torchaudio
+sentencepiece
 soundfile
-librosa
 samplerate
-resampy
-sentencepiece
+librosa
+resampy
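The `requirements.txt` hunk also only reorders lines: `sentencepiece` moves up, `librosa` and `resampy` move down, and nothing is added or removed. pip resolves the file as one set of requirements, so this order generally does not change what gets installed. The hypothetical helper below only confirms that both versions name the same entries, ignoring blank lines and `#` comments; it does not parse version specifiers or resolve dependencies.

```python
OLD_REQUIREMENTS = """\
git+https://github.com/huggingface/transformers.git
torch
torchaudio
soundfile
librosa
samplerate
resampy
sentencepiece
"""

NEW_REQUIREMENTS = """\
git+https://github.com/huggingface/transformers.git
torch
torchaudio
sentencepiece
soundfile
samplerate
librosa
resampy
"""


def requirement_set(text: str) -> set[str]:
    """Return the non-empty, non-comment lines as an unordered set."""
    lines = (line.strip() for line in text.splitlines())
    return {line for line in lines if line and not line.startswith("#")}


old = requirement_set(OLD_REQUIREMENTS)
new = requirement_set(NEW_REQUIREMENTS)
print("same entries:", old == new)
print("added:", new - old, "removed:", old - new)
```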