tanbw committed
Commit 57ff8d8 · 1 Parent(s): a34db87

no message
Files changed (4)
  1. README.md +9 -195
  2. README_COSYVOICE.md +159 -0
  3. deploy.py +41 -0
  4. deploy.sh +25 -0
README.md CHANGED
@@ -1,195 +1,9 @@
- # CosyVoice
- ## 👉🏻 [CosyVoice Demos](https://fun-audio-llm.github.io/) 👈🏻
- [[CosyVoice Paper](https://fun-audio-llm.github.io/pdf/CosyVoice_v1.pdf)][[CosyVoice Studio](https://www.modelscope.cn/studios/iic/CosyVoice-300M)][[CosyVoice Code](https://github.com/FunAudioLLM/CosyVoice)]
-
- For `SenseVoice`, visit [SenseVoice repo](https://github.com/FunAudioLLM/SenseVoice) and [SenseVoice space](https://www.modelscope.cn/studios/iic/SenseVoice).
-
- ## Roadmap
-
- - [x] 2024/07
-
-   - [x] Flow matching training support
-   - [x] WeTextProcessing support when ttsfrd is not available
-   - [x] FastAPI server and client
-
- - [x] 2024/08
-
-   - [x] Repetition Aware Sampling (RAS) inference for LLM stability
-   - [x] Streaming inference mode support, including KV cache and SDPA, for RTF optimization
-
- - [x] 2024/09
-
-   - [x] 25 Hz CosyVoice base model
-   - [x] 25 Hz CosyVoice voice conversion model
-
- - [ ] TBD
-
-   - [ ] 25 Hz Llama-based LLM model with LoRA finetuning support
-   - [ ] Support for more instruction modes
-   - [ ] Voice conversion
-   - [ ] Music generation
-   - [ ] Training script sample based on Mandarin
-   - [ ] CosyVoice-500M trained with more multilingual data
-   - [ ] More...
-
- ## Install
-
- **Clone and install**
-
- - Clone the repo
- ``` sh
- git clone --recursive https://github.com/FunAudioLLM/CosyVoice.git
- # If cloning the submodule fails due to network errors, rerun the following command until it succeeds
- cd CosyVoice
- git submodule update --init --recursive
- ```
-
- - Install Conda: please see https://docs.conda.io/en/latest/miniconda.html
- - Create Conda env:
-
- ``` sh
- conda create -n cosyvoice python=3.8
- conda activate cosyvoice
- # pynini is required by WeTextProcessing; install it with conda, as the conda package works on all platforms.
- conda install -y -c conda-forge pynini==2.1.5
- pip install -r requirements.txt -i https://mirrors.aliyun.com/pypi/simple/ --trusted-host=mirrors.aliyun.com
-
- # If you encounter sox compatibility issues
- # ubuntu
- sudo apt-get install sox libsox-dev
- # centos
- sudo yum install sox sox-devel
- ```
-
- **Model download**
-
- We strongly recommend that you download our pretrained `CosyVoice-300M`, `CosyVoice-300M-SFT`, and `CosyVoice-300M-Instruct` models and the `CosyVoice-ttsfrd` resource.
-
- If you are an expert in this field and are only interested in training your own CosyVoice model from scratch, you can skip this step.
-
- ``` python
- # Download the models via the ModelScope SDK
- from modelscope import snapshot_download
- snapshot_download('iic/CosyVoice-300M', local_dir='pretrained_models/CosyVoice-300M')
- snapshot_download('iic/CosyVoice-300M-25Hz', local_dir='pretrained_models/CosyVoice-300M-25Hz')
- snapshot_download('iic/CosyVoice-300M-SFT', local_dir='pretrained_models/CosyVoice-300M-SFT')
- snapshot_download('iic/CosyVoice-300M-Instruct', local_dir='pretrained_models/CosyVoice-300M-Instruct')
- snapshot_download('iic/CosyVoice-ttsfrd', local_dir='pretrained_models/CosyVoice-ttsfrd')
- ```
-
- ``` sh
- # Download the models via git; make sure git lfs is installed first
- mkdir -p pretrained_models
- git clone https://www.modelscope.cn/iic/CosyVoice-300M.git pretrained_models/CosyVoice-300M
- git clone https://www.modelscope.cn/iic/CosyVoice-300M-25Hz.git pretrained_models/CosyVoice-300M-25Hz
- git clone https://www.modelscope.cn/iic/CosyVoice-300M-SFT.git pretrained_models/CosyVoice-300M-SFT
- git clone https://www.modelscope.cn/iic/CosyVoice-300M-Instruct.git pretrained_models/CosyVoice-300M-Instruct
- git clone https://www.modelscope.cn/iic/CosyVoice-ttsfrd.git pretrained_models/CosyVoice-ttsfrd
- ```
-
- Optionally, you can unzip the `ttsfrd` resource and install the `ttsfrd` package for better text normalization performance.
-
- Note that this step is not necessary. If you do not install the `ttsfrd` package, WeTextProcessing is used by default.
-
- ``` sh
- cd pretrained_models/CosyVoice-ttsfrd/
- unzip resource.zip -d .
- pip install ttsfrd-0.3.6-cp38-cp38-linux_x86_64.whl
- ```
-
- **Basic Usage**
-
- For zero_shot/cross_lingual inference, please use the `CosyVoice-300M` model.
- For sft inference, please use the `CosyVoice-300M-SFT` model.
- For instruct inference, please use the `CosyVoice-300M-Instruct` model.
- First, add `third_party/Matcha-TTS` to your `PYTHONPATH`.
-
- ``` sh
- export PYTHONPATH=third_party/Matcha-TTS
- ```
-
- ``` python
- from cosyvoice.cli.cosyvoice import CosyVoice
- from cosyvoice.utils.file_utils import load_wav
- import torchaudio
-
- cosyvoice = CosyVoice('pretrained_models/CosyVoice-300M-SFT')
- # sft usage
- print(cosyvoice.list_avaliable_spks())
- # pass stream=True for chunked streaming inference
- for i, j in enumerate(cosyvoice.inference_sft('你好,我是通义生成式语音大模型,请问有什么可以帮您的吗?', '中文女', stream=False)):
-     torchaudio.save('sft_{}.wav'.format(i), j['tts_speech'], 22050)
-
- cosyvoice = CosyVoice('pretrained_models/CosyVoice-300M-25Hz')  # or change to pretrained_models/CosyVoice-300M for 50 Hz inference
- # zero_shot usage; <|zh|><|en|><|jp|><|yue|><|ko|> for Chinese/English/Japanese/Cantonese/Korean
- prompt_speech_16k = load_wav('zero_shot_prompt.wav', 16000)
- for i, j in enumerate(cosyvoice.inference_zero_shot('收到好友从远方寄来的生日礼物,那份意外的惊喜与深深的祝福让我心中充满了甜蜜的快乐,笑容如花儿般绽放。', '希望你以后能够做的比我还好呦。', prompt_speech_16k, stream=False)):
-     torchaudio.save('zero_shot_{}.wav'.format(i), j['tts_speech'], 22050)
- # cross_lingual usage
- prompt_speech_16k = load_wav('cross_lingual_prompt.wav', 16000)
- for i, j in enumerate(cosyvoice.inference_cross_lingual('<|en|>And then later on, fully acquiring that company. So keeping management in line, interest in line with the asset that\'s coming into the family is a reason why sometimes we don\'t buy the whole thing.', prompt_speech_16k, stream=False)):
-     torchaudio.save('cross_lingual_{}.wav'.format(i), j['tts_speech'], 22050)
- # vc usage
- prompt_speech_16k = load_wav('zero_shot_prompt.wav', 16000)
- source_speech_16k = load_wav('cross_lingual_prompt.wav', 16000)
- for i, j in enumerate(cosyvoice.inference_vc(source_speech_16k, prompt_speech_16k, stream=False)):
-     torchaudio.save('vc_{}.wav'.format(i), j['tts_speech'], 22050)
-
- cosyvoice = CosyVoice('pretrained_models/CosyVoice-300M-Instruct')
- # instruct usage; supports <laughter></laughter><strong></strong>[laughter][breath]
- for i, j in enumerate(cosyvoice.inference_instruct('在面对挑战时,他展现了非凡的<strong>勇气</strong>与<strong>智慧</strong>。', '中文男', 'Theo \'Crimson\', is a fiery, passionate rebel leader. Fights with fervor for justice, but struggles with impulsiveness.', stream=False)):
-     torchaudio.save('instruct_{}.wav'.format(i), j['tts_speech'], 22050)
- ```
-
- **Start web demo**
-
- You can use our web demo page to get familiar with CosyVoice quickly.
- The web demo supports sft/zero_shot/cross_lingual/instruct inference.
-
- Please see the demo website for details.
-
- ``` sh
- # change to pretrained_models/CosyVoice-300M-SFT for sft inference, or pretrained_models/CosyVoice-300M-Instruct for instruct inference
- python3 webui.py --port 50000 --model_dir pretrained_models/CosyVoice-300M
- ```
-
- **Advanced Usage**
-
- For advanced users, we provide training and inference scripts in `examples/libritts/cosyvoice/run.sh`.
- You can get familiar with CosyVoice by following this recipe.
-
- **Build for deployment**
-
- Optionally, if you want to use grpc for service deployment,
- you can run the following steps. Otherwise, you can just skip this step.
-
- ``` sh
- cd runtime/python
- docker build -t cosyvoice:v1.0 .
- # change iic/CosyVoice-300M to iic/CosyVoice-300M-Instruct if you want to use instruct inference
- # for grpc usage
- docker run -d --runtime=nvidia -p 50000:50000 cosyvoice:v1.0 /bin/bash -c "cd /opt/CosyVoice/CosyVoice/runtime/python/grpc && python3 server.py --port 50000 --max_conc 4 --model_dir iic/CosyVoice-300M && sleep infinity"
- cd grpc && python3 client.py --port 50000 --mode <sft|zero_shot|cross_lingual|instruct>
- # for fastapi usage
- docker run -d --runtime=nvidia -p 50000:50000 cosyvoice:v1.0 /bin/bash -c "cd /opt/CosyVoice/CosyVoice/runtime/python/fastapi && python3 server.py --port 50000 --model_dir iic/CosyVoice-300M && sleep infinity"
- cd fastapi && python3 client.py --port 50000 --mode <sft|zero_shot|cross_lingual|instruct>
- ```
-
- ## Discussion & Communication
-
- You can discuss directly on [GitHub Issues](https://github.com/FunAudioLLM/CosyVoice/issues).
-
- You can also scan the QR code to join our official DingTalk chat group.
-
- <img src="./asset/dingding.png" width="250px">
-
- ## Acknowledgements
-
- 1. We borrowed a lot of code from [FunASR](https://github.com/modelscope/FunASR).
- 2. We borrowed a lot of code from [FunCodec](https://github.com/modelscope/FunCodec).
- 3. We borrowed a lot of code from [Matcha-TTS](https://github.com/shivammehta25/Matcha-TTS).
- 4. We borrowed a lot of code from [AcademiCodec](https://github.com/yangdongchao/AcademiCodec).
- 5. We borrowed a lot of code from [WeNet](https://github.com/wenet-e2e/wenet).
-
- ## Disclaimer
- The content provided above is for academic purposes only and is intended to demonstrate technical capabilities. Some examples are sourced from the internet. If any content infringes on your rights, please contact us to request its removal.
 
+ title: CosyVoice
+ emoji: 🏃
+ colorFrom: red
+ colorTo: blue
+ sdk: gradio
+ sdk_version: 4.44.0
+ app_file: deploy.py
+ pinned: false
+ short_description: CosyVoice
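The nine added lines replace the README body with Hugging Face Spaces configuration. On a Space this metadata normally sits at the very top of `README.md` as YAML front matter between `---` delimiters (shown below with the delimiters added for illustration; `app_file` is the script the Space executes on startup):

```yaml
---
title: CosyVoice
emoji: 🏃
colorFrom: red
colorTo: blue
sdk: gradio            # the Space runs as a Gradio app
sdk_version: 4.44.0
app_file: deploy.py    # entry point executed when the Space starts
pinned: false
short_description: CosyVoice
---
```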
 
README_COSYVOICE.md ADDED
@@ -0,0 +1,159 @@
+ # CosyVoice
+ ## 👉🏻 [CosyVoice Demos](https://fun-audio-llm.github.io/) 👈🏻
+ [[CosyVoice Paper](https://fun-audio-llm.github.io/pdf/CosyVoice_v1.pdf)][[CosyVoice Studio](https://www.modelscope.cn/studios/iic/CosyVoice-300M)][[CosyVoice Code](https://github.com/FunAudioLLM/CosyVoice)]
+
+ For `SenseVoice`, visit [SenseVoice repo](https://github.com/FunAudioLLM/SenseVoice) and [SenseVoice space](https://www.modelscope.cn/studios/iic/SenseVoice).
+
+ ## Install
+
+ **Clone and install**
+
+ - Clone the repo
+ ``` sh
+ git clone --recursive https://github.com/FunAudioLLM/CosyVoice.git
+ # If cloning the submodule fails due to network errors, rerun the following command until it succeeds
+ cd CosyVoice
+ git submodule update --init --recursive
+ ```
+
+ - Install Conda: please see https://docs.conda.io/en/latest/miniconda.html
+ - Create Conda env:
+
+ ``` sh
+ conda create -n cosyvoice python=3.8
+ conda activate cosyvoice
+ # pynini is required by WeTextProcessing; install it with conda, as the conda package works on all platforms.
+ conda install -y -c conda-forge pynini==2.1.5
+ pip install -r requirements.txt -i https://mirrors.aliyun.com/pypi/simple/ --trusted-host=mirrors.aliyun.com
+
+ # If you encounter sox compatibility issues
+ # ubuntu
+ sudo apt-get install sox libsox-dev
+ # centos
+ sudo yum install sox sox-devel
+ ```
+
+ **Model download**
+
+ We strongly recommend that you download our pretrained `CosyVoice-300M`, `CosyVoice-300M-SFT`, and `CosyVoice-300M-Instruct` models and the `CosyVoice-ttsfrd` resource.
+
+ If you are an expert in this field and are only interested in training your own CosyVoice model from scratch, you can skip this step.
+
+ Download the models with the Python script:
+ ``` sh
+ python download.py
+ ```
+
+ To download the models with git, install `git lfs` first.
+ ``` sh
+ mkdir -p pretrained_models
+ git clone https://www.modelscope.cn/iic/CosyVoice-300M.git pretrained_models/CosyVoice-300M
+ git clone https://www.modelscope.cn/iic/CosyVoice-300M-SFT.git pretrained_models/CosyVoice-300M-SFT
+ git clone https://www.modelscope.cn/iic/CosyVoice-300M-Instruct.git pretrained_models/CosyVoice-300M-Instruct
+ git clone https://www.modelscope.cn/iic/CosyVoice-ttsfrd.git pretrained_models/CosyVoice-ttsfrd
+ ```
+
+ Optionally, you can unzip the `ttsfrd` resource and install the `ttsfrd` package for better text normalization performance.
+
+ Note that this step is not necessary. If you do not install the `ttsfrd` package, WeTextProcessing is used by default.
+
+ ``` sh
+ cd pretrained_models/CosyVoice-ttsfrd/
+ unzip resource.zip -d .
+ pip install ttsfrd-0.3.6-cp38-cp38-linux_x86_64.whl
+ ```
+
+ **Basic Usage**
+
+ For zero_shot/cross_lingual inference, please use the `CosyVoice-300M` model.
+ For sft inference, please use the `CosyVoice-300M-SFT` model.
+ For instruct inference, please use the `CosyVoice-300M-Instruct` model.
+ First, add `third_party/Matcha-TTS` to your `PYTHONPATH`.
+
+ ``` sh
+ export PYTHONPATH=third_party/Matcha-TTS
+ ```
+
+ ``` python
+ from cosyvoice.cli.cosyvoice import CosyVoice
+ from cosyvoice.utils.file_utils import load_wav
+ import torchaudio
+
+ cosyvoice = CosyVoice('pretrained_models/CosyVoice-300M-SFT')
+ # sft usage
+ print(cosyvoice.list_avaliable_spks())
+ output = cosyvoice.inference_sft('你好,我是通义生成式语音大模型,请问有什么可以帮您的吗?', '中文女')
+ torchaudio.save('sft.wav', output['tts_speech'], 22050)
+
+ cosyvoice = CosyVoice('pretrained_models/CosyVoice-300M')
+ # zero_shot usage; <|zh|><|en|><|jp|><|yue|><|ko|> for Chinese/English/Japanese/Cantonese/Korean
+ prompt_speech_16k = load_wav('zero_shot_prompt.wav', 16000)
+ output = cosyvoice.inference_zero_shot('收到好友从远方寄来的生日礼物,那份意外的惊喜与深深的祝福让我心中充满了甜蜜的快乐,笑容如花儿般绽放。', '希望你以后能够做的比我还好呦。', prompt_speech_16k)
+ torchaudio.save('zero_shot.wav', output['tts_speech'], 22050)
+ # cross_lingual usage
+ prompt_speech_16k = load_wav('cross_lingual_prompt.wav', 16000)
+ output = cosyvoice.inference_cross_lingual('<|en|>And then later on, fully acquiring that company. So keeping management in line, interest in line with the asset that\'s coming into the family is a reason why sometimes we don\'t buy the whole thing.', prompt_speech_16k)
+ torchaudio.save('cross_lingual.wav', output['tts_speech'], 22050)
+
+ cosyvoice = CosyVoice('pretrained_models/CosyVoice-300M-Instruct')
+ # instruct usage; supports <laughter></laughter><strong></strong>[laughter][breath]
+ output = cosyvoice.inference_instruct('在面对挑战时,他展现了非凡的<strong>勇气</strong>与<strong>智慧</strong>。', '中文男', 'Theo \'Crimson\', is a fiery, passionate rebel leader. Fights with fervor for justice, but struggles with impulsiveness.')
+ torchaudio.save('instruct.wav', output['tts_speech'], 22050)
+ ```
+
+ **Start web demo**
+
+ You can use our web demo page to get familiar with CosyVoice quickly.
+ The web demo supports sft/zero_shot/cross_lingual/instruct inference.
+
+ Please see the demo website for details.
+
+ ``` sh
+ # change to pretrained_models/CosyVoice-300M-SFT for sft inference, or pretrained_models/CosyVoice-300M-Instruct for instruct inference
+ python3 webui.py --port 50000 --model_dir pretrained_models/CosyVoice-300M
+ ```
+
+ **Advanced Usage**
+
+ For advanced users, we provide training and inference scripts in `examples/libritts/cosyvoice/run.sh`.
+ You can get familiar with CosyVoice by following this recipe.
+
+ **Serve with FastAPI**
+ ```sh
+ # For development
+ fastapi dev --port 3003
+ # For production
+ fastapi run --port 3003
+ ```
+
+ **Build for deployment**
+
+ Optionally, if you want to use grpc for service deployment,
+ you can run the following steps. Otherwise, you can just skip this step.
+
+ ``` sh
+ cd runtime/python
+ docker build -t cosyvoice:v1.0 .
+ # change iic/CosyVoice-300M to iic/CosyVoice-300M-Instruct if you want to use instruct inference
+ docker run -d --runtime=nvidia -p 50000:50000 cosyvoice:v1.0 /bin/bash -c "cd /opt/CosyVoice/CosyVoice/runtime/python && python3 server.py --port 50000 --max_conc 4 --model_dir iic/CosyVoice-300M && sleep infinity"
+ python3 client.py --port 50000 --mode <sft|zero_shot|cross_lingual|instruct>
+ ```
+
+ ## Discussion & Communication
+
+ You can discuss directly on [GitHub Issues](https://github.com/FunAudioLLM/CosyVoice/issues).
+
+ You can also scan the QR code to join our official DingTalk chat group.
+
+ <img src="./asset/dingding.png" width="250px">
+
+ ## Acknowledgements
+
+ 1. We borrowed a lot of code from [FunASR](https://github.com/modelscope/FunASR).
+ 2. We borrowed a lot of code from [FunCodec](https://github.com/modelscope/FunCodec).
+ 3. We borrowed a lot of code from [Matcha-TTS](https://github.com/shivammehta25/Matcha-TTS).
+ 4. We borrowed a lot of code from [AcademiCodec](https://github.com/yangdongchao/AcademiCodec).
+ 5. We borrowed a lot of code from [WeNet](https://github.com/wenet-e2e/wenet).
+
+ ## Disclaimer
+ The content provided above is for academic purposes only and is intended to demonstrate technical capabilities. Some examples are sourced from the internet. If any content infringes on your rights, please contact us to request its removal.
deploy.py ADDED
@@ -0,0 +1,41 @@
+ import logging
+ import subprocess
+ logging.basicConfig(level=logging.INFO)
+
+ def run_shell_script(script_path):
+     """
+     Run the shell script at the given path and print its output to the console.
+
+     :param script_path: path to the shell script file
+     """
+     try:
+         # Run the shell script via subprocess.Popen
+         with subprocess.Popen(['bash', script_path], stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True) as proc:
+             # Read the output
+             for line in proc.stdout:
+                 print(line, end='')  # print the output in real time
+             proc.stdout.close()
+             return_code = proc.wait()
+             if return_code:
+                 print(f"Shell script failed with return code {return_code}")
+     except Exception as e:
+         print(f"Error while running the shell script: {e}")
+
+ # Usage example:
+ # assuming a script named example.sh exists in the current directory
+ run_shell_script('deploy.sh')
+
+ class args:
+     def __init__(self):
+         self.port = 5000
+
+ from webui import main
+ from cosyvoice.cli.cosyvoice import CosyVoice
+ import numpy as np
+
+ cosyvoice = CosyVoice("pretrained_models/CosyVoice-300M")
+ sft_spk = cosyvoice.list_avaliable_spks()
+ prompt_sr, target_sr = 16000, 22050
+ default_data = np.zeros(target_sr)
+ args.port = 5000
+ main()
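The streaming pattern in `run_shell_script` (iterate over `proc.stdout` line by line while the child process runs, then check the return code) can be exercised on its own. A minimal standalone sketch, using a throwaway script instead of `deploy.sh` (the helper name `run_and_capture` is hypothetical, not part of this commit):

```python
import os
import subprocess
import tempfile

def run_and_capture(script_text):
    """Write script_text to a temp file, run it with bash using the same
    streaming Popen pattern as deploy.py, and return the output lines."""
    with tempfile.NamedTemporaryFile('w', suffix='.sh', delete=False) as f:
        f.write(script_text)
        path = f.name
    lines = []
    try:
        with subprocess.Popen(['bash', path], stdout=subprocess.PIPE,
                              stderr=subprocess.STDOUT, text=True) as proc:
            for line in proc.stdout:       # consumed as the child produces it
                lines.append(line.rstrip('\n'))
            proc.wait()
    finally:
        os.unlink(path)                    # clean up the temp script
    return lines

print(run_and_capture('echo step1\necho step2'))  # → ['step1', 'step2']
```

Reading stdout incrementally like this is what lets deploy.py surface the long-running setup's progress in the Space logs instead of buffering it until the script exits.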
deploy.sh ADDED
@@ -0,0 +1,25 @@
+ git clone --recursive https://github.com/FunAudioLLM/CosyVoice.git
+ # If cloning the submodule fails due to network errors, rerun the following command until it succeeds
+ cd CosyVoice
+ git submodule update --init --recursive
+
+ # pynini is required by WeTextProcessing
+ pip install pynini==2.1.5
+ pip install -r requirements.txt
+
+ # If you encounter sox compatibility issues
+ # ubuntu
+ sudo apt-get install sox libsox-dev
+
+ mkdir -p pretrained_models
+ huggingface-cli download model-scope/CosyVoice-300M --local-dir pretrained_models/CosyVoice-300M --token=$(cat /run/secrets/hf_token)
+ huggingface-cli download model-scope/CosyVoice-300M-SFT --local-dir pretrained_models/CosyVoice-300M-SFT --token=$(cat /run/secrets/hf_token)
+ huggingface-cli download FunAudioLLM/CosyVoice-ttsfrd --local-dir pretrained_models/CosyVoice-ttsfrd --token=$(cat /run/secrets/hf_token)
+
+ ls pretrained_models
+
+ cd pretrained_models/CosyVoice-ttsfrd/
+ unzip resource.zip -d .
+ pip install ttsfrd-0.3.6-cp38-cp38-linux_x86_64.whl
+
+ export PYTHONPATH=third_party/Matcha-TTS
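One caveat about the last line of deploy.sh: `export PYTHONPATH=third_party/Matcha-TTS` only affects the shell in which the script runs; environment changes never propagate back to the deploy.py process that spawned it. If the path is needed inside deploy.py itself, something like the following would do the equivalent from Python (`add_pythonpath` is a hypothetical helper, not part of this commit):

```python
import os
import sys

def add_pythonpath(path):
    """Mimic `export PYTHONPATH=path` for the current Python process:
    prepend to sys.path (for this interpreter) and to the PYTHONPATH
    environment variable (so child processes inherit it)."""
    if path not in sys.path:
        sys.path.insert(0, path)
    existing = os.environ.get('PYTHONPATH', '')
    os.environ['PYTHONPATH'] = path + (os.pathsep + existing if existing else '')

add_pythonpath('third_party/Matcha-TTS')
print('third_party/Matcha-TTS' in sys.path)  # → True
```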