Update README.md
README.md
CHANGED
@@ -4,7 +4,204 @@ emoji:
colorFrom: yellow
colorTo: green
sdk: gradio
- sdk_version: 5.
+ sdk_version: 5.35.0
app_file: app.py
short_description: Voice Clone Multilingual TTS
---

## Voice Clone Multilingual TTS: Advanced AI Voice Synthesis and Cloning

### Transform Text to Natural Speech with Custom Voice Cloning

Welcome to **Voice Clone Multilingual TTS**, a cutting-edge text-to-speech system powered by OuteTTS-0.3-1B that offers both high-quality voice synthesis and advanced voice cloning capabilities. Create natural-sounding speech in multiple languages using preset voices or clone any voice from a short audio sample.

### What is Voice Clone Multilingual TTS?

Voice Clone Multilingual TTS is an **advanced AI-powered speech synthesis tool** that converts text into natural-sounding speech with remarkable accuracy. Using the OuteTTS-0.3-1B model with bfloat16 precision, it offers both preset speaker voices and the ability to clone custom voices from reference audio, making it perfect for content creation, accessibility, and creative projects.

### Key Features for Professional Voice Synthesis

- **Voice Cloning**: Clone any voice from 7-10 seconds of reference audio
- **Multilingual Support**: Generate speech in multiple languages
- **Preset Speakers**: Choose from various pre-configured voice profiles
- **Fine Control**: Adjust temperature and repetition penalty
- **GPU Acceleration**: Fast generation with CUDA optimization
- **Natural Prosody**: Realistic intonation and rhythm
- **Whisper Integration**: Automatic transcription for voice cloning
- **WAV Export**: High-quality audio output format

### How It Works

#### **Simple Generation Process**
1. **Enter Text**: Type or paste your text content
2. **Choose Voice**: Select preset speaker or upload reference audio
3. **Adjust Settings**: Fine-tune temperature and penalties
4. **Generate**: Create natural-sounding speech instantly

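
The four steps above can be pictured as filling in a single request object before generation starts. The sketch below is illustrative only: `GenerationRequest`, its field names, its default values, and the rule that uploaded reference audio takes precedence over a preset speaker are all assumptions, not code taken from this Space.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GenerationRequest:
    """Hypothetical container for the inputs collected in steps 1-3."""

    text: str                               # step 1: text to speak
    speaker: Optional[str] = None           # step 2: preset voice name
    reference_audio: Optional[str] = None   # step 2: path to an uploaded clip
    temperature: float = 0.4                # step 3: assumed default
    repetition_penalty: float = 1.1         # step 3: assumed default

    def voice_mode(self) -> str:
        """Return "clone" or "preset", or raise if the request is incomplete."""
        if not self.text.strip():
            raise ValueError("enter some text first")
        if self.reference_audio:
            # Assumption: an uploaded clip overrides the preset choice.
            return "clone"
        if self.speaker:
            return "preset"
        raise ValueError("choose a preset speaker or upload reference audio")
```

Checking the request before step 4 keeps an empty text box or a missing voice from reaching the model at all.
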
#### **Voice Cloning Technology**
- Upload 7-10 seconds of clear reference audio
- AI analyzes voice characteristics and patterns
- Applies learned voice profile to new text
- Maintains speaker identity across languages

### Perfect Use Cases

- **Content Creation**: Narration for videos and podcasts
- **Audiobook Production**: Convert books to audio format
- **Language Learning**: Practice pronunciation with native accents
- **Accessibility**: Make written content accessible to all
- **Voice Preservation**: Clone and preserve unique voices
- **Creative Projects**: Character voices for games or animations
- **Business Applications**: Automated customer service voices
- **Personal Use**: Create custom voice assistants

### Advanced Controls

- **Temperature (0.1-1.0)**:
  - Lower values: More stable, consistent tone
  - Higher values: More expressive, varied intonation
- **Repetition Penalty (0.5-2.0)**: Prevents repetitive patterns
- **Speaker Selection**: Multiple preset voice profiles
- **Reference Audio**: Custom voice cloning input
- **Max Length**: Up to 4096 tokens per generation

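
To make the two sampling controls concrete, here is a minimal pure-Python sketch of how temperature and a repetition penalty typically reshape next-token probabilities. It assumes the widely used scheme in which logits of already generated tokens are divided (when positive) or multiplied (when negative) by the penalty before temperature scaling; the README does not say which formula the underlying model uses, and the function names are hypothetical.

```python
import math

TEMPERATURE_RANGE = (0.1, 1.0)          # range listed above
REPETITION_PENALTY_RANGE = (0.5, 2.0)   # range listed above


def clamp(value, bounds):
    """Keep a setting inside its documented range."""
    low, high = bounds
    return max(low, min(high, value))


def next_token_probs(logits, previous_tokens, temperature, repetition_penalty):
    """Return sampling probabilities after applying both controls."""
    temperature = clamp(temperature, TEMPERATURE_RANGE)
    repetition_penalty = clamp(repetition_penalty, REPETITION_PENALTY_RANGE)

    adjusted = list(logits)
    for token in set(previous_tokens):
        # Assumed penalty rule: push already used tokens toward lower scores.
        if adjusted[token] > 0:
            adjusted[token] /= repetition_penalty
        else:
            adjusted[token] *= repetition_penalty

    # Lower temperature sharpens the distribution, higher flattens it.
    scaled = [value / temperature for value in adjusted]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(value - peak) for value in scaled]
    total = sum(weights)
    return [weight / total for weight in weights]
```

Under this rule a penalty above 1.0 discourages tokens that were already produced, while a value below 1.0 makes them more likely, which is why the lower end of the range tends to increase repetition rather than prevent it.
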
### Technical Specifications

- **Model**: OuteAI/OuteTTS-0.3-1B
- **Precision**: bfloat16 for optimal performance
- **Framework**: PyTorch with CUDA support
- **Transcription**: Whisper Turbo for voice analysis
- **Output Format**: WAV audio files
- **GPU Optimization**: Automatic CUDA memory management
- **Interface**: Gradio with responsive design

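
As an illustration of the WAV output format listed above, the following sketch writes mono floating-point samples as 16-bit PCM using only the Python standard library. The `save_wav` helper and the 24 kHz default sample rate are assumptions for the example; the README does not state the sample rate or bit depth the Space actually produces.

```python
import struct
import wave


def save_wav(path, samples, sample_rate=24000):
    """Write mono float samples in [-1.0, 1.0] as a 16-bit PCM WAV file.

    The 24 kHz default is an assumed value chosen for illustration.
    """
    frames = bytearray()
    for sample in samples:
        clipped = max(-1.0, min(1.0, sample))  # avoid integer overflow
        frames += struct.pack("<h", int(round(clipped * 32767)))

    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)            # mono
        wav_file.setsampwidth(2)            # 2 bytes per sample = 16-bit
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(bytes(frames))
```

Clipping before conversion matters because generated audio can slightly exceed the nominal range, and an unclipped value would not fit in a signed 16-bit integer.
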
### Voice Cloning Best Practices

1. **Audio Quality**: Use clear, noise-free recordings
2. **Duration**: Optimal results with 7-10 second samples
3. **Consistency**: Single speaker without background noise
4. **Format**: Support for common audio formats
5. **Content**: Natural speech patterns work best
6. **Language**: Can clone across different languages

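
The duration guideline above is easy to check before uploading a clip. The sketch below measures a WAV file with Python's standard `wave` module and compares it with the recommended 7-10 second window. The helper names are hypothetical, only uncompressed WAV input is handled, and noise or speaker count cannot be judged this way.

```python
import wave

MIN_SECONDS = 7.0    # lower bound recommended above
MAX_SECONDS = 10.0   # upper bound recommended above


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def check_reference(path):
    """Return (ok, message) for a candidate reference clip."""
    duration = wav_duration_seconds(path)
    if duration < MIN_SECONDS:
        return False, f"clip is {duration:.1f}s; record at least {MIN_SECONDS:.0f}s"
    if duration > MAX_SECONDS:
        return False, f"clip is {duration:.1f}s; trim to {MAX_SECONDS:.0f}s or less"
    return True, f"clip is {duration:.1f}s; within the recommended range"
```

Clips in other formats would first need converting to WAV, or measuring with an audio library, before a check like this applies.
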
### Why Choose Voice Clone Multilingual TTS?

1. **Professional Quality**: Studio-grade voice synthesis
2. **Versatile Options**: Preset voices or custom cloning
3. **Fast Processing**: GPU-accelerated generation
4. **User-Friendly**: Simple interface for all users
5. **Flexible Output**: Adjustable voice characteristics
6. **Free Access**: No subscription or usage limits

### Technical Innovation

- **Advanced Architecture**: State-of-the-art TTS model
- **Memory Efficient**: Automatic CUDA cache management
- **Error Handling**: Robust generation with fallbacks
- **Dynamic Loading**: On-demand model initialization
- **Quality Assurance**: Built-in audio validation

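
One way the on-demand loading, cache management, and fallback ideas above could fit together is sketched below. Everything here is an assumption made for illustration: `load_model` is a stand-in that returns a plain dictionary rather than a real model, and the retry-with-a-lower-temperature policy is just one plausible fallback, not the behaviour of this Space's `app.py`.

```python
import functools
import gc


def load_model():
    """Hypothetical stand-in for the real, expensive model loader."""
    return {"name": "OuteAI/OuteTTS-0.3-1B"}


@functools.lru_cache(maxsize=1)
def get_model():
    """Build the model on first use and reuse the same object afterwards."""
    return load_model()


def release_memory():
    """Collect garbage and, if PyTorch with CUDA is present, clear its cache."""
    gc.collect()
    try:
        import torch
    except ImportError:
        return  # nothing more to do without PyTorch
    if torch.cuda.is_available():
        torch.cuda.empty_cache()


def generate_with_fallback(synthesize, text, temperature, fallback_temperature=0.1):
    """Try the requested temperature, then retry once with a safer value."""
    try:
        return synthesize(get_model(), text, temperature)
    except RuntimeError:
        release_memory()
        return synthesize(get_model(), text, fallback_temperature)
    finally:
        release_memory()
```

Here `synthesize` is any callable taking `(model, text, temperature)`, so the same wrapper works with a real generation function or with a test double.
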
### Start Creating Natural Speech

Transform your text into lifelike speech with professional quality. Whether using preset voices or cloning custom voices, Voice Clone Multilingual TTS provides the tools for exceptional audio content creation.

**Community**: [Discord - Openfree AI](https://discord.gg/openfreeai) | **More AI Tools**: [OpenFree Best AI Services](https://huggingface.co/spaces/openfree/Best-AI)

---

## Voice Clone Multilingual TTS: Advanced AI Voice Synthesis and Cloning (Translated from Korean)

### Transform Text to Natural Speech with Custom Voice Cloning

Welcome to **Voice Clone Multilingual TTS**. It is a cutting-edge text-to-speech system built on OuteTTS-0.3-1B that provides both high-quality voice synthesis and advanced voice cloning. Use preset voices or clone a voice from a short audio sample to generate natural-sounding speech in multiple languages.

### What is Voice Clone Multilingual TTS?

Voice Clone Multilingual TTS is an **advanced AI-powered speech synthesis tool** that converts text into natural-sounding speech with remarkable accuracy. It uses the OuteTTS-0.3-1B model with bfloat16 precision and offers both preset speaker voices and custom voice cloning from reference audio, which makes it well suited to content creation, accessibility, and creative projects.

### Key Features for Professional Voice Synthesis

- **Voice Cloning**: Clone any voice from 7-10 seconds of reference audio
- **Multilingual Support**: Generate speech in multiple languages
- **Preset Speakers**: Choose from various pre-configured voice profiles
- **Fine Control**: Adjust temperature and repetition penalty
- **GPU Acceleration**: Fast generation with CUDA optimization
- **Natural Prosody**: Realistic intonation and rhythm
- **Whisper Integration**: Automatic transcription for voice cloning
- **WAV Export**: High-quality audio output format

### How It Works

#### **Simple Generation Process**
1. **Enter Text**: Type or paste your text content
2. **Choose Voice**: Select a preset speaker or upload reference audio
3. **Adjust Settings**: Fine-tune temperature and penalties
4. **Generate**: Create natural-sounding speech instantly

#### **Voice Cloning Technology**
- Upload 7-10 seconds of clear reference audio
- AI analyzes voice characteristics and patterns
- Applies the learned voice profile to new text
- Maintains speaker identity across languages

### Perfect Use Cases

- **Content Creation**: Narration for videos and podcasts
- **Audiobook Production**: Convert books to audio format
- **Language Learning**: Practice pronunciation with native accents
- **Accessibility**: Make written content accessible to all
- **Voice Preservation**: Clone and preserve unique voices
- **Creative Projects**: Character voices for games or animations
- **Business Applications**: Automated customer service voices
- **Personal Use**: Create custom voice assistants

### Advanced Controls

- **Temperature (0.1-1.0)**:
  - Lower values: More stable, consistent tone
  - Higher values: More expressive, varied intonation
- **Repetition Penalty (0.5-2.0)**: Prevents repetitive patterns
- **Speaker Selection**: Multiple preset voice profiles
- **Reference Audio**: Custom voice cloning input
- **Max Length**: Up to 4096 tokens per generation

### Technical Specifications

- **Model**: OuteAI/OuteTTS-0.3-1B
- **Precision**: bfloat16 for optimal performance
- **Framework**: PyTorch with CUDA support
- **Transcription**: Whisper Turbo for voice analysis
- **Output Format**: WAV audio files
- **GPU Optimization**: Automatic CUDA memory management
- **Interface**: Gradio with responsive design

### Voice Cloning Best Practices

1. **Audio Quality**: Use clear, noise-free recordings
2. **Duration**: Optimal results with 7-10 second samples
3. **Consistency**: Single speaker without background noise
4. **Format**: Support for common audio formats
5. **Content**: Natural speech patterns work best
6. **Language**: Can clone across different languages

### Why Choose Voice Clone Multilingual TTS?

1. **Professional Quality**: Studio-grade voice synthesis
2. **Versatile Options**: Preset voices or custom cloning
3. **Fast Processing**: GPU-accelerated generation
4. **User-Friendly**: Simple interface for all users
5. **Flexible Output**: Adjustable voice characteristics
6. **Free Access**: No subscription fee or usage limits

### Technical Innovation

- **Advanced Architecture**: State-of-the-art TTS model
- **Memory Efficient**: Automatic CUDA cache management
- **Error Handling**: Robust generation with fallbacks
- **Dynamic Loading**: On-demand model initialization
- **Quality Assurance**: Built-in audio validation

### Start Creating Natural Speech

Transform your text into lifelike speech with professional quality. Whether you use preset voices or clone custom voices, Voice Clone Multilingual TTS provides the tools for exceptional audio content creation.

**Community**: [Discord - Openfree AI](https://discord.gg/openfreeai) | **More AI Tools**: [OpenFree Best AI Services](https://huggingface.co/spaces/openfree/Best-AI)