feat: update report link
- index.html +73 -38
- style.css +1 -0
index.html
CHANGED
|
@@ -28,7 +28,7 @@
|
|
| 28 |
Encoder</h4>
|
| 29 |
<p class="author">
|
| 30 |
MiniMax Team <span class="date">May 2025</span><br />
|
| 31 |
-
<a style="font-size: 1.1rem;"
|
| 32 |
href="https://huggingface.co/spaces/MiniMaxAI/MiniMax-Speech-Tech-Report/blob/main/MiniMax_Speech.pdf">[Tech
|
| 33 |
Report]</a>
|
| 34 |
</p>
|
|
@@ -58,7 +58,9 @@
|
|
| 58 |
via LoRA; text to voice (T2V) by synthesizing timbre features directly from text description; and professional
|
| 59 |
voice
|
| 60 |
cloning (PVC) by fine-tuning timbre features with additional data. We encourage readers to visit
|
| 61 |
-
<a
|
| 62 |
</p>
|
| 63 |
</div>
|
| 64 |
|
|
@@ -73,7 +75,6 @@
|
|
| 73 |
<ol>
|
| 74 |
<li><a href="#showcase-with-high-versatility">Showcase with High Versatility</a></li>
|
| 75 |
<li><a href="#showcase-with-multiple-generation-attempts">Showcase with Multiple Generation Attempts</a></li>
|
| 76 |
-
<li><a href="#examples-with-more-possibilities">Examples with More Possibilities</a></li>
|
| 77 |
</ol>
|
| 78 |
</li>
|
| 79 |
<li><a href="#zero-shot-vs-one-shot-demonstrations">Zero-Shot vs. One-Shot Demonstrations</a></li>
|
|
@@ -158,41 +159,45 @@
|
|
| 158 |
<audio class="audio-md" src="assets/audios/Warm%20and%20Magnetic.mp3" controls></audio>
|
| 159 |
</td>
|
| 160 |
</tr>
|
| 161 |
-
</tbody>
|
| 162 |
-
</table>
|
| 163 |
-
</div>
|
| 164 |
-
|
| 165 |
-
<h3 id="showcase-with-multiple-generation-attempts">Showcase with Multiple Generation Attempts, Post-Processing
|
| 166 |
-
Audio Effects and Added Sound Effects</h3>
|
| 167 |
-
<div class="scroll-wrapper">
|
| 168 |
-
<table style="width: 100%;">
|
| 169 |
-
<tbody>
|
| 170 |
<tr class="border-bottom-thin">
|
| 171 |
-
<
|
| 172 |
-
|
| 173 |
</tr>
|
| 174 |
<tr class="border-bottom-thin">
|
| 175 |
<td>
|
| 176 |
-
A
|
| 177 |
</td>
|
| 178 |
<td>
|
| 179 |
-
<audio class="audio-
|
| 180 |
</td>
|
| 181 |
</tr>
|
| 182 |
<tr class="border-bottom-thin">
|
| 183 |
<td>
|
| 184 |
-
|
| 185 |
</td>
|
| 186 |
<td>
|
| 187 |
-
<audio class="audio-
|
| 188 |
</td>
|
| 189 |
</tr>
|
| 190 |
</tbody>
|
| 191 |
</table>
|
| 192 |
</div>
|
| 193 |
|
| 194 |
-
<h3 id="
|
| 195 |
-
|
| 196 |
<div class="scroll-wrapper">
|
| 197 |
<table style="width: 100%;">
|
| 198 |
<tbody>
|
|
@@ -202,26 +207,18 @@
|
|
| 202 |
</tr>
|
| 203 |
<tr class="border-bottom-thin">
|
| 204 |
<td>
|
| 205 |
-
|
| 206 |
-
</td>
|
| 207 |
-
<td>
|
| 208 |
-
<audio class="audio-lg" src="assets/audios/Breathy%20ASMR.MP3" controls></audio>
|
| 209 |
-
</td>
|
| 210 |
-
</tr>
|
| 211 |
-
<tr class="border-bottom-thin">
|
| 212 |
-
<td>
|
| 213 |
-
A Robotic Voice with Rich Bass Resonance and Spatial Presence
|
| 214 |
</td>
|
| 215 |
<td>
|
| 216 |
-
<audio class="audio-lg" src="assets/audios/
|
| 217 |
</td>
|
| 218 |
</tr>
|
| 219 |
<tr class="border-bottom-thin">
|
| 220 |
<td>
|
| 221 |
-
|
| 222 |
</td>
|
| 223 |
<td>
|
| 224 |
-
<audio class="audio-lg" src="assets/audios/
|
| 225 |
</td>
|
| 226 |
</tr>
|
| 227 |
</tbody>
|
|
@@ -885,8 +882,8 @@
|
|
| 885 |
</tr>
|
| 886 |
<tr class="border-bottom-thin">
|
| 887 |
<td>
|
| 888 |
-
|
| 889 |
-
|
| 890 |
在深度访谈场景中表现出专业性和亲和力,音质清晰,吐字规整有力。
|
| 891 |
</td>
|
| 892 |
<td>
|
|
@@ -901,9 +898,9 @@
|
|
| 901 |
</tr>
|
| 902 |
<tr class="border-bottom-thin">
|
| 903 |
<td>
|
| 904 |
-
|
| 905 |
-
|
| 906 |
-
|
| 907 |
</td>
|
| 908 |
<td>
|
| 909 |
亲爱的宝宝们,等了好久的神仙面霜终于到货啦!<br>
|
|
@@ -929,7 +926,7 @@
|
|
| 929 |
<audio class="audio-md" src="assets/audios/体育解说男青年.wav" controls></audio>
|
| 930 |
</td>
|
| 931 |
</tr>
|
| 932 |
-
<tr>
|
| 933 |
<td>
|
| 934 |
中国女青年的声音,音色清脆,说话速度偏快,语调活泼,<br>
|
| 935 |
像是在做游戏直播,声音中带着愉快的感觉,整体音调较高,<br>
|
|
@@ -944,6 +941,44 @@
|
|
| 944 |
<audio class="audio-md" src="assets/audios/游戏主播女青年.wav" controls></audio>
|
| 945 |
</td>
|
| 946 |
</tr>
|
| 947 |
</tbody>
|
| 948 |
</table>
|
| 949 |
</div>
|
|
|
|
| 28 |
Encoder</h4>
|
| 29 |
<p class="author">
|
| 30 |
MiniMax Team <span class="date">May 2025</span><br />
|
| 31 |
+
<a style="font-size: 1.1rem;" target="_blank"
|
| 32 |
href="https://huggingface.co/spaces/MiniMaxAI/MiniMax-Speech-Tech-Report/blob/main/MiniMax_Speech.pdf">[Tech
|
| 33 |
Report]</a>
|
| 34 |
</p>
|
|
|
|
| 58 |
via LoRA; text to voice (T2V) by synthesizing timbre features directly from text description; and professional
|
| 59 |
voice
|
| 60 |
cloning (PVC) by fine-tuning timbre features with additional data. We encourage readers to visit
|
| 61 |
+
<a
|
| 62 |
+
href="https://huggingface.co/spaces/MiniMaxAI/MiniMax-Speech-Tech-Report">https://minimax-ai.github.io/tts_tech_report</a>
|
| 63 |
+
for more examples.
|
| 64 |
</p>
|
| 65 |
</div>
|
| 66 |
|
|
|
|
| 75 |
<ol>
|
| 76 |
<li><a href="#showcase-with-high-versatility">Showcase with High Versatility</a></li>
|
| 77 |
<li><a href="#showcase-with-multiple-generation-attempts">Showcase with Multiple Generation Attempts</a></li>
|
|
|
|
| 78 |
</ol>
|
| 79 |
</li>
|
| 80 |
<li><a href="#zero-shot-vs-one-shot-demonstrations">Zero-Shot vs. One-Shot Demonstrations</a></li>
|
|
|
|
| 159 |
<audio class="audio-md" src="assets/audios/Warm%20and%20Magnetic.mp3" controls></audio>
|
| 160 |
</td>
|
| 161 |
</tr>
|
| 162 |
<tr class="border-bottom-thin">
|
| 163 |
+
<td>
|
| 164 |
+
An ASMR Whispering Voice with Generated Breathing and Sound Effects
|
| 165 |
+
</td>
|
| 166 |
+
<td>
|
| 167 |
+
<audio class="audio-md" src="assets/audios/Breathy%20ASMR_Sourse.wav" controls></audio>
|
| 168 |
+
</td>
|
| 169 |
+
<td>
|
| 170 |
+
<audio class="audio-md" src="assets/audios/Breathy%20ASMR.MP3" controls></audio>
|
| 171 |
+
</td>
|
| 172 |
</tr>
|
| 173 |
<tr class="border-bottom-thin">
|
| 174 |
<td>
|
| 175 |
+
A Robotic Voice with Rich Bass Resonance and Spatial Presence
|
| 176 |
</td>
|
| 177 |
<td>
|
| 178 |
+
<audio class="audio-md" src="assets/audios/Lucky%20Robot_Sourse.wav" controls></audio>
|
| 179 |
+
</td>
|
| 180 |
+
<td>
|
| 181 |
+
<audio class="audio-md" src="assets/audios/Lucky%20Robot.mp3" controls></audio>
|
| 182 |
</td>
|
| 183 |
</tr>
|
| 184 |
<tr class="border-bottom-thin">
|
| 185 |
<td>
|
| 186 |
+
A Sardonic Mature Female Voice
|
| 187 |
</td>
|
| 188 |
<td>
|
| 189 |
+
<audio class="audio-md" src="assets/audios/Onee-san_Sourse.wav" controls></audio>
|
| 190 |
+
</td>
|
| 191 |
+
<td>
|
| 192 |
+
<audio class="audio-md" src="assets/audios/Onee-san.wav" controls></audio>
|
| 193 |
</td>
|
| 194 |
</tr>
|
| 195 |
</tbody>
|
| 196 |
</table>
|
| 197 |
</div>
|
| 198 |
|
| 199 |
+
<h3 id="showcase-with-multiple-generation-attempts">Showcase with Multiple Generation Attempts, Post-Processing
|
| 200 |
+
Audio Effects and Added Sound Effects</h3>
|
| 201 |
<div class="scroll-wrapper">
|
| 202 |
<table style="width: 100%;">
|
| 203 |
<tbody>
|
|
|
|
| 207 |
</tr>
|
| 208 |
<tr class="border-bottom-thin">
|
| 209 |
<td>
|
| 210 |
+
A Husky Male Voice: From Soft Murmur to Excitement to Anger, then to Whispers
|
| 211 |
</td>
|
| 212 |
<td>
|
| 213 |
+
<audio class="audio-lg" src="assets/audios/Murmur-Excitement-Anger-%20Whispers.MP3" controls></audio>
|
| 214 |
</td>
|
| 215 |
</tr>
|
| 216 |
<tr class="border-bottom-thin">
|
| 217 |
<td>
|
| 218 |
+
An Angry Female Voice: From Soft Murmur to Rage to Reminiscence, then to Weeping
|
| 219 |
</td>
|
| 220 |
<td>
|
| 221 |
+
<audio class="audio-lg" src="assets/audios/Neutral-Rage-Reminiscence-Weeping.MP3" controls></audio>
|
| 222 |
</td>
|
| 223 |
</tr>
|
| 224 |
</tbody>
|
|
|
|
| 882 |
</tr>
|
| 883 |
<tr class="border-bottom-thin">
|
| 884 |
<td>
|
| 885 |
+
男性中年声音,说中文,音色浑厚醇厚,带有自然的磁性,<br>
|
| 886 |
+
语速偏慢,音量适中,音调偏低沉。声音整体给人沉稳可靠的感觉,<br>
|
| 887 |
在深度访谈场景中表现出专业性和亲和力,音质清晰,吐字规整有力。
|
| 888 |
</td>
|
| 889 |
<td>
|
|
|
|
| 898 |
</tr>
|
| 899 |
<tr class="border-bottom-thin">
|
| 900 |
<td>
|
| 901 |
+
说中文的女青年,音色偏甜美,语速比较快,<br>
|
| 902 |
+
说话时带着一种轻快的感觉,整体音调较高,像是在直播带货,<br>
|
| 903 |
+
整体氛围比较活跃,声音清晰,听起来很有亲和力。
|
| 904 |
</td>
|
| 905 |
<td>
|
| 906 |
亲爱的宝宝们,等了好久的神仙面霜终于到货啦!<br>
|
|
|
|
| 926 |
<audio class="audio-md" src="assets/audios/体育解说男青年.wav" controls></audio>
|
| 927 |
</td>
|
| 928 |
</tr>
|
| 929 |
+
<tr class="border-bottom-thin">
|
| 930 |
<td>
|
| 931 |
中国女青年的声音,音色清脆,说话速度偏快,语调活泼,<br>
|
| 932 |
像是在做游戏直播,声音中带着愉快的感觉,整体音调较高,<br>
|
|
|
|
| 941 |
<audio class="audio-md" src="assets/audios/游戏主播女青年.wav" controls></audio>
|
| 942 |
</td>
|
| 943 |
</tr>
|
| 944 |
+
<tr class="border-bottom-thin">
|
| 945 |
+
<td>
|
| 946 |
+
English-speaking female voice, sounding relatively young,<br>
|
| 947 |
+
with a sweet and pleasant tone. Speaking at a moderate pace<br>
|
| 948 |
+
with a touch of energy, similar to someone narrating a<br>
|
| 949 |
+
beauty/makeup tutorial video. The overall atmosphere is<br>
|
| 950 |
+
relaxed and cheerful.
|
| 951 |
+
</td>
|
| 952 |
+
<td>
|
| 953 |
+
Hi everyone! Today I'll be sharing a soft, romantic<br>
|
| 954 |
+
makeup look that's perfect for dates. Many of you have <br>
|
| 955 |
+
been asking how to apply this eyeshadow naturally - the<br>
|
| 956 |
+
key is using gentle techniques. Let's go through the<br>
|
| 957 |
+
steps together...
|
| 958 |
+
</td>
|
| 959 |
+
<td>
|
| 960 |
+
<audio class="audio-md" src="assets/audios/美妆女博主.wav" controls></audio>
|
| 961 |
+
</td>
|
| 962 |
+
</tr>
|
| 963 |
+
<tr>
|
| 964 |
+
<td>
|
| 965 |
+
English-speaking middle-aged male voice, slightly husky, <br>
|
| 966 |
+
speaking at a moderate-to-slow pace with a deep tone. Like<br>
|
| 967 |
+
someone telling an old story, conveying a nostalgic feeling,<br>
|
| 968 |
+
with a relaxed and composed manner of speaking.
|
| 969 |
+
</td>
|
| 970 |
+
<td>
|
| 971 |
+
That was back in the late 1970s. I remember when our <br>
|
| 972 |
+
village first got electricity - everyone was so excited. <br>
|
| 973 |
+
In the evenings, people would bring their stools and <br>
|
| 974 |
+
gather under the big banyan tree by the village committee <br>
|
| 975 |
+
office to watch movies projected on the wall. Even now, <br>
|
| 976 |
+
thinking back to those moments still fills me with warmth.
|
| 977 |
+
</td>
|
| 978 |
+
<td>
|
| 979 |
+
<audio class="audio-md" src="assets/audios/回忆男中年.wav" controls></audio>
|
| 980 |
+
</td>
|
| 981 |
+
</tr>
|
| 982 |
</tbody>
|
| 983 |
</table>
|
| 984 |
</div>
|
style.css
CHANGED
|
@@ -837,5 +837,6 @@ h3,
|
|
| 837 |
h4,
|
| 838 |
h5,
|
| 839 |
h6 {
|
|
|
|
| 840 |
margin-bottom: 1rem;
|
| 841 |
}
|
|
|
|
| 837 |
h4,
|
| 838 |
h5,
|
| 839 |
h6 {
|
| 840 |
+
text-align: left;
|
| 841 |
margin-bottom: 1rem;
|
| 842 |
}
|
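Review note: this commit adds several `<audio>` rows whose `src` values are percent-encoded (for example `Breathy%20ASMR_Sourse.wav`), so a quick check of the decoded paths can help confirm they match the files under `assets/audios/`. The following is a minimal sketch of such a check using only the Python standard library. The class name `AudioSrcCollector`, the helper `audio_sources`, and the `sample` string are hypothetical and not part of the repository.

```python
from html.parser import HTMLParser
from urllib.parse import unquote


class AudioSrcCollector(HTMLParser):
    """Collect the percent-decoded src of every <audio> start tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for
        # bare attributes such as `controls`.
        if tag == "audio":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(unquote(src))


def audio_sources(html_text):
    """Return decoded audio paths in document order."""
    collector = AudioSrcCollector()
    collector.feed(html_text)
    return collector.sources


# Hypothetical sample built from two rows added in this diff.
sample = """
<td><audio class="audio-md" src="assets/audios/Breathy%20ASMR_Sourse.wav" controls></audio></td>
<td><audio class="audio-md" src="assets/audios/Lucky%20Robot.mp3" controls></audio></td>
"""

for path in audio_sources(sample):
    print(path)
```

In practice the decoded paths could be compared against a directory listing of `assets/audios/`; that comparison is left out here because the repository layout is not shown in the diff.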