Hongbing Li committed on
Commit db9a17a · 1 Parent(s): 98a6a84

Add model weights and configuration files

.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ *.wav filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -1,3 +1,145 @@
- ---
- license: mit
- ---
+ ---
+ license: apache-2.0
+ library_name: transformers.js
+ language:
+ - en
+ base_model:
+ - hexgrad/Kokoro-82M
+ pipeline_tag: text-to-speech
+ ---
+
+ # Kokoro TTS
+
+ Kokoro is a frontier TTS model for its size: just 82 million parameters (text in, audio out).
+
+ ## Table of contents
+
+ - [Usage](#usage)
+   - [JavaScript](#javascript)
+   - [Python](#python)
+ - [Voices/Samples](#voicessamples)
+ - [Quantizations](#quantizations)
+
+ ## Usage
+
+ ### JavaScript
+
+ First, install the `kokoro-js` library from [NPM](https://npmjs.com/package/kokoro-js) using:
+ ```bash
+ npm i kokoro-js
+ ```
+
+ You can then generate speech as follows:
+
+ ```js
+ import { KokoroTTS } from "kokoro-js";
+
+ const model_id = "onnx-community/Kokoro-82M-ONNX";
+ const tts = await KokoroTTS.from_pretrained(model_id, {
+   dtype: "q8", // Options: "fp32", "fp16", "q8", "q4", "q4f16"
+ });
+
+ const text = "Life is like a box of chocolates. You never know what you're gonna get.";
+ const audio = await tts.generate(text, {
+   // Use `tts.list_voices()` to list all available voices
+   voice: "af_bella",
+ });
+ audio.save("audio.wav");
+ ```
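+
+ Each `dtype` value corresponds to one of the ONNX files in this repository's `onnx/` folder (for example, `"fp16"` should load `model_fp16.onnx`); see [Quantizations](#quantizations) below for their sizes and audio samples.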
+
+ ### Python
+
+ ```python
+ import os
+ import numpy as np
+ from onnxruntime import InferenceSession
+
+ # You can generate token ids as follows:
+ #  1. Convert input text to phonemes using https://github.com/hexgrad/misaki
+ #  2. Map phonemes to ids using https://huggingface.co/hexgrad/Kokoro-82M/blob/785407d1adfa7ae8fbef8ffd85f34ca127da3039/config.json#L34-L148
+ tokens = [50, 157, 43, 135, 16, 53, 135, 46, 16, 43, 102, 16, 56, 156, 57, 135, 6, 16, 102, 62, 61, 16, 70, 56, 16, 138, 56, 156, 72, 56, 61, 85, 123, 83, 44, 83, 54, 16, 53, 65, 156, 86, 61, 62, 131, 83, 56, 4, 16, 54, 156, 43, 102, 53, 16, 156, 72, 61, 53, 102, 112, 16, 70, 56, 16, 138, 56, 44, 156, 76, 158, 123, 56, 16, 62, 131, 156, 43, 102, 54, 46, 16, 102, 48, 16, 81, 47, 102, 54, 16, 54, 156, 51, 158, 46, 16, 70, 16, 92, 156, 135, 46, 16, 54, 156, 43, 102, 48, 4, 16, 81, 47, 102, 16, 50, 156, 72, 64, 83, 56, 62, 16, 156, 51, 158, 64, 83, 56, 16, 44, 157, 102, 56, 16, 44, 156, 76, 158, 123, 56, 4]
+
+ # Context length is 512, but leave room for the pad token 0 at the start & end
+ assert len(tokens) <= 510, len(tokens)
+
+ # Select the style vector based on len(tokens); ref_s has shape (1, 256)
+ voices = np.fromfile('./voices/af.bin', dtype=np.float32).reshape(-1, 1, 256)
+ ref_s = voices[len(tokens)]
+
+ # Add the pad ids; onnxruntime expects a numpy int64 array of shape (1, <=512)
+ tokens = np.array([[0, *tokens, 0]], dtype=np.int64)
+
+ model_name = 'model.onnx'  # Options: model.onnx, model_fp16.onnx, model_quantized.onnx, model_q8f16.onnx, model_uint8.onnx, model_uint8f16.onnx, model_q4.onnx, model_q4f16.onnx
+ sess = InferenceSession(os.path.join('onnx', model_name))
+
+ audio = sess.run(None, dict(
+     input_ids=tokens,
+     style=ref_s,
+     speed=np.ones(1, dtype=np.float32),
+ ))[0]
+ ```
+
+ Optionally, save the audio to a file:
+ ```py
+ import scipy.io.wavfile as wavfile
+ wavfile.write('audio.wav', 24000, audio[0])
+ ```
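+
+ To produce `tokens` from raw text instead of hard-coding them, you can phonemize the text and look each phoneme up in the model's vocabulary, as the `inference.py` added in this commit does. A minimal sketch (it assumes the `config_kokoro.json` from this repository, the `phonemizer` package, and a system install of `espeak-ng` for the espeak backend):
+
+ ```python
+ import json
+ from phonemizer import phonemize
+
+ # Phoneme-to-id map shipped in this repository
+ with open("config_kokoro.json", encoding="utf-8") as f:
+     vocab = json.load(f)["vocab"]
+
+ text = "Life is like a box of chocolates."
+ phonemes = phonemize(text, language="en-us", backend="espeak",
+                      strip=True, preserve_punctuation=True, with_stress=True)
+
+ # Keep only symbols the vocabulary covers, then map phonemes to token ids
+ tokens = [vocab[p] for p in phonemes if p in vocab]
+ ```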
+
+ ## Voices/Samples
+
+ > Life is like a box of chocolates. You never know what you're gonna get.
+
+ | Name | Nationality | Gender | Sample |
+ | ------------ | ----------- | ------ | ------ |
+ | **af_heart** | American | Female | <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/61b253b7ac5ecaae3d1efe0c/S_9tkA75BT_QHKOzSX6S-.wav"></audio> |
+ | af_alloy | American | Female | <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/61b253b7ac5ecaae3d1efe0c/wiZ3gvlL--p5pRItO4YRE.wav"></audio> |
+ | af_aoede | American | Female | <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/61b253b7ac5ecaae3d1efe0c/Nv1xMwzjTdF9MR8v0oEEJ.wav"></audio> |
+ | af_bella | American | Female | <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/61b253b7ac5ecaae3d1efe0c/sWN0rnKU6TlLsVdGqRktF.wav"></audio> |
+ | af_jessica | American | Female | <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/61b253b7ac5ecaae3d1efe0c/2Oa4wITWAmiCXJ_Q97-7R.wav"></audio> |
+ | af_kore | American | Female | <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/61b253b7ac5ecaae3d1efe0c/AOIgyspzZWDGpn7oQgwtu.wav"></audio> |
+ | af_nicole | American | Female | <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/61b253b7ac5ecaae3d1efe0c/EY_V2OGr-hzmtTGrTCTyf.wav"></audio> |
+ | af_nova | American | Female | <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/61b253b7ac5ecaae3d1efe0c/X-xdEkx3GPlQG5DK8Gsqd.wav"></audio> |
+ | af_river | American | Female | <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/61b253b7ac5ecaae3d1efe0c/ZqaV2-xGUZdBQmZAF1Xqy.wav"></audio> |
+ | af_sarah | American | Female | <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/61b253b7ac5ecaae3d1efe0c/xzoJBl1HCvkE8Fl8Xu2R4.wav"></audio> |
+ | af_sky | American | Female | <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/61b253b7ac5ecaae3d1efe0c/ubebYQoaseyQk-jDLeWX7.wav"></audio> |
+ | am_adam | American | Male | <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/61b253b7ac5ecaae3d1efe0c/tvauhDVRGvGK98I-4wv3H.wav"></audio> |
+ | am_echo | American | Male | <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/61b253b7ac5ecaae3d1efe0c/qy_KuUB0hXsu-u8XaJJ_Z.wav"></audio> |
+ | am_eric | American | Male | <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/61b253b7ac5ecaae3d1efe0c/JhqPjbpMhraUv5nTSPpwD.wav"></audio> |
+ | am_fenrir | American | Male | <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/61b253b7ac5ecaae3d1efe0c/c0R9caBdBiNjGUUalI_DQ.wav"></audio> |
+ | am_liam | American | Male | <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/61b253b7ac5ecaae3d1efe0c/DFHvulaLeOjXIDKecvNG3.wav"></audio> |
+ | am_michael | American | Male | <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/61b253b7ac5ecaae3d1efe0c/IPKhsnjq1tPh3JmHH8nEg.wav"></audio> |
+ | am_onyx | American | Male | <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/61b253b7ac5ecaae3d1efe0c/ov0pFDfE8NNKZ80LqW6Di.wav"></audio> |
+ | am_puck | American | Male | <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/61b253b7ac5ecaae3d1efe0c/MOC654sLMHWI64g8HWesV.wav"></audio> |
+ | am_santa | American | Male | <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/61b253b7ac5ecaae3d1efe0c/LzA6JmHBvQlhOviy8qVfJ.wav"></audio> |
+ | bf_alice | British | Female | <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/61b253b7ac5ecaae3d1efe0c/9mnYZ3JWq7f6U12plXilA.wav"></audio> |
+ | bf_emma | British | Female | <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/61b253b7ac5ecaae3d1efe0c/_fvGtKMttRI0cZVGqxMh8.wav"></audio> |
+ | bf_isabella | British | Female | <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/61b253b7ac5ecaae3d1efe0c/VzlcJpqGEND_Q3duYnhiu.wav"></audio> |
+ | bf_lily | British | Female | <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/61b253b7ac5ecaae3d1efe0c/qZCoartohiRlVamY8Xpok.wav"></audio> |
+ | bm_daniel | British | Male | <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/61b253b7ac5ecaae3d1efe0c/Eb0TLnLXHDRYOA3TJQKq3.wav"></audio> |
+ | bm_fable | British | Male | <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/61b253b7ac5ecaae3d1efe0c/NT9XkmvlezQ0FJ6Th5hoZ.wav"></audio> |
+ | bm_george | British | Male | <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/61b253b7ac5ecaae3d1efe0c/y6VJbCESszLZGupPoqNkF.wav"></audio> |
+ | bm_lewis | British | Male | <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/61b253b7ac5ecaae3d1efe0c/RlB5BRvLt-IFvTjzQNxCh.wav"></audio> |
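+
+ Each name in the table maps to a `voices/<name>.bin` file in this repository, which can be loaded exactly like `voices/af.bin` in the Python example above. A sketch:
+
+ ```python
+ import numpy as np
+
+ voice = "bm_george"  # any name from the table above
+ voices = np.fromfile(f"./voices/{voice}.bin", dtype=np.float32).reshape(-1, 1, 256)
+ ref_s = voices[len(tokens)]  # style vector indexed by token count, as before
+ ```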
+
+ ## Quantizations
+
+ The model is resilient to quantization, enabling efficient high-quality speech synthesis at a fraction of the original model size.
+
+ > How could I know? It's an unanswerable question. Like asking an unborn child if they'll lead a good life. They haven't even been born.
+
+ | Model | Size (MB) | Sample |
+ | ----- | --------- | ------ |
+ | model.onnx (fp32) | 326 | <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/61b253b7ac5ecaae3d1efe0c/njexBuqPzfYUvWgs9eQ-_.wav"></audio> |
+ | model_fp16.onnx (fp16) | 163 | <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/61b253b7ac5ecaae3d1efe0c/8Ebl44hMQonZs4MlykExt.wav"></audio> |
+ | model_quantized.onnx (8-bit) | 92.4 | <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/61b253b7ac5ecaae3d1efe0c/9SLOt6ETclZ4yRdlJ0VIj.wav"></audio> |
+ | model_q8f16.onnx (Mixed precision) | 86 | <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/61b253b7ac5ecaae3d1efe0c/gNDMqb33YEmYMbAIv_Grx.wav"></audio> |
+ | model_uint8.onnx (8-bit & mixed precision) | 177 | <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/61b253b7ac5ecaae3d1efe0c/tpOWRHIWwEb0PJX46dCWQ.wav"></audio> |
+ | model_uint8f16.onnx (Mixed precision) | 114 | <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/61b253b7ac5ecaae3d1efe0c/vtZhABzjP0pvGD7dRb5Vr.wav"></audio> |
+ | model_q4.onnx (4-bit matmul) | 305 | <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/61b253b7ac5ecaae3d1efe0c/8FVn0IJIUfccEBWq8Fnw_.wav"></audio> |
+ | model_q4f16.onnx (4-bit matmul & fp16 weights) | 154 | <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/61b253b7ac5ecaae3d1efe0c/7DrgWC_1q00s-wUJuG44X.wav"></audio> |
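+
+ All of these variants ship in the `onnx/` folder of this repository, so switching quantizations in the Python example above only requires changing `model_name`. For example (a sketch; the size and quality trade-offs are as listed in the table):
+
+ ```python
+ from onnxruntime import InferenceSession
+
+ # model_q8f16.onnx is the smallest variant in the table above (86 MB)
+ sess = InferenceSession("onnx/model_q8f16.onnx")
+ ```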
audio.wav ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:226266faab69075bc17eb83d3d8256d0dfa4df25eb6bb323c783c6a4c57e2107
+ size 374458
config.json ADDED
@@ -0,0 +1,3 @@
+ {
+   "model_type": "style_text_to_speech_2"
+ }
config_kokoro.json ADDED
@@ -0,0 +1,150 @@
+ {
+   "istftnet": {
+     "upsample_kernel_sizes": [20, 12],
+     "upsample_rates": [10, 6],
+     "gen_istft_hop_size": 5,
+     "gen_istft_n_fft": 20,
+     "resblock_dilation_sizes": [
+       [1, 3, 5],
+       [1, 3, 5],
+       [1, 3, 5]
+     ],
+     "resblock_kernel_sizes": [3, 7, 11],
+     "upsample_initial_channel": 512
+   },
+   "dim_in": 64,
+   "dropout": 0.2,
+   "hidden_dim": 512,
+   "max_conv_dim": 512,
+   "max_dur": 50,
+   "multispeaker": true,
+   "n_layer": 3,
+   "n_mels": 80,
+   "n_token": 178,
+   "style_dim": 128,
+   "text_encoder_kernel_size": 5,
+   "plbert": {
+     "hidden_size": 768,
+     "num_attention_heads": 12,
+     "intermediate_size": 2048,
+     "max_position_embeddings": 512,
+     "num_hidden_layers": 12,
+     "dropout": 0.1
+   },
+   "vocab": {
+     ";": 1,
+     ":": 2,
+     ",": 3,
+     ".": 4,
+     "!": 5,
+     "?": 6,
+     "—": 9,
+     "…": 10,
+     "\"": 11,
+     "(": 12,
+     ")": 13,
+     "“": 14,
+     "”": 15,
+     " ": 16,
+     "\u0303": 17,
+     "ʣ": 18,
+     "ʥ": 19,
+     "ʦ": 20,
+     "ʨ": 21,
+     "ᵝ": 22,
+     "\uAB67": 23,
+     "A": 24,
+     "I": 25,
+     "O": 31,
+     "Q": 33,
+     "S": 35,
+     "T": 36,
+     "W": 39,
+     "Y": 41,
+     "ᵊ": 42,
+     "a": 43,
+     "b": 44,
+     "c": 45,
+     "d": 46,
+     "e": 47,
+     "f": 48,
+     "h": 50,
+     "i": 51,
+     "j": 52,
+     "k": 53,
+     "l": 54,
+     "m": 55,
+     "n": 56,
+     "o": 57,
+     "p": 58,
+     "q": 59,
+     "r": 60,
+     "s": 61,
+     "t": 62,
+     "u": 63,
+     "v": 64,
+     "w": 65,
+     "x": 66,
+     "y": 67,
+     "z": 68,
+     "ɑ": 69,
+     "ɐ": 70,
+     "ɒ": 71,
+     "æ": 72,
+     "β": 75,
+     "ɔ": 76,
+     "ɕ": 77,
+     "ç": 78,
+     "ɖ": 80,
+     "ð": 81,
+     "ʤ": 82,
+     "ə": 83,
+     "ɚ": 85,
+     "ɛ": 86,
+     "ɜ": 87,
+     "ɟ": 90,
+     "ɡ": 92,
+     "ɥ": 99,
+     "ɨ": 101,
+     "ɪ": 102,
+     "ʝ": 103,
+     "ɯ": 110,
+     "ɰ": 111,
+     "ŋ": 112,
+     "ɳ": 113,
+     "ɲ": 114,
+     "ɴ": 115,
+     "ø": 116,
+     "ɸ": 118,
+     "θ": 119,
+     "œ": 120,
+     "ɹ": 123,
+     "ɾ": 125,
+     "ɻ": 126,
+     "ʁ": 128,
+     "ɽ": 129,
+     "ʂ": 130,
+     "ʃ": 131,
+     "ʈ": 132,
+     "ʧ": 133,
+     "ʊ": 135,
+     "ʋ": 136,
+     "ʌ": 138,
+     "ɣ": 139,
+     "ɤ": 140,
+     "χ": 142,
+     "ʎ": 143,
+     "ʒ": 147,
+     "ʔ": 148,
+     "ˈ": 156,
+     "ˌ": 157,
+     "ː": 158,
+     "ʰ": 162,
+     "ʲ": 164,
+     "↓": 169,
+     "→": 171,
+     "↗": 172,
+     "↘": 173,
+     "ᵻ": 177
+   }
+ }
inference.py ADDED
@@ -0,0 +1,53 @@
+ import os
+ import json
+ import numpy as np
+ import scipy.io.wavfile as wavfile
+ from onnxruntime import InferenceSession
+ from phonemizer import phonemize
+
+ # === Step 1: Load phoneme-to-ID vocabulary ===
+ CONFIG_PATH = "./config_kokoro.json"  # Included in this repo; vocab mirrors hexgrad/Kokoro-82M config.json
+ with open(CONFIG_PATH, "r", encoding="utf-8") as f:
+     config = json.load(f)
+ phoneme_to_id = config["vocab"]
+
+ # === Step 2: Convert text to phonemes using espeak-ng ===
+ text = "Hi, how are you? What is your name? Tell me something."
+
+ phonemes = phonemize(
+     text,
+     language="en-us",
+     backend="espeak",
+     strip=True,
+     preserve_punctuation=True,
+     with_stress=True,
+ )
+
+ # === Step 3: Filter out unsupported phonemes and convert to token IDs ===
+ phonemes = "".join(p for p in phonemes if p in phoneme_to_id)
+ print("Phonemes:", phonemes)
+
+ tokens = [phoneme_to_id[p] for p in phonemes]
+ print("Token IDs:", tokens)
+
+ # === Step 4: Prepare style embedding and input IDs ===
+ assert len(tokens) <= 510, "Token sequence too long (max 510 phonemes)"
+
+ voices = np.fromfile('./voices/af.bin', dtype=np.float32).reshape(-1, 1, 256)
+ ref_s = voices[len(tokens)]  # Select style vector based on token length
+
+ # Pad token 0 at the beginning and end; onnxruntime expects a numpy int64 array
+ input_ids = np.array([[0, *tokens, 0]], dtype=np.int64)
+
+ # === Step 5: Run ONNX model inference ===
+ model_name = 'model.onnx'
+ sess = InferenceSession(os.path.join('onnx', model_name))
+
+ audio = sess.run(None, {
+     'input_ids': input_ids,
+     'style': ref_s,
+     'speed': np.ones(1, dtype=np.float32),
+ })[0]
+
+ # === Step 6: Save output audio as a 24kHz WAV file ===
+ wavfile.write('audio.wav', 24000, audio[0])
+ print("✅ Audio saved to audio.wav")
onnx/model.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8fbea51ea711f2af382e88c833d9e288c6dc82ce5e98421ea61c058ce21a34cb
+ size 325532232
onnx/model_fp16.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ba4527a874b42b21e35f468c10d326fdff3c7fc8cac1f85e9eb6c0dfc35c334a
+ size 163234740
onnx/model_q4.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:04cf570cf9c4153694f76347ed4b9a48c1b59ff1de0999e6605d123966b197c7
+ size 305215966
onnx/model_q4f16.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d1a508a6a29671ead84fac99c7401fbd3c21a583fc6ed1406d1ec974d53bf45f
+ size 154586422
onnx/model_q8f16.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:04c658aec1b6008857c2ad10f8c589d4180d0ec427e7e6118ceb487e215c3cd0
+ size 86033585
onnx/model_quantized.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:fbae9257e1e05ffc727e951ef9b9c98418e6d79f1c9b6b13bd59f5c9028a1478
+ size 92361116
onnx/model_uint8.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6607a397d77b8514065420b7c1e7320117f7aabfdb45ce15f0050c5b0fe75aea
+ size 177464632
onnx/model_uint8f16.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:883333e03c597584b532eebea0f8310f25f0c9ade58fe864792c12d969944a9a
+ size 114209226
requirement.txt ADDED
@@ -0,0 +1,5 @@
+ phonemizer==3.2.1
+ espeakng==1.0.1
+ numpy>=1.21
+ onnxruntime>=1.16.0
+ scipy>=1.7
tokenizer.json ADDED
@@ -0,0 +1,175 @@
+ {
+   "version": "1.0",
+   "truncation": null,
+   "padding": null,
+   "added_tokens": [],
+   "normalizer": {
+     "type": "Replace",
+     "pattern": {
+       "Regex": "[^$;:,.!?\u2014\u2026\"()\u201c\u201d \u0303\u02a3\u02a5\u02a6\u02a8\u1d5d\uab67AIOQSTWY\u1d4aabcdefhijklmnopqrstuvwxyz\u0251\u0250\u0252\u00e6\u03b2\u0254\u0255\u00e7\u0256\u00f0\u02a4\u0259\u025a\u025b\u025c\u025f\u0261\u0265\u0268\u026a\u029d\u026f\u0270\u014b\u0273\u0272\u0274\u00f8\u0278\u03b8\u0153\u0279\u027e\u027b\u0281\u027d\u0282\u0283\u0288\u02a7\u028a\u028b\u028c\u0263\u0264\u03c7\u028e\u0292\u0294\u02c8\u02cc\u02d0\u02b0\u02b2\u2193\u2192\u2197\u2198\u1d7b]"
+     },
+     "content": ""
+   },
+   "pre_tokenizer": {
+     "type": "Split",
+     "pattern": {
+       "Regex": ""
+     },
+     "behavior": "Isolated",
+     "invert": false
+   },
+   "post_processor": {
+     "type": "TemplateProcessing",
+     "single": [
+       {
+         "SpecialToken": {
+           "id": "$",
+           "type_id": 0
+         }
+       },
+       {
+         "Sequence": {
+           "id": "A",
+           "type_id": 0
+         }
+       },
+       {
+         "SpecialToken": {
+           "id": "$",
+           "type_id": 0
+         }
+       }
+     ],
+     "special_tokens": {
+       "$": {
+         "id": "$",
+         "ids": [
+           0
+         ],
+         "tokens": [
+           "$"
+         ]
+       }
+     }
+   },
+   "decoder": null,
+   "model": {
+     "vocab": {
+       "$": 0,
+       ";": 1,
+       ":": 2,
+       ",": 3,
+       ".": 4,
+       "!": 5,
+       "?": 6,
+       "\u2014": 9,
+       "\u2026": 10,
+       "\"": 11,
+       "(": 12,
+       ")": 13,
+       "\u201c": 14,
+       "\u201d": 15,
+       " ": 16,
+       "\u0303": 17,
+       "\u02a3": 18,
+       "\u02a5": 19,
+       "\u02a6": 20,
+       "\u02a8": 21,
+       "\u1d5d": 22,
+       "\uab67": 23,
+       "A": 24,
+       "I": 25,
+       "O": 31,
+       "Q": 33,
+       "S": 35,
+       "T": 36,
+       "W": 39,
+       "Y": 41,
+       "\u1d4a": 42,
+       "a": 43,
+       "b": 44,
+       "c": 45,
+       "d": 46,
+       "e": 47,
+       "f": 48,
+       "h": 50,
+       "i": 51,
+       "j": 52,
+       "k": 53,
+       "l": 54,
+       "m": 55,
+       "n": 56,
+       "o": 57,
+       "p": 58,
+       "q": 59,
+       "r": 60,
+       "s": 61,
+       "t": 62,
+       "u": 63,
+       "v": 64,
+       "w": 65,
+       "x": 66,
+       "y": 67,
+       "z": 68,
+       "\u0251": 69,
+       "\u0250": 70,
+       "\u0252": 71,
+       "\u00e6": 72,
+       "\u03b2": 75,
+       "\u0254": 76,
+       "\u0255": 77,
+       "\u00e7": 78,
+       "\u0256": 80,
+       "\u00f0": 81,
+       "\u02a4": 82,
+       "\u0259": 83,
+       "\u025a": 85,
+       "\u025b": 86,
+       "\u025c": 87,
+       "\u025f": 90,
+       "\u0261": 92,
+       "\u0265": 99,
+       "\u0268": 101,
+       "\u026a": 102,
+       "\u029d": 103,
+       "\u026f": 110,
+       "\u0270": 111,
+       "\u014b": 112,
+       "\u0273": 113,
+       "\u0272": 114,
+       "\u0274": 115,
+       "\u00f8": 116,
+       "\u0278": 118,
+       "\u03b8": 119,
+       "\u0153": 120,
+       "\u0279": 123,
+       "\u027e": 125,
+       "\u027b": 126,
+       "\u0281": 128,
+       "\u027d": 129,
+       "\u0282": 130,
+       "\u0283": 131,
+       "\u0288": 132,
+       "\u02a7": 133,
+       "\u028a": 135,
+       "\u028b": 136,
+       "\u028c": 138,
+       "\u0263": 139,
+       "\u0264": 140,
+       "\u03c7": 142,
+       "\u028e": 143,
+       "\u0292": 147,
+       "\u0294": 148,
+       "\u02c8": 156,
+       "\u02cc": 157,
+       "\u02d0": 158,
+       "\u02b0": 162,
+       "\u02b2": 164,
+       "\u2193": 169,
+       "\u2192": 171,
+       "\u2197": 172,
+       "\u2198": 173,
+       "\u1d7b": 177
+     }
+   }
+ }
tokenizer_config.json ADDED
@@ -0,0 +1,6 @@
+ {
+   "model_max_length": 512,
+   "pad_token": "$",
+   "tokenizer_class": "PreTrainedTokenizer",
+   "unk_token": "$"
+ }
voices/af.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a4f11d9d055a12bfa0db2668a3e4f0ef8fd1f1ccca69494479718e44dbf9e41a
+ size 524288
voices/af_alloy.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c4a6b876047fd7fb472edf4ebd63cfac7c3b958a7cae7c106e8f038ca6308c45
+ size 522240
voices/af_aoede.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4a004c33430762e2461eedb2013fad808ef4ab3121f5300f554476caf58d8361
+ size 522240
voices/af_bella.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f69d836209b78eb8c66e75e3cda491e26ea838a3674257e9d4e5703cbaf55c8b
+ size 522240
voices/af_heart.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d583ccff3cdca2f7fae535cb998ac07e9fcb90f09737b9a41fa2734ec44a8f0b
+ size 522240
voices/af_jessica.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a240a5e3c15b43563d6e923bdca8ef5613a23471d9b77653694012435df23bd8
+ size 522240
voices/af_kore.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9be5221b6a941c04b561959b8ff0b06e809444dcc4ab7e75a7b23606f691819e
+ size 522240
voices/af_nicole.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:cd2191ab31b914ed7b318416b0e4440fdf392ddad9106a060819aa600a64f59a
+ size 522240
voices/af_nova.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:18778272caa0d0eebaea251c35fd635f038434f9eee5e691d02a174bd328414f
+ size 522240
voices/af_river.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:00a2bcf82b1d86e8f19902ede58c65ccf6c0e43b44b7d74fad54e5d8933c9c30
+ size 522240
voices/af_sarah.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4409fbc125afabacc615d94db5398d847006a737b0247d6892b7a9a0007a2f0a
+ size 522240
voices/af_sky.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4435255c9744f3f31659e0d714ab7689bf65d9e77ec1cce060f083912614f0b9
+ size 522240
voices/am_adam.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:162b035ed91cfc48b6046982184c645f72edcdd1b82843347f605d7bf7b15716
+ size 522240
voices/am_echo.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3968b92c3c4cd1c4416dbded36c13eaa388a90d5788d02a13e4d781f5f8cf3c3
+ size 522240
voices/am_eric.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e8b5be17edd1e3636901ce7598baafe2dc8dd8ff707a0c23bf9e461add7e2832
+ size 522240
voices/am_fenrir.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c27989f741f7ee34d273a39d8a595cc0837d35f5ced9a29b7cc162614616df43
+ size 522240
voices/am_liam.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:52403be32fd047c6a44517cb0bcd6b134f2a18baa73e70ef41651e0eab921ade
+ size 522240
voices/am_michael.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1d1f21dd8da39c30705cd4c75d039d265e9bc4a2a93ed09bc9e1b1225eb95ba1
+ size 522240
voices/am_onyx.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:da5d135b424164916d75a68ffb4c2abce3d7d5ccc82dd1ee6cf447ce286145e6
+ size 522240
voices/am_puck.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:fcf73c989033e9233e0b98713eca600c8c74dcc1614b37009d5450ff4a2274a0
+ size 522240
voices/am_santa.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:61150cf726ab6c5ed7a99f90a304f91f5a72c00c592e89ec94e5df11c319227a
+ size 522240
voices/bf_alice.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:08afa6ba24da61ea5e8efa139e5aadc938d83f0a6da5a900adaf763ac1da5573
+ size 522240
voices/bf_emma.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:669fe0647f9dd04fcab92f1439a40eeb4c8b4ab1f82e4996fe3d918ce4a63b73
+ size 522240
voices/bf_isabella.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3754352c4aaa46d17f27654ab7518d65b62ad6163a0f55a5f4330c2da2c4e94f
+ size 522240
voices/bf_lily.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5e0ee32ebe64a467124976b14e69590746f1c4ce41a12b587a50c862edfea335
+ size 522240
voices/bm_daniel.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6b3194bbceffb746733cbc22c8f593dd44e401a71d53895a2dca891bc595a1e8
+ size 522240
voices/bm_fable.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f889083196807b4adb15e9204252165f503b8d33d3982e681c52443c49d798f1
+ size 522240
voices/bm_george.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c4b235a4c1f2cd3b939fed08b899ce9385638b763f7b73a59616c4fc9bd6c9bc
+ size 522240
voices/bm_lewis.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b8f671cef828c30e66fdf0b0756a76bba58f6bb3398cbbf27058642acbcedb97
+ size 522240
voices/ef_dora.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f66ec66bd295acb18372e37008533a9a3228483ccd294e7538d5d9294ac9a532
+ size 522240
voices/em_alex.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:27809e9eafdcbcfff90a3016c697568676531de2a2c39cee29c96c7bd6b83e95
+ size 522240
voices/em_santa.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ad43b774e1ca24d05c6161297d8aeb770ac3d29bb95daf516727af5f7d543683
+ size 522240
voices/ff_siwis.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a35f5675ad08948e326ae75fd0ea16ba5d0042e4f76b5f3d1df77d0a48c54861
+ size 522240