Commit 4a9dc3c (verified) · Parent: 35c00c7
Daemontatox committed: Update README.md

Files changed (1):
  1. README.md +197 -114

README.md CHANGED
@@ -1,117 +1,200 @@
  ---
- license: apache-2.0
- language:
- - en
- base_model:
- - yl4579/StyleTTS2-LJSpeech
- pipeline_tag: text-to-speech
  ---
- **Kokoro** is an open-weight TTS model with 82 million parameters. Despite its lightweight architecture, it delivers comparable quality to larger models while being significantly faster and more cost-efficient. With Apache-licensed weights, Kokoro can be deployed anywhere from production environments to personal projects.

- <audio controls><source src="https://huggingface.co/hexgrad/Kokoro-82M/resolve/main/samples/HEARME.wav" type="audio/wav"></audio>
-
- 🐈 **GitHub**: https://github.com/hexgrad/kokoro
-
- 🚀 **Demo**: https://hf.co/spaces/hexgrad/Kokoro-TTS
-
- > [!NOTE]
- > As of April 2025, the market rate of Kokoro served over API is **under $1 per million characters of text input**, or under $0.06 per hour of audio output. (On average, 1000 characters of input is about 1 minute of output.) Sources: [ArtificialAnalysis/Replicate at 65 cents per M chars](https://artificialanalysis.ai/text-to-speech/model-family/kokoro#price) and [DeepInfra at 80 cents per M chars](https://deepinfra.com/hexgrad/Kokoro-82M).
- >
- > This is an Apache-licensed model, and Kokoro has been deployed in numerous projects and commercial APIs. We welcome the deployment of the model in real use cases.
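As a quick sanity check on how the per-character and per-hour figures relate (using the roughly 1000-characters-per-minute rule of thumb from the note; the rates are the two published prices above, not values from this repo):

```py
# Back-of-the-envelope: convert per-character API pricing to cost per hour of audio,
# assuming ~1000 characters of input per minute of output (rule of thumb above).
chars_per_hour = 1000 * 60                # ~60,000 characters per hour of audio
for usd_per_million in (0.65, 0.80):      # ArtificialAnalysis/Replicate and DeepInfra rates
    print(f"${usd_per_million}/M chars -> ~${usd_per_million * chars_per_hour / 1e6:.3f} per hour of audio")
# Both come out under the $0.06/hour quoted in the note.
```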
-
- > [!CAUTION]
- > Fake websites like kokorottsai_com (snapshot: https://archive.ph/nRRnk) and kokorotts_net (snapshot: https://archive.ph/60opa) are likely scams masquerading under the banner of a popular model.
- >
- > Any website containing "kokoro" in its root domain (e.g. kokorottsai_com, kokorotts_net) is **NOT owned by and NOT affiliated with this model page or its author**, and attempts to imply otherwise are red flags.
-
- - [Releases](#releases)
- - [Usage](#usage)
- - [EVAL.md](https://huggingface.co/hexgrad/Kokoro-82M/blob/main/EVAL.md) ↗️
- - [SAMPLES.md](https://huggingface.co/hexgrad/Kokoro-82M/blob/main/SAMPLES.md) ↗️
- - [VOICES.md](https://huggingface.co/hexgrad/Kokoro-82M/blob/main/VOICES.md) ↗️
- - [Model Facts](#model-facts)
- - [Training Details](#training-details)
- - [Creative Commons Attribution](#creative-commons-attribution)
- - [Acknowledgements](#acknowledgements)
-
- ### Releases
-
- | Model | Published | Training Data | Langs & Voices | SHA256 |
- | ----- | --------- | ------------- | -------------- | ------ |
- | **v1.0** | **2025 Jan 27** | **Few hundred hrs** | [**8 & 54**](https://huggingface.co/hexgrad/Kokoro-82M/blob/main/VOICES.md) | `496dba11` |
- | [v0.19](https://huggingface.co/hexgrad/kLegacy/tree/main/v0.19) | 2024 Dec 25 | <100 hrs | 1 & 10 | `3b0c392f` |
-
- | Training Costs | v0.19 | v1.0 | **Total** |
- | -------------- | ----- | ---- | --------- |
- | in A100 80GB GPU hours | 500 | 500 | **1000** |
- | average hourly rate | $0.80/h | $1.20/h | **$1/h** |
- | in USD | $400 | $600 | **$1000** |
-
- ### Usage
- You can run this basic cell on [Google Colab](https://colab.research.google.com/). [Listen to samples](https://huggingface.co/hexgrad/Kokoro-82M/blob/main/SAMPLES.md). For more languages and details, see [Advanced Usage](https://github.com/hexgrad/kokoro?tab=readme-ov-file#advanced-usage).
- ```py
- !pip install -q "kokoro>=0.9.2" soundfile
- !apt-get -qq -y install espeak-ng > /dev/null 2>&1
- from kokoro import KPipeline
- from IPython.display import display, Audio
- import soundfile as sf
- import torch
- pipeline = KPipeline(lang_code='a')  # 'a' = American English
- text = '''
- [Kokoro](/kˈOkəɹO/) is an open-weight TTS model with 82 million parameters. Despite its lightweight architecture, it delivers comparable quality to larger models while being significantly faster and more cost-efficient. With Apache-licensed weights, [Kokoro](/kˈOkəɹO/) can be deployed anywhere from production environments to personal projects.
- '''
- generator = pipeline(text, voice='af_heart')
- for i, (gs, ps, audio) in enumerate(generator):
-     print(i, gs, ps)  # graphemes and phonemes for each generated segment
-     display(Audio(data=audio, rate=24000, autoplay=i==0))
-     sf.write(f'{i}.wav', audio, 24000)  # 24 kHz output
- ```
- Under the hood, `kokoro` uses [`misaki`](https://pypi.org/project/misaki/), a G2P library at https://github.com/hexgrad/misaki.
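If a single audio file is preferred over per-segment WAVs, a minimal sketch of one possible variation (assuming the same `pipeline`, voice, and 24 kHz sample rate as the cell above; the `kokoro_full.wav` name is arbitrary) is to collect the segments and concatenate them:

```py
import numpy as np
import soundfile as sf
from kokoro import KPipeline

pipeline = KPipeline(lang_code='a')  # American English, as above
text = "Kokoro is an open-weight TTS model with 82 million parameters."
# Collect every generated segment, then write one 24 kHz file instead of 0.wav, 1.wav, ...
segments = [np.asarray(audio) for _, _, audio in pipeline(text, voice='af_heart')]
sf.write('kokoro_full.wav', np.concatenate(segments), 24000)
```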
-
- ### Model Facts
-
- **Architecture:**
- - StyleTTS 2: https://arxiv.org/abs/2306.07691
- - ISTFTNet: https://arxiv.org/abs/2203.02395
- - Decoder only: no diffusion, no encoder release
-
- **Architected by:** Li et al @ https://github.com/yl4579/StyleTTS2
-
- **Trained by:** `@rzvzn` on Discord
-
- **Languages:** Multiple (8 languages and 54 voices in v1.0; see [VOICES.md](https://huggingface.co/hexgrad/Kokoro-82M/blob/main/VOICES.md))
-
- **Model SHA256 Hash:** `496dba118d1a58f5f3db2efc88dbdc216e0483fc89fe6e47ee1f2c53f18ad1e4`
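The published hash can be checked against a local download; a minimal sketch using `huggingface_hub` (the `kokoro-v1_0.pth` filename is an assumption about the repo layout, so adjust it to the actual weights file):

```py
import hashlib
from huggingface_hub import hf_hub_download

# Filename is assumed; point this at whichever weights file the repo actually ships.
path = hf_hub_download(repo_id="hexgrad/Kokoro-82M", filename="kokoro-v1_0.pth")
with open(path, "rb") as f:
    digest = hashlib.sha256(f.read()).hexdigest()
print(digest)  # should match 496dba11... for the v1.0 release listed above
```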
-
- ### Training Details
-
- **Data:** Kokoro was trained exclusively on **permissive/non-copyrighted audio data** and IPA phoneme labels. Examples of permissive/non-copyrighted audio include:
- - Public domain audio
- - Audio licensed under Apache, MIT, etc.
- - Synthetic audio<sup>[1]</sup> generated by closed<sup>[2]</sup> TTS models from large providers<br/>
- [1] https://copyright.gov/ai/ai_policy_guidance.pdf<br/>
- [2] No synthetic audio from open TTS models or "custom voice clones"
-
- **Total Dataset Size:** A few hundred hours of audio
-
- **Total Training Cost:** About $1000 for 1000 A100 80GB GPU hours
-
- ### Creative Commons Attribution
-
- The following CC BY audio was part of the dataset used to train Kokoro v1.0.
-
- | Audio Data | Duration Used | License | Added to Training Set After |
- | ---------- | ------------- | ------- | --------------------------- |
- | [Koniwa](https://github.com/koniwa/koniwa) `tnc` | <1h | [CC BY 3.0](https://creativecommons.org/licenses/by/3.0/deed.ja) | v0.19 / 22 Nov 2024 |
- | [SIWIS](https://datashare.ed.ac.uk/handle/10283/2353) | <11h | [CC BY 4.0](https://datashare.ed.ac.uk/bitstream/handle/10283/2353/license_text) | v0.19 / 22 Nov 2024 |
-
- ### Acknowledgements
-
- - 🛠️ [@yl4579](https://huggingface.co/yl4579) for architecting StyleTTS 2.
- - 🏆 [@Pendrokar](https://huggingface.co/Pendrokar) for adding Kokoro as a contender in the TTS Spaces Arena.
- - 📊 Thank you to everyone who contributed synthetic training data.
- - ❤️ Special thanks to all compute sponsors.
- - 👾 Discord server: https://discord.gg/QuGxSWBfQy
- - 🪽 Kokoro is a Japanese word that translates to "heart" or "spirit". It is also the name of an [AI in the Terminator franchise](https://terminator.fandom.com/wiki/Kokoro).
-
- <img src="https://static0.gamerantimages.com/wordpress/wp-content/uploads/2024/08/terminator-zero-41-1.jpg" width="400" alt="kokoro" />

  ---
+ # For reference on model card metadata, see the spec: https://github.com/huggingface/hub-docs/blob/main/modelcard.md?plain=1
+ # Doc / guide: https://huggingface.co/docs/hub/model-cards
+ {}
  ---

+ # Model Card for Model ID
+
+ <!-- Provide a quick summary of what the model is/does. -->
+
+ This modelcard aims to be a base template for new models. It has been generated using [this raw template](https://github.com/huggingface/huggingface_hub/blob/main/src/huggingface_hub/templates/modelcard_template.md?plain=1).
+
+ ## Model Details
+
+ ### Model Description
+
+ <!-- Provide a longer summary of what this model is. -->
+
+
+
+ - **Developed by:** [More Information Needed]
+ - **Funded by [optional]:** [More Information Needed]
+ - **Shared by [optional]:** [More Information Needed]
+ - **Model type:** [More Information Needed]
+ - **Language(s) (NLP):** [More Information Needed]
+ - **License:** [More Information Needed]
+ - **Finetuned from model [optional]:** [More Information Needed]
+
+ ### Model Sources [optional]
+
+ <!-- Provide the basic links for the model. -->
+
+ - **Repository:** [More Information Needed]
+ - **Paper [optional]:** [More Information Needed]
+ - **Demo [optional]:** [More Information Needed]
+
+ ## Uses
+
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+
+ ### Direct Use
+
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+
+ [More Information Needed]
+
+ ### Downstream Use [optional]
+
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+
+ [More Information Needed]
+
+ ### Out-of-Scope Use
+
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+
+ [More Information Needed]
+
+ ## Bias, Risks, and Limitations
+
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
+
+ [More Information Needed]
+
+ ### Recommendations
+
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+
+ ## How to Get Started with the Model
+
+ Use the code below to get started with the model.
+
+ [More Information Needed]
+
+ ## Training Details
+
+ ### Training Data
+
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+
+ [More Information Needed]
+
+ ### Training Procedure
+
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+
+ #### Preprocessing [optional]
+
+ [More Information Needed]
+
+
+ #### Training Hyperparameters
+
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+
+ #### Speeds, Sizes, Times [optional]
+
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+
+ [More Information Needed]
+
+ ## Evaluation
+
+ <!-- This section describes the evaluation protocols and provides the results. -->
+
+ ### Testing Data, Factors & Metrics
+
+ #### Testing Data
+
+ <!-- This should link to a Dataset Card if possible. -->
+
+ [More Information Needed]
+
+ #### Factors
+
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+
+ [More Information Needed]
+
+ #### Metrics
+
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
+
+ [More Information Needed]
+
+ ### Results
+
+ [More Information Needed]
+
+ #### Summary
+
+
+
+ ## Model Examination [optional]
+
+ <!-- Relevant interpretability work for the model goes here -->
+
+ [More Information Needed]
+
+ ## Environmental Impact
+
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+
+ - **Hardware Type:** [More Information Needed]
+ - **Hours used:** [More Information Needed]
+ - **Cloud Provider:** [More Information Needed]
+ - **Compute Region:** [More Information Needed]
+ - **Carbon Emitted:** [More Information Needed]
+
+ ## Technical Specifications [optional]
+
+ ### Model Architecture and Objective
+
+ [More Information Needed]
+
+ ### Compute Infrastructure
+
+ [More Information Needed]
+
+ #### Hardware
+
+ [More Information Needed]
+
+ #### Software
+
+ [More Information Needed]
+
+ ## Citation [optional]
+
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+
+ **BibTeX:**
+
+ [More Information Needed]
+
+ **APA:**
+
+ [More Information Needed]
+
+ ## Glossary [optional]
+
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+
+ [More Information Needed]
+
+ ## More Information [optional]
+
+ [More Information Needed]
+
+ ## Model Card Authors [optional]
+
+ [More Information Needed]
+
+ ## Model Card Contact
+
+ [More Information Needed]