---
license: apache-2.0
pipeline_tag: text-to-speech
tags:
- model_hub_mixin
- pytorch_model_hub_mixin
---
# Dia: Open-Weight Text-to-Speech Dialogue Model (1.6B)
**Dia** is a 1.6B parameter open-weight text-to-speech model developed by Nari Labs.
It generates highly realistic *dialogue* directly from transcripts, including **nonverbal** cues such as `(laughs)` and `(sighs)`, and can be **conditioned on audio** for emotional tone or voice consistency.
Currently, Dia supports **English** and is optimized for GPU inference. This model is designed for research and educational purposes only.
---
## Try It Out
- [ZeroGPU demo on Spaces](https://huggingface.co/spaces/nari-labs/Dia-1.6B)
- [Comparison demos](https://yummy-fir-7a4.notion.site/dia) with ElevenLabs and Sesame CSM-1B
- Try voice remixing and conversations with a larger version → [join the waitlist](https://tally.so/r/meokbo)
- [Join the community on Discord](https://discord.gg/pgdB5YRe)
---
## Capabilities
- Multispeaker support using `[S1]`, `[S2]`, etc.
- Rich nonverbal cue synthesis: `(laughs)`, `(clears throat)`, `(gasps)`, etc.
- Voice conditioning via a transcript plus an audio example (see the sketch below)
- Outputs high-fidelity `.mp3` files directly from text
Example input:
```text
[S1] Dia is an open weights text-to-dialogue model. [S2] You get full control over scripts and voices. (laughs)
```
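Voice conditioning pairs the transcript of a reference clip with the new lines to synthesize. Below is a minimal sketch; the `audio_prompt` keyword and the file names are assumptions based on the repository's example scripts, so check the repo for the exact signature:
```python
from dia.model import Dia

model = Dia.from_pretrained("nari-labs/Dia-1.6B", compute_dtype="float16")

# The transcript of the reference clip comes first, then the new lines.
# Speaker tags in the reference transcript anchor each cloned voice.
clone_transcript = "[S1] This is the reference voice. "
new_lines = "[S1] And this is a brand-new line in that same voice. (laughs)"

# `audio_prompt` (assumed keyword) points at the reference recording.
output = model.generate(
    clone_transcript + new_lines,
    audio_prompt="reference.mp3",  # hypothetical path to your reference clip
    verbose=True,
)
model.save_audio("cloned.mp3", output)
```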
---
## Quickstart
Install via pip:
```bash
pip install git+https://github.com/nari-labs/dia.git
```
Launch the Gradio UI (requires [uv](https://github.com/astral-sh/uv)):
```bash
git clone https://github.com/nari-labs/dia.git
cd dia && uv run app.py
```
Or manually set up:
```bash
git clone https://github.com/nari-labs/dia.git
cd dia
python -m venv .venv
source .venv/bin/activate
pip install -e .
python app.py
```
---
## Python Example
```python
from dia.model import Dia

# Load the 1.6B checkpoint from the Hugging Face Hub.
model = Dia.from_pretrained("nari-labs/Dia-1.6B", compute_dtype="float16")
# [S1]/[S2] mark speaker turns; cues like (laughs) become nonverbal sounds.
text = "[S1] Hello! This is Dia. [S2] Nice to meet you. (laughs)"
# torch.compile adds a one-time warm-up but improves throughput.
output = model.generate(text, use_torch_compile=True, verbose=True)
model.save_audio("output.mp3", output)
```
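For a short one-off clip, the `torch.compile` warm-up can cost more time than it saves; the same call also runs uncompiled, at the lower realtime factors shown in the table below:
```python
# Same API without compilation: no warm-up, lower steady-state throughput.
output = model.generate(text, use_torch_compile=False, verbose=True)
```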
> Coming soon: PyPI package and CLI support
---
## Inference Performance (on RTX 4090)
| Precision | Realtime Factor (w/ compile) | w/o Compile | VRAM Usage |
|-----------|------------------------------|-------------|------------|
| bfloat16  | 2.1×                         | 1.5×        | ~10GB      |
| float16   | 2.2×                         | 1.3×        | ~10GB      |
| float32   | 1.0×                         | 0.9×        | ~13GB      |
> CPU support and a quantized version are coming soon.
---
## Ethical Use
This model is for **research and educational use only**. Prohibited uses include:
- Impersonating individuals (e.g., cloning real voices without consent)
- Generating misleading or malicious content
- Illegal or harmful activities
Please use responsibly.
---
## License
Apache 2.0
See the [LICENSE](https://github.com/nari-labs/dia/blob/main/LICENSE) for details.
---
## Roadmap
- Inference speed optimization
- CPU & quantized model support
- PyPI + CLI tools