---
license: apache-2.0
pipeline_tag: text-to-speech
tags:
- model_hub_mixin
- pytorch_model_hub_mixin
---
# Dia: Open-Weight Text-to-Speech Dialogue Model (1.6B)

**Dia** is a 1.6B-parameter open-weight text-to-speech model developed by Nari Labs.  
It generates highly realistic *dialogue* directly from transcripts, including **nonverbal** cues such as `(laughs)` and `(sighs)`, and can be **conditioned on audio** to control emotional tone or keep a consistent voice.

Currently, Dia supports **English** and is optimized for GPU inference. This model is designed for research and educational purposes only.

---

## 🔥 Try It Out

- 🖥️ [ZeroGPU demo on Spaces](https://huggingface.co/spaces/nari-labs/Dia-1.6B)
- 📊 [Comparison demos](https://yummy-fir-7a4.notion.site/dia) with ElevenLabs and Sesame CSM-1B
- 🎧 Try voice remixing and conversations with a larger version: [join the waitlist](https://tally.so/r/meokbo)
- 💬 [Join the community on Discord](https://discord.gg/pgdB5YRe)

---

## 🧠 Capabilities

- Multispeaker support using `[S1]`, `[S2]`, etc.
- Rich nonverbal cue synthesis: `(laughs)`, `(clears throat)`, `(gasps)`, etc.
- Voice conditioning via a transcript plus audio example (see the sketch after the Python example below)
- Outputs high-fidelity `.mp3` files directly from text

Example input:
```text
[S1] Dia is an open weights text-to-dialogue model. [S2] You get full control over scripts and voices. (laughs)
```

---

## 🚀 Quickstart

Install via pip:

```bash
pip install git+https://github.com/nari-labs/dia.git
```
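
A quick sanity check that the install worked, using the same import as the Python example further down this card:

```python
# Minimal post-install check: the import should succeed without errors.
from dia.model import Dia
print("Dia import OK")
```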

Launch the Gradio UI (requires [`uv`](https://github.com/astral-sh/uv)):
```bash
git clone https://github.com/nari-labs/dia.git
cd dia && uv run app.py
```

Or manually set up:

```bash
git clone https://github.com/nari-labs/dia.git
cd dia
python -m venv .venv
source .venv/bin/activate
pip install -e .
python app.py
```

---

## 🐍 Python Example

```python
from dia.model import Dia

# Load the pretrained model; compute_dtype selects the inference precision
# (see the performance table below).
model = Dia.from_pretrained("nari-labs/Dia-1.6B", compute_dtype="float16")

# [S1]/[S2] mark speaker turns; parenthesized words are nonverbal cues.
text = "[S1] Hello! This is Dia. [S2] Nice to meet you. (laughs)"
output = model.generate(text, use_torch_compile=True, verbose=True)
model.save_audio("output.mp3", output)
```
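
Voice conditioning (listed under Capabilities) reuses the same API with a reference clip. The sketch below is illustrative only: the `audio_prompt` argument and the convention of prepending the reference clip's transcript are assumptions based on the repository's example scripts, and `reference.mp3` is a hypothetical file.

```python
from dia.model import Dia

model = Dia.from_pretrained("nari-labs/Dia-1.6B", compute_dtype="float16")

# Assumption: generate() accepts an audio_prompt and expects the prompt's
# transcript to be prepended to the text, per the repo's example scripts.
prompt_transcript = "[S1] Transcript of the reference audio clip."
text = "[S2] New dialogue to generate in a matching voice. (laughs)"

output = model.generate(
    prompt_transcript + " " + text,
    audio_prompt="reference.mp3",  # hypothetical path to the reference clip
    use_torch_compile=True,
    verbose=True,
)
model.save_audio("conditioned.mp3", output)
```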

> Coming soon: PyPI package and CLI support

---

## 💻 Inference Performance (on RTX 4090)

| Precision | Realtime factor (with compile) | Realtime factor (without compile) | VRAM usage |
|-----------|--------------------------------|-----------------------------------|------------|
| bfloat16  | 2.1×                           | 1.5×                              | ~10 GB     |
| float16   | 2.2×                           | 1.3×                              | ~10 GB     |
| float32   | 1.0×                           | 0.9×                              | ~13 GB     |
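
A realtime factor above 1× means audio is generated faster than it plays back. The precision row is chosen via the `compute_dtype` argument shown in the Python example; a minimal sketch, assuming the remaining rows map to the analogous string names:

```python
from dia.model import Dia

# Assumption: "bfloat16" and "float32" are accepted the same way as the
# "float16" value used in the Python example above.
model = Dia.from_pretrained("nari-labs/Dia-1.6B", compute_dtype="bfloat16")
```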

> CPU support and quantized version coming soon.

---

## ⚠️ Ethical Use

This model is for **research and educational use only**. Prohibited uses include:

- Impersonating individuals (e.g., cloning real voices without consent)
- Generating misleading or malicious content
- Illegal or harmful activities

Please use responsibly.

---

## 📄 License

Apache 2.0  
See the [LICENSE](https://github.com/nari-labs/dia/blob/main/LICENSE) for details.

---

## 🛠️ Roadmap

- 🔧 Inference speed optimization
- 💾 CPU & quantized model support
- 📦 PyPI + CLI tools