---
title: CSM-1B Gradio Demo
emoji: 🎙️
colorFrom: indigo
colorTo: green
sdk: gradio
sdk_version: 4.44.1
app_file: app.py
pinned: false
---
# CSM-1B Text-to-Speech Demo
This application uses CSM-1B (Conversational Speech Model) to convert text into high-quality speech.
## Features
- **Simple Audio Generation**: Convert text to speech with options for speaker ID, duration, temperature, and top-k.
- **Audio Generation with Context**: Provide audio clips and text as context to help the model generate more appropriate speech.
- **GPU Optimization**: Uses Hugging Face Spaces' ZeroGPU so a GPU is only held while audio is being generated.
## Installation and Configuration
### Access Requirements
To use the CSM-1B model, you need access to the following models on Hugging Face:
- [meta-llama/Llama-3.2-1B](https://huggingface.co/meta-llama/Llama-3.2-1B)
- [sesame/csm-1b](https://huggingface.co/sesame/csm-1b)
### Hugging Face Token Configuration
1. Create a Hugging Face account if you don't have one.
2. Go to [Hugging Face Settings](https://huggingface.co/settings/tokens) to create a token.
3. Request access to the models if needed.
4. Set the `HF_TOKEN` environment variable with your token:
```bash
export HF_TOKEN=your_token_here
```
5. Alternatively, enter your token directly in the application's "Configuration" tab. A short access-check sketch follows this list.
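If you want to confirm that your token grants access to both gated models before launching the app, here is a minimal sketch using the `huggingface_hub` library (this check is not part of the repository's code, and the exception handling is deliberately broad):
```python
import os
from huggingface_hub import HfApi, login

# Log in with the token from the environment (set via `export HF_TOKEN=...`).
token = os.environ.get("HF_TOKEN")
login(token=token)

# Try to read metadata for each gated repository; a failure usually means
# the token is invalid or access has not been granted yet.
api = HfApi()
for repo_id in ["meta-llama/Llama-3.2-1B", "sesame/csm-1b"]:
    try:
        api.model_info(repo_id, token=token)
        print(f"Access OK: {repo_id}")
    except Exception as err:
        print(f"No access to {repo_id}: {err}")
```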
### Installation
```bash
git clone https://github.com/yourusername/csm-1b-gradio.git
cd csm-1b-gradio
pip install -r requirements.txt
```
## How to Use
1. Start the application:
```bash
python app.py
```
2. Open a web browser and go to the displayed address (usually http://127.0.0.1:7860).
3. Enter the text you want to convert to speech.
4. Choose a speaker ID (from 0-10).
5. Adjust parameters like maximum duration, temperature, and top-k.
6. Click the "Generate Audio" button to create speech. A programmatic equivalent is sketched after this list.
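If you would rather script generation instead of using the Gradio UI, the sketch below follows the loader and `generate` API published in the upstream SesameAILabs csm codebase; the exact names (`load_csm_1b`, `generate`, `sample_rate`) and parameters are assumptions to verify against the code you have installed, not a guaranteed interface of this app.
```python
import torchaudio
from generator import load_csm_1b  # assumed: loader module from the upstream csm codebase

# Load the model onto the GPU (use device="cpu" if CUDA is unavailable).
generator = load_csm_1b(device="cuda")

# Generate speech for a single utterance; these arguments mirror the UI controls.
audio = generator.generate(
    text="Hello from CSM-1B.",
    speaker=0,                   # speaker ID (0-10 in the UI)
    context=[],                  # optional list of prior utterances for context
    max_audio_length_ms=10_000,  # maximum duration in milliseconds
)

# Save the waveform at the model's native sample rate.
torchaudio.save("audio.wav", audio.unsqueeze(0).cpu(), generator.sample_rate)
```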
## About the Model
CSM-1B is a text-to-speech model developed by Sesame AI Labs. It generates natural-sounding speech from text in a variety of voices.
## ZeroGPU
This application uses Hugging Face Spaces' ZeroGPU to manage GPU usage. ZeroGPU allocates a GPU only while a decorated function is running and releases it afterwards, so the Space does not hold a GPU while idle.
```python
import spaces

@spaces.GPU
def my_gpu_function():
    # A GPU is allocated only while this function runs
    # and is released again once it returns.
    pass
```
When deployed on Hugging Face Spaces, ZeroGPU will automatically manage GPU usage, making the application more efficient.
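If a single generation can exceed the default GPU window, the `spaces` package also accepts a `duration` argument on the decorator. The function body below is only a placeholder; in this app the decorated function would call the CSM-1B generator:
```python
import spaces

@spaces.GPU(duration=120)  # request up to ~120 seconds of GPU time per call
def generate_audio(text: str, speaker: int = 0):
    # Placeholder: in app.py this is where the CSM-1B model would run.
    ...
```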
## Notes
- The model watermarks generated audio so it can be identified as AI-generated.
- Audio generation time depends on text length and hardware configuration.
- You need access to the CSM-1B model on Hugging Face to use this application.
## Deployment on Hugging Face Spaces
To deploy this application on Hugging Face Spaces:
1. Create a new Space on Hugging Face with Gradio SDK.
2. Upload all project files.
3. In the Space settings, add the `HF_TOKEN` environment variable with your token.
4. Choose an appropriate hardware configuration (GPU recommended).
## Resources
- [GitHub Repository](https://github.com/SesameAILabs/csm-1b)
- [Hugging Face Model](https://huggingface.co/sesame/csm-1b)
- [Hugging Face Space Demo](https://huggingface.co/spaces/sesame/csm-1b)
- [Hugging Face Spaces ZeroGPU](https://huggingface.co/docs/hub/spaces-sdks-docker-zero-gpu)