---
title: CSM-1B Gradio Demo
emoji: 🎙️
colorFrom: indigo
colorTo: green
sdk: gradio
sdk_version: 4.44.1
app_file: app.py
pinned: false
---

# CSM-1B Text-to-Speech Demo

This application uses CSM-1B (Conversational Speech Model) from Sesame to convert text to high-quality speech.

## Features

- **Simple Audio Generation**: Convert text to speech with options for speaker ID, duration, temperature, and top-k.
- **Audio Generation with Context**: Provide audio clips and text as context to help the model generate more appropriate speech.
- **GPU Optimization**: Uses Hugging Face Spaces' ZeroGPU to optimize GPU usage.
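Context is supplied as a list of earlier utterances, each with a speaker ID, a transcript, and the matching audio clip. The sketch below is loosely modeled on the `Segment(speaker, text, audio)` structure in Sesame's reference code. However, `ContextSegment` and `build_context` are hypothetical names, plain Python lists stand in for audio tensors, and the 24 kHz sample rate is an assumption.

```python
from dataclasses import dataclass
from typing import List, Sequence

# Assumed sample rate: the Mimi codec used by CSM works with 24 kHz audio.
SAMPLE_RATE = 24_000


@dataclass
class ContextSegment:
    """One prior utterance passed to the model as context (hypothetical name)."""
    speaker: int          # speaker ID of this utterance
    text: str             # transcript of the audio clip
    audio: List[float]    # mono samples at SAMPLE_RATE

    @property
    def duration_s(self) -> float:
        return len(self.audio) / SAMPLE_RATE


def build_context(speakers: Sequence[int], texts: Sequence[str],
                  audios: Sequence[List[float]]) -> List[ContextSegment]:
    """Pair each uploaded clip with its transcript and speaker ID."""
    if not (len(speakers) == len(texts) == len(audios)):
        raise ValueError("speakers, texts and audios must have the same length")
    segments = []
    for spk, txt, wav in zip(speakers, texts, audios):
        if not txt.strip():
            raise ValueError("every context clip needs a transcript")
        segments.append(ContextSegment(speaker=spk, text=txt.strip(), audio=list(wav)))
    return segments


# Usage: two turns of a conversation, each one second of silence.
ctx = build_context([0, 1], ["Hi there.", "Hello!"], [[0.0] * SAMPLE_RATE] * 2)
print([(s.speaker, s.text, s.duration_s) for s in ctx])
```

Validating the inputs up front means a missing transcript or a mismatched upload is reported before any GPU time is spent.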

## Installation and Configuration

### Access Requirements

To use the CSM-1B model, you need access to the following models on Hugging Face:

- [meta-llama/Llama-3.2-1B](https://huggingface.co/meta-llama/Llama-3.2-1B)
- [sesame/csm-1b](https://huggingface.co/sesame/csm-1b)

### Hugging Face Token Configuration

1. Create a Hugging Face account if you don't have one.
2. Go to [Hugging Face Settings](https://huggingface.co/settings/tokens) to create a token.
3. Request access to both gated models listed above.
4. Set the `HF_TOKEN` environment variable with your token:
   ```bash
   export HF_TOKEN=your_token_here
   ```
5. Alternatively, enter your token directly in the "Configuration" tab of the application.
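The two token sources can be combined with a simple precedence rule. The following is a minimal sketch, assuming that a token typed in the Configuration tab should override `HF_TOKEN`. The helper name `resolve_hf_token` is hypothetical and does not come from this app's code.

```python
import os
from typing import Mapping, Optional


def resolve_hf_token(ui_token: Optional[str] = None,
                     env: Mapping[str, str] = os.environ) -> str:
    """Return the UI token if one was entered, otherwise HF_TOKEN from the environment."""
    # Hypothetical precedence: Configuration tab first, environment variable second.
    for candidate in (ui_token, env.get("HF_TOKEN")):
        if candidate and candidate.strip():
            return candidate.strip()
    raise RuntimeError(
        "No Hugging Face token found: set HF_TOKEN or enter one in the Configuration tab."
    )


print(resolve_hf_token(env={"HF_TOKEN": "hf_env"}))             # hf_env
print(resolve_hf_token(" hf_ui ", env={"HF_TOKEN": "hf_env"}))  # hf_ui
```

Pass the resolved token to `huggingface_hub.login(token=...)` so that later downloads of the gated models are authenticated.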

### Installation

```bash
git clone https://github.com/yourusername/csm-1b-gradio.git
cd csm-1b-gradio
pip install -r requirements.txt
```

## How to Use

1. Start the application:
   ```bash
   python app.py
   ```
2. Open a web browser and go to the displayed address (usually http://127.0.0.1:7860).
3. Enter the text you want to convert to speech.
4. Choose a speaker ID (0 to 10).
5. Adjust parameters like maximum duration, temperature, and top-k.
6. Click the "Generate Audio" button to create speech.
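Temperature and top-k determine how each audio token is sampled from the model's output scores. The snippet below is a minimal sketch of the general technique: temperature scaling combined with top-k filtering. It is not CSM's actual sampling code, and the default values are illustrative only.

```python
import math
import random
from typing import List, Optional


def sample_top_k(logits: List[float], temperature: float = 0.9, top_k: int = 50,
                 rng: Optional[random.Random] = None) -> int:
    """Pick a token index after top-k filtering and temperature scaling."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    rng = rng or random.Random()
    # Keep only the k highest-scoring candidates.
    k = max(1, min(top_k, len(logits)))
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Temperature below 1 sharpens the distribution; above 1 flattens it.
    scaled = [logits[i] / temperature for i in top]
    m = max(scaled)  # subtract the max before exp() for numerical stability
    weights = [math.exp(s - m) for s in scaled]
    return rng.choices(top, weights=weights, k=1)[0]


logits = [2.0, 1.0, 0.5, -1.0]
print(sample_top_k(logits, top_k=1))  # 0: only the best candidate survives
```

A lower top-k or temperature makes the output more predictable. Higher values add variety, but they also raise the risk of unstable speech.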

## About the Model

CSM-1B is a speech generation model developed by Sesame AI Labs. It combines a Llama-based backbone with a smaller audio decoder that produces Mimi audio codes, and it can generate natural-sounding speech from text in a variety of voices.
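The maximum duration setting limits how many audio frames the model generates. Assuming the Mimi codec's frame rate of 12.5 frames per second (80 ms per frame), a duration converts to a frame budget as shown below. The function name `max_frames` is hypothetical.

```python
FRAME_MS = 80  # assumption: Mimi emits 12.5 frames per second, i.e. one frame per 80 ms


def max_frames(max_duration_s: float) -> int:
    """Number of whole 80 ms frames that fit in the requested maximum duration."""
    if max_duration_s <= 0:
        raise ValueError("duration must be positive")
    return int(max_duration_s * 1000 // FRAME_MS)


print(max_frames(10))   # 125
print(max_frames(0.5))  # 6
```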

## ZeroGPU

This application uses Hugging Face Spaces' ZeroGPU. ZeroGPU attaches a GPU to the Space only while a function decorated with `@spaces.GPU` is running and releases it afterwards, so GPU capacity is not held idle between requests.

```python
import spaces

@spaces.GPU
def my_gpu_function():
    # This function will only use GPU when called
    # and release GPU after completion
    pass
```

When the Space runs on ZeroGPU hardware, this allocation happens automatically. In other environments the decorator has no effect, so the same code also runs on dedicated GPU hardware.
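Longer generations can request more GPU time per call through the decorator's `duration` argument. To run the app locally without installing the `spaces` package, one common pattern is a no-op fallback decorator. The following is a sketch under those assumptions; `generate` is a placeholder, not the app's real function.

```python
try:
    import spaces  # provided on Hugging Face Spaces
    gpu = spaces.GPU(duration=120)  # request up to 120 s of GPU time per call
except ImportError:
    # Assumed local fallback: without `spaces`, run the function unchanged.
    def gpu(fn):
        return fn


@gpu
def generate(text: str) -> str:
    # Placeholder for the model call; on ZeroGPU a GPU is attached only during this call.
    return text.upper()


print(generate("hello"))  # HELLO
```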

## Notes

- Audio generated by this model is watermarked to identify it as AI-generated.
- Audio generation time depends on text length and hardware configuration.
- You need access to the CSM-1B model on Hugging Face to use this application.

## Deployment on Hugging Face Spaces

To deploy this application on Hugging Face Spaces:

1. Create a new Space on Hugging Face with Gradio SDK.
2. Upload all project files.
3. In the Space settings, add `HF_TOKEN` as a secret containing your token.
4. Choose an appropriate hardware configuration (a GPU is recommended).

## Resources

- [GitHub Repository](https://github.com/SesameAILabs/csm-1b)
- [Hugging Face Model](https://huggingface.co/sesame/csm-1b)
- [Hugging Face Space Demo](https://huggingface.co/spaces/sesame/csm-1b)
- [Hugging Face Spaces ZeroGPU](https://huggingface.co/docs/hub/spaces-sdks-docker-zero-gpu)