---
title: CSM-1B Gradio Demo
emoji: 🎙️
colorFrom: indigo
colorTo: green
sdk: gradio
sdk_version: 4.44.1
app_file: app.py
pinned: false
---
# CSM-1B Text-to-Speech Demo
This application uses CSM-1B (Conversational Speech Model) to convert text to high-quality speech.
## Features
- **Simple Audio Generation**: Convert text to speech with options for speaker ID, duration, temperature, and top-k.
- **Audio Generation with Context**: Provide audio clips and their transcripts as context so the model generates more appropriate speech (see the sketch after this list).
- **GPU Optimization**: Uses Hugging Face Spaces' ZeroGPU to optimize GPU usage.
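The context feature can also be exercised from Python. The sketch below assumes the `load_csm_1b` and `Segment` helpers from the upstream SesameAILabs CSM repository; the names and exact signatures may differ from what this app's `app.py` uses:

```python
import torchaudio
from generator import load_csm_1b, Segment  # helpers from the upstream CSM repo (assumed)

generator = load_csm_1b(device="cuda")

# A transcribed reference clip becomes context for the next utterance.
audio, sr = torchaudio.load("reference.wav")
audio = torchaudio.functional.resample(
    audio.squeeze(0), orig_freq=sr, new_freq=generator.sample_rate
)
context = [Segment(text="Hello, this is the reference clip.", speaker=0, audio=audio)]

speech = generator.generate(
    text="Nice to meet you!",
    speaker=0,
    context=context,
    max_audio_length_ms=10_000,
)
torchaudio.save("output.wav", speech.unsqueeze(0).cpu(), generator.sample_rate)
```

Each `Segment` pairs a transcript with its audio, and the list of segments conditions the voice and style of the newly generated utterance.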
## Installation and Configuration
### Access Requirements
To use the CSM-1B model, you need access to the following models on Hugging Face:
- [meta-llama/Llama-3.2-1B](https://huggingface.co/meta-llama/Llama-3.2-1B)
- [sesame/csm-1b](https://huggingface.co/sesame/csm-1b)
### Hugging Face Token Configuration
1. Create a Hugging Face account if you don't have one.
2. Go to [Hugging Face Settings](https://huggingface.co/settings/tokens) to create a token.
3. Request access to the models if needed.
4. Set the `HF_TOKEN` environment variable with your token:
```bash
export HF_TOKEN=your_token_here
```
5. Alternatively, enter your token directly in the application's "Configuration" tab. A sketch of how the app can pick up the environment variable follows below.
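As a rough sketch (assuming the app authenticates with `huggingface_hub` at startup, which may differ from the actual `app.py`), the exported token can be used like this:

```python
import os

from huggingface_hub import login

# Assumed startup pattern: log in with the exported token so the gated
# models (meta-llama/Llama-3.2-1B, sesame/csm-1b) can be downloaded.
token = os.environ.get("HF_TOKEN")
if token:
    login(token=token)
```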
### Installation
```bash
git clone https://github.com/yourusername/csm-1b-gradio.git
cd csm-1b-gradio
pip install -r requirements.txt
```
## How to Use
1. Start the application:
```bash
python app.py
```
2. Open a web browser and go to the displayed address (usually http://127.0.0.1:7860).
3. Enter the text you want to convert to speech.
4. Choose a speaker ID (0-10).
5. Adjust parameters such as maximum duration, temperature, and top-k.
6. Click the "Generate Audio" button to create speech (or call the endpoint programmatically, as sketched below).
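If you prefer to drive the demo from a script rather than the browser, `gradio_client` can call the same app. The endpoint name and argument order below are hypothetical and should be checked with `client.view_api()`:

```python
from gradio_client import Client

client = Client("http://127.0.0.1:7860")

# The api_name and argument order are hypothetical; run client.view_api()
# to see the real signature exposed by app.py.
result = client.predict(
    "Hello from CSM-1B!",  # text to synthesize
    0,                     # speaker ID (0-10)
    10.0,                  # maximum duration in seconds
    0.9,                   # temperature
    50,                    # top-k
    api_name="/generate_audio",
)
print("Generated audio saved at:", result)
```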
## About the Model
CSM-1B is an advanced text-to-speech model developed by Sesame AI Labs. It generates natural-sounding speech from text in a variety of voices.
## ZeroGPU
This application uses Hugging Face Spaces' ZeroGPU to optimize GPU usage. ZeroGPU releases GPU resources when they are not in use, which avoids idle allocation and lets the Space share hardware efficiently.
```python
import spaces

@spaces.GPU
def my_gpu_function():
    # This function only claims a GPU while it is running
    # and releases it when it returns.
    pass
```
When deployed on Hugging Face Spaces, ZeroGPU will automatically manage GPU usage, making the application more efficient.
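In this app, the same pattern would wrap the generation function that Gradio calls. The sketch below is illustrative only; the function name, inputs, and the 60-second budget are assumptions, not the actual `app.py`:

```python
import gradio as gr
import spaces

@spaces.GPU(duration=60)  # assumed upper bound, in seconds, for one generation call
def generate_audio(text: str, speaker: int):
    # Placeholder body: the real app.py would call the CSM generator here.
    # On Spaces, a GPU is attached only while this function runs and is
    # released as soon as it returns.
    ...

demo = gr.Interface(
    fn=generate_audio,
    inputs=[gr.Textbox(label="Text"), gr.Slider(0, 10, step=1, label="Speaker ID")],
    outputs=gr.Audio(label="Generated speech"),
)

if __name__ == "__main__":
    demo.launch()
```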
## Notes
- The model embeds a watermark in generated audio to mark it as AI-generated.
- Audio generation time depends on text length and hardware configuration.
- You need access to the CSM-1B model on Hugging Face to use this application.
## Deployment on Hugging Face Spaces
To deploy this application on Hugging Face Spaces (the steps can also be scripted with `huggingface_hub`, as sketched after the list):
1. Create a new Space on Hugging Face with the Gradio SDK.
2. Upload all project files.
3. In the Space settings, add the `HF_TOKEN` environment variable (as a secret) with your token.
4. Choose an appropriate hardware configuration (GPU recommended).
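For reference, the upload and secret configuration can also be done programmatically with `huggingface_hub`; the Space id below is a placeholder:

```python
from huggingface_hub import HfApi

api = HfApi()  # uses your cached login or the HF_TOKEN environment variable

repo_id = "your-username/csm-1b-gradio"  # placeholder Space id

api.create_repo(repo_id=repo_id, repo_type="space", space_sdk="gradio", exist_ok=True)
api.upload_folder(folder_path=".", repo_id=repo_id, repo_type="space")
api.add_space_secret(repo_id=repo_id, key="HF_TOKEN", value="your_token_here")
```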
## Resources
- [GitHub Repository](https://github.com/SesameAILabs/csm-1b)
- [Hugging Face Model](https://huggingface.co/sesame/csm-1b)
- [Hugging Face Space Demo](https://huggingface.co/spaces/sesame/csm-1b)
- [Hugging Face Spaces ZeroGPU](https://huggingface.co/docs/hub/spaces-sdks-docker-zero-gpu)