---
title: CSM-1B Gradio Demo
emoji: 🎙️
colorFrom: indigo
colorTo: green
sdk: gradio
sdk_version: 4.44.1
app_file: app.py
pinned: false
---
# CSM-1B Text-to-Speech Demo
This application uses CSM-1B (Conversational Speech Model) to convert text to high-quality speech.
## Features
- **Simple Audio Generation**: Convert text to speech with options for speaker ID, duration, temperature, and top-k.
- **Audio Generation with Context**: Provide audio clips and their transcripts as context so the model generates more appropriate speech (see the sketch after this list).
- **GPU Optimization**: Uses Hugging Face Spaces' ZeroGPU to optimize GPU usage.
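The context feature can also be exercised from Python. The sketch below assumes the `load_csm_1b` and `Segment` helpers from the upstream SesameAILabs CSM repository; the names and exact signatures may differ from what this app's `app.py` uses:

```python
import torchaudio
from generator import load_csm_1b, Segment  # helpers from the upstream CSM repo (assumed)

generator = load_csm_1b(device="cuda")

# A transcribed reference clip becomes context for the next utterance.
audio, sr = torchaudio.load("reference.wav")
audio = torchaudio.functional.resample(
    audio.squeeze(0), orig_freq=sr, new_freq=generator.sample_rate
)
context = [Segment(text="Hello, this is the reference clip.", speaker=0, audio=audio)]

speech = generator.generate(
    text="Nice to meet you!",
    speaker=0,
    context=context,
    max_audio_length_ms=10_000,
)
torchaudio.save("output.wav", speech.unsqueeze(0).cpu(), generator.sample_rate)
```

Each `Segment` pairs a transcript with its audio, and the list of segments conditions the voice and style of the newly generated utterance.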
## Installation and Configuration
### Access Requirements
To use the CSM-1B model, you need access to the following models on Hugging Face:
- [meta-llama/Llama-3.2-1B](https://huggingface.co/meta-llama/Llama-3.2-1B)
- [sesame/csm-1b](https://huggingface.co/sesame/csm-1b)
### Hugging Face Token Configuration
1. Create a Hugging Face account if you don't have one.
2. Go to [Hugging Face Settings](https://huggingface.co/settings/tokens) to create a token.
3. Request access to the models if needed.
4. Set the `HF_TOKEN` environment variable with your token:
```bash
export HF_TOKEN=your_token_here
```
5. Alternatively, enter your token directly in the application's "Configuration" tab. A sketch of how the app can pick up the environment variable follows below.
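As a rough sketch (assuming the app authenticates with `huggingface_hub` at startup, which may differ from the actual `app.py`), the exported token can be used like this:

```python
import os

from huggingface_hub import login

# Assumed startup pattern: log in with the exported token so the gated
# models (meta-llama/Llama-3.2-1B, sesame/csm-1b) can be downloaded.
token = os.environ.get("HF_TOKEN")
if token:
    login(token=token)
```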
### Installation
```bash
git clone https://github.com/yourusername/csm-1b-gradio.git
cd csm-1b-gradio
pip install -r requirements.txt
```
## How to Use
1. Start the application:
```bash
python app.py
```
2. Open a web browser and go to the displayed address (usually http://127.0.0.1:7860).
3. Enter the text you want to convert to speech.
4. Choose a speaker ID (0-10).
5. Adjust parameters such as maximum duration, temperature, and top-k.
6. Click the "Generate Audio" button to create speech (or call the endpoint programmatically, as sketched below).
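If you prefer to drive the demo from a script rather than the browser, `gradio_client` can call the same app. The endpoint name and argument order below are hypothetical and should be checked with `client.view_api()`:

```python
from gradio_client import Client

client = Client("http://127.0.0.1:7860")

# The api_name and argument order are hypothetical; run client.view_api()
# to see the real signature exposed by app.py.
result = client.predict(
    "Hello from CSM-1B!",  # text to synthesize
    0,                     # speaker ID (0-10)
    10.0,                  # maximum duration in seconds
    0.9,                   # temperature
    50,                    # top-k
    api_name="/generate_audio",
)
print("Generated audio saved at:", result)
```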
## About the Model
CSM-1B is an advanced text-to-speech model developed by Sesame AI Labs. It generates natural-sounding speech from text in a variety of voices.
## ZeroGPU
This application uses Hugging Face Spaces' ZeroGPU to optimize GPU usage. ZeroGPU releases GPU resources when they are not in use, which avoids idle allocation and lets the Space share hardware efficiently.
```python
import spaces

@spaces.GPU
def my_gpu_function():
    # This function only claims a GPU while it is running
    # and releases it when it returns.
    pass
```
When deployed on Hugging Face Spaces, ZeroGPU will automatically manage GPU usage, making the application more efficient.
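In this app, the same pattern would wrap the generation function that Gradio calls. The sketch below is illustrative only; the function name, inputs, and the 60-second budget are assumptions, not the actual `app.py`:

```python
import gradio as gr
import spaces

@spaces.GPU(duration=60)  # assumed upper bound, in seconds, for one generation call
def generate_audio(text: str, speaker: int):
    # Placeholder body: the real app.py would call the CSM generator here.
    # On Spaces, a GPU is attached only while this function runs and is
    # released as soon as it returns.
    ...

demo = gr.Interface(
    fn=generate_audio,
    inputs=[gr.Textbox(label="Text"), gr.Slider(0, 10, step=1, label="Speaker ID")],
    outputs=gr.Audio(label="Generated speech"),
)

if __name__ == "__main__":
    demo.launch()
```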
## Notes
- The model embeds a watermark in generated audio to mark it as AI-generated.
- Audio generation time depends on text length and hardware configuration.
- You need access to the CSM-1B model on Hugging Face to use this application.
## Deployment on Hugging Face Spaces
To deploy this application on Hugging Face Spaces (the steps can also be scripted with `huggingface_hub`, as sketched after the list):
1. Create a new Space on Hugging Face with the Gradio SDK.
2. Upload all project files.
3. In the Space settings, add the `HF_TOKEN` environment variable (as a secret) with your token.
4. Choose an appropriate hardware configuration (GPU recommended).
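For reference, the upload and secret configuration can also be done programmatically with `huggingface_hub`; the Space id below is a placeholder:

```python
from huggingface_hub import HfApi

api = HfApi()  # uses your cached login or the HF_TOKEN environment variable

repo_id = "your-username/csm-1b-gradio"  # placeholder Space id

api.create_repo(repo_id=repo_id, repo_type="space", space_sdk="gradio", exist_ok=True)
api.upload_folder(folder_path=".", repo_id=repo_id, repo_type="space")
api.add_space_secret(repo_id=repo_id, key="HF_TOKEN", value="your_token_here")
```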
## Resources
- [GitHub Repository](https://github.com/SesameAILabs/csm-1b)
- [Hugging Face Model](https://huggingface.co/sesame/csm-1b)
- [Hugging Face Space Demo](https://huggingface.co/spaces/sesame/csm-1b)
- [Hugging Face Spaces ZeroGPU](https://huggingface.co/docs/hub/spaces-sdks-docker-zero-gpu)