---
title: Podcasity
emoji: 🌍
colorFrom: pink
colorTo: purple
sdk: gradio
sdk_version: 5.33.1
app_file: app.py
pinned: false
license: mit
short_description: Generate engaging podcast conversations from documents, link
tags:
  - Agents-MCP-Hackathon
  - mcp-server-track
---

# 🎙️ Podcast Generator

This project is a Gradio-based web application that generates a podcast-style conversation from a document, a web link, or raw text. It uses Mistral AI to write a conversational script, then generates the corresponding audio.

## 🎬 Demo

📺 **View Demo on YouTube:**
➡️ [https://youtu.be/0UG4-itpqZU](https://youtu.be/0UG4-itpqZU)

---

## 🔊 Sample Audio

🎧 **Listen to a sample podcast audio:**
➡️ [demo_sample.wav](./demo_sample.wav)

## ✨ Powered by

This project is built on the following technologies:

- **[Gradio](https://www.gradio.app/):** Provides the web interface for the application.
- **[Modal](https://modal.com/):** Hosts the core audio generation API serverlessly, so processing scales on demand.
- **[Mistral AI](https://mistral.ai/):** Supplies the language models that generate the podcast script from the input text.
- **[Kokoro](https://huggingface.co/hexgrad/Kokoro-82M):** Performs high-quality text-to-speech synthesis.

## Architecture

This project has a client-server architecture:

1. **Gradio Frontend (`app.py`):** The main application you run. It provides a user interface to input text, a document, or a link. It then calls the Mistral AI API to generate a podcast script and orchestrates the calls to the audio generation backend.

2. **Modal Backend (`modal/app.py`):** A serverless backend deployed on Modal.
   - It exposes a FastAPI endpoint that takes text and a voice preference.
   - It uses the `kokoro` library to perform the text-to-speech conversion.
   - It generates the audio files, which are then sent back to the Gradio client.
   - It is configured to use a T4 GPU for faster inference.

## 🚀 Features

- **Multiple Input Sources:** Provide a URL to a document (like a PDF), a link to a webpage, or just paste in raw text.
- **AI-Powered Scripting:** Uses Mistral AI to transform your input text into a natural-sounding conversation between two hosts.
- **Audio Generation:** Creates a downloadable audio file (`.wav`) of the generated podcast conversation.
- **Simple Web Interface:** An easy-to-use interface built with Gradio.

## 🏃‍♀️ How to Run

1. **Clone the repository:**

   ```bash
   git clone https://huggingface.co/spaces/Agents-MCP-Hackathon/podcastify
   cd podcastify
   ```

2. **Install dependencies:**

   ```bash
   pip install -r requirements.txt
   ```

3. **Set up your API Key:**
   This project requires an API key from Mistral AI. Set it as an environment variable:

   ```bash
   export MISTRAL_API_KEY='your-mistral-api-key'
   ```

   On Windows (PowerShell), use:

   ```powershell
   $env:MISTRAL_API_KEY='your-mistral-api-key'
   ```

4. **Run the application:**

   ```bash
   python app.py
   ```

   This starts a local web server. Open the URL printed in the terminal (usually `http://127.0.0.1:7860`) in your browser.

## 📁 Project Structure

- `app.py`: The main file containing the Gradio application. It handles the user interface, text processing with Mistral AI, and calls the audio generation API.
- `modal/app.py`: The serverless backend function deployed on Modal, responsible for the core text-to-speech generation using `kokoro`.
- `requirements.txt`: Lists all the Python dependencies for the project.
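
As a rough illustration of the client-server split described under Architecture, the sketch below shows how a client might check for the `MISTRAL_API_KEY` variable and prepare a request to the text-to-speech endpoint. It is not taken from `app.py`. The endpoint URL, the JSON field names (`text`, `voice`), and the helper names `require_api_key` and `build_tts_request` are all assumptions made for illustration; check `modal/app.py` for the actual request shape.

```python
import json
import os
import urllib.request

# Hypothetical placeholder: replace with the URL of your own Modal deployment.
TTS_ENDPOINT = "https://example.invalid/tts"


def require_api_key(env=os.environ):
    """Return MISTRAL_API_KEY, or raise a clear error if it is missing."""
    key = env.get("MISTRAL_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "MISTRAL_API_KEY is not set; see step 3 of 'How to Run'."
        )
    return key


def build_tts_request(text, voice, endpoint=TTS_ENDPOINT):
    """Build (but do not send) a JSON POST carrying text and a voice preference.

    The field names "text" and "voice" are assumed, not confirmed by the backend.
    """
    payload = json.dumps({"text": text, "voice": voice}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send the request. Building it separately keeps the payload easy to inspect before any network call is made, which is useful while the backend's exact schema is still unconfirmed.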