metadata
title: Podcastify
emoji: 🌍
colorFrom: pink
colorTo: purple
sdk: gradio
sdk_version: 5.33.1
app_file: app.py
pinned: false
license: mit
short_description: Generate engaging podcast conversations from documents, link
tags:
  - Agents-MCP-Hackathon
  - mcp-server-track

πŸŽ™οΈ Podcast Generator

This project is a Gradio-based web application that generates a podcast-style conversation from a document, a web link, or raw text. It uses Mistral AI to write the conversational script and Kokoro text-to-speech, hosted on Modal, to generate the corresponding audio.

🎬 Demo

📺 View Demo on YouTube:
➡️ https://youtu.be/0UG4-itpqZU

🔊 Sample Audio

🎧 Listen to a sample podcast audio:
➡️ demo_sample.wav

✨ Powered by

This project is made possible by the following amazing technologies:

  • Gradio: For creating the simple and intuitive web interface for the application.
  • Modal: For serverless hosting of the core audio generation API, allowing for scalable and on-demand processing.
  • Mistral AI: For the language models that generate the podcast script from the input text.
  • Kokoro: For high-quality text-to-speech synthesis.

Architecture

This project has a client-server architecture:

  1. Gradio Frontend (app.py): The main application you run. It provides a user interface to input text, a document, or a link. It then calls the Mistral AI API to generate a podcast script and orchestrates the calls to the audio generation backend.

  2. Modal Backend (modal/app.py): A serverless backend deployed on Modal.

    • It exposes a FastAPI endpoint that takes text and a voice preference.
    • It uses the kokoro library to perform the text-to-speech conversion.
    • It generates the audio files and returns them to the Gradio client.
    • It is configured to use a T4 GPU for faster inference.
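The call from the Gradio frontend to the Modal backend could look roughly like the sketch below. It is an illustration only: the endpoint URL, the JSON field names `text` and `voice`, and the assumption that the response body is raw audio are all hypothetical, since this README does not document the backend's request schema.

```python
import json
import urllib.request

# Hypothetical placeholder; the real Modal endpoint URL is not given in this README.
TTS_ENDPOINT = "https://example.invalid/tts"


def build_tts_request(text: str, voice: str, url: str = TTS_ENDPOINT) -> urllib.request.Request:
    """Build a JSON POST request carrying the text and the voice preference."""
    if not text.strip():
        raise ValueError("text must not be empty")
    # The field names "text" and "voice" are assumptions, not the documented schema.
    payload = json.dumps({"text": text, "voice": voice}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text: str, voice: str, url: str = TTS_ENDPOINT, timeout: float = 120.0) -> bytes:
    """Send the request and return the response body (assumed here to be audio bytes)."""
    request = build_tts_request(text, voice, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

Keeping request construction separate from the network call makes the payload easy to inspect without a deployed backend. A generous timeout is used because a serverless GPU container may need time to start.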

🚀 Features

  • Multiple Input Sources: Provide a URL to a document (like a PDF), a link to a webpage, or just paste in raw text.
  • AI-Powered Scripting: Uses Mistral AI to transform your input text into a natural-sounding conversation between two hosts.
  • Audio Generation: Creates a downloadable audio file (.wav) of the generated podcast conversation.
  • Simple Web Interface: An easy-to-use interface built with Gradio.
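If the audio for each host's turn arrives as a separate clip, the clips have to be joined into the single downloadable .wav file. The sketch below shows one way to do that with Python's standard `wave` module. Whether this project produces per-turn clips at all is an assumption, and `concat_wav` is a hypothetical helper, not a function from `app.py`.

```python
import io
import wave


def concat_wav(clips: list[bytes]) -> bytes:
    """Join WAV clips that share channel count, sample width, and sample rate."""
    if not clips:
        raise ValueError("need at least one clip")

    fmt = None
    frames = []
    for clip in clips:
        with wave.open(io.BytesIO(clip), "rb") as reader:
            clip_fmt = (reader.getnchannels(), reader.getsampwidth(), reader.getframerate())
            if fmt is None:
                fmt = clip_fmt
            elif clip_fmt != fmt:
                # Mixing formats would corrupt the output, so fail early.
                raise ValueError("clips have mismatched audio formats")
            frames.append(reader.readframes(reader.getnframes()))

    out = io.BytesIO()
    with wave.open(out, "wb") as writer:
        writer.setnchannels(fmt[0])
        writer.setsampwidth(fmt[1])
        writer.setframerate(fmt[2])
        writer.writeframes(b"".join(frames))
    return out.getvalue()
```

All clips are read and validated before anything is written, so a format mismatch never leaves a half-written file behind.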

πŸƒβ€β™€οΈ How to Run

  1. Clone the repository:

    git clone https://huggingface.co/spaces/Agents-MCP-Hackathon/podcastify
    cd podcastify
    
  2. Install dependencies:

    pip install -r requirements.txt
    
  3. Set up your API Key: This project requires an API key from Mistral AI. You need to set it as an environment variable.

    export MISTRAL_API_KEY='your-mistral-api-key'
    

    On Windows (PowerShell), you can use:

    $env:MISTRAL_API_KEY='your-mistral-api-key'
    
  4. Run the application:

    python app.py
    

    This will start a local web server, and you can access the application in your browser at the URL provided in the terminal (usually http://127.0.0.1:7860).
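A missing API key is the most likely reason step 4 fails, so it can help to check for it before launching. The following is a minimal sketch of such a check; `get_mistral_api_key` is a hypothetical helper, and `app.py` may already read the variable in its own way.

```python
import os


def get_mistral_api_key(env=None) -> str:
    """Return the Mistral API key from the environment, or raise a clear error."""
    # Accepting a mapping makes the helper testable without touching os.environ.
    env = os.environ if env is None else env
    key = env.get("MISTRAL_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "MISTRAL_API_KEY is not set; export it before running app.py"
        )
    return key
```

Calling `get_mistral_api_key()` at startup turns a confusing authentication failure later on into an immediate, readable error.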

πŸ“ Project Structure

  • app.py: The main file containing the Gradio application. It handles the user interface, text processing with Mistral AI, and calls the audio generation API.
  • modal/app.py: The serverless backend function deployed on Modal, responsible for the core text-to-speech generation using kokoro.
  • requirements.txt: Lists all the Python dependencies for the project.