metadata
license: apache-2.0
title: 'AI VoiceCraft: Text-to-Speech Studio'
sdk: gradio
emoji: 🚀
colorFrom: blue
colorTo: green
pinned: true
short_description: A powerful web app to generate dynamic text content and conv

🎙️ AI VoiceCraft: Text-to-Speech Studio 🚀

Overview

AI VoiceCraft is a web application built with Gradio that generates text content with AI and converts it into natural-sounding speech. It uses the Gemini AI model for content generation and Microsoft Edge TTS for audio synthesis.

Features

  • Dynamic Content Generation:
    • Generate various content types, including stories, news, podcasts, and more.
    • Customize content length, theme, and style.
    • Use Gemini AI for creative and contextually relevant text output.
  • High-Quality Text-to-Speech:
    • Use Microsoft Edge TTS for realistic voice synthesis.
    • Support for multiple languages and voices.
    • Fine-tune speech rate and pitch for optimal delivery.
  • User-Friendly Interface:
    • Intuitive Gradio interface for easy navigation and control.
    • Real-time feedback and error handling.
    • A custom Gradio theme for a better user experience.
  • Customization Options:
    • Adjust the creativity level of the AI content generation.
    • Enter custom prompts to fine-tune the AI output.
    • Adjust speech rate and pitch to fit your needs.
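
As a rough illustration of the rate and pitch controls listed above, the sketch below maps numeric slider values onto the signed strings that edge-tts expects (for example "+10%" and "-5Hz"). The helper names, the default voice, and the output path are assumptions made for this example and are not taken from the project's code. The pitch option requires a reasonably recent edge-tts release.

```python
import asyncio


def format_rate(percent: float) -> str:
    """Turn a slider value such as 10 or -20 into an edge-tts rate string."""
    return f"{int(round(percent)):+d}%"


def format_pitch(hz: float) -> str:
    """Turn a slider value such as 5 or -5 into an edge-tts pitch string."""
    return f"{int(round(hz)):+d}Hz"


async def synthesize(text, voice="en-US-AriaNeural", rate_percent=0,
                     pitch_hz=0, out_path="output.mp3"):
    """Hypothetical helper: save `text` as speech and return the file path."""
    import edge_tts  # imported lazily so the formatters work without it

    communicate = edge_tts.Communicate(
        text,
        voice,
        rate=format_rate(rate_percent),
        pitch=format_pitch(pitch_hz),
    )
    await communicate.save(out_path)
    return out_path


if __name__ == "__main__":
    # Needs an internet connection, as noted in the prerequisites.
    asyncio.run(synthesize("Hello from AI VoiceCraft.", rate_percent=10))
```

Keeping the formatting in small pure functions makes it easy to check the slider mapping without calling the TTS service.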

Getting Started

Prerequisites

  • Python 3.7+
  • Internet connection (for API access and TTS)
  • A Gemini API key

Installation

  1. Clone the repository:

    git clone https://github.com/musabbirkm/ContentVoiceGen.git
    cd ContentVoiceGen
    
  2. Install the required Python packages:

    pip install gradio requests edge-tts google-generativeai nest_asyncio
    
  3. Set your API key in the VOCALIS.py file.

  4. Run the application:

    python app.py
    
  5. Open your web browser and navigate to the local URL provided by Gradio (usually http://127.0.0.1:7860).
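
Step 3 asks you to set the API key in VOCALIS.py. If you would rather not hardcode the key, one possible alternative is to read it from an environment variable, as in the sketch below. The variable name GEMINI_API_KEY, the helper names, the model name, and the use of temperature for the creativity level are all assumptions for illustration. The actual Agent and ContentGenerator classes may work differently.

```python
import os


def load_api_key(env_var: str = "GEMINI_API_KEY") -> str:
    """Read the Gemini API key from the environment (variable name assumed)."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your Gemini API key."
        )
    return key


def generate_text(prompt: str, temperature: float = 0.7) -> str:
    """Hypothetical helper: send one prompt to Gemini and return the text."""
    import google.generativeai as genai  # lazy import; needs the package

    genai.configure(api_key=load_api_key())
    # The model name is an assumption; use whichever model your key can access.
    model = genai.GenerativeModel("gemini-1.5-flash")
    response = model.generate_content(
        prompt,
        generation_config=genai.types.GenerationConfig(temperature=temperature),
    )
    return response.text
```

On Linux or macOS you could then start the app with the key supplied on the command line, for example `GEMINI_API_KEY=your-key python app.py`.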

Usage

  1. Select the desired content type from the dropdown menu.
  2. Choose the language and voice for the TTS output.
  3. Adjust the output style, content length, and theme as needed.
  4. Enter any custom text or instructions in the customization field.
  5. Adjust the speech rate and pitch using the sliders.
  6. Click the "Submit" button to generate the text and audio.
  7. Review the generated text and listen to the audio output.
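
To make steps 1 to 4 more concrete, here is a minimal sketch of how the selected options might be combined into a single prompt for the model. The function name, its parameters, and the prompt wording are hypothetical. The project's real prompt construction is not shown in this README.

```python
def build_prompt(content_type: str, style: str, length_words: int,
                 theme: str, custom: str = "") -> str:
    """Combine the interface selections into one prompt string (illustrative)."""
    parts = [
        f"Write a {style} {content_type} of about {length_words} words "
        f"on the theme of {theme}."
    ]
    # Only append the customization field when the user typed something.
    if custom.strip():
        parts.append(f"Additional instructions: {custom.strip()}")
    return " ".join(parts)


if __name__ == "__main__":
    print(build_prompt("story", "humorous", 200, "space travel",
                       custom="Keep it suitable for children."))
```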

Code Structure

  • app.py: Main application script that integrates Gradio, content generation, and TTS.
  • VOCALIS.py: Contains the Agent and ContentGenerator classes for AI content generation.
  • edgeTTsLang.py: Dictionary containing the language and voice codes for Microsoft Edge TTS.
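
The exact layout of the dictionary in edgeTTsLang.py is not documented here. As an assumption, a language-to-voice mapping for Microsoft Edge TTS could look like the sketch below, where the dictionary name, the display labels, and the lookup helpers are hypothetical and the voice codes are standard Edge TTS voice names.

```python
# Hypothetical shape: language label -> {voice label -> Edge TTS voice code}
VOICES = {
    "English (US)": {
        "Aria (female)": "en-US-AriaNeural",
        "Guy (male)": "en-US-GuyNeural",
    },
    "Hindi": {
        "Swara (female)": "hi-IN-SwaraNeural",
        "Madhur (male)": "hi-IN-MadhurNeural",
    },
}


def voices_for(language: str) -> list:
    """Return the voice labels for a language, or an empty list if unknown."""
    return sorted(VOICES.get(language, {}))


def voice_code(language: str, voice: str) -> str:
    """Look up the Edge TTS voice code for a language and voice label."""
    return VOICES[language][voice]
```

A nested mapping like this lets the interface fill the voice dropdown from the selected language and pass the resolved code straight to the TTS call.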

Dependencies

  • gradio: For building the web interface.
  • requests: For making HTTP requests to the API.
  • edge-tts: For text-to-speech conversion.
  • google-generativeai: For interacting with the Gemini AI model.
  • asyncio: For asynchronous operations (part of the Python standard library, so no installation is needed).
  • nest_asyncio: For running nested asyncio event loops, for example in Jupyter notebooks.
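
The sketch below illustrates why nest_asyncio is listed: asyncio.run fails when an event loop is already running (as in a Jupyter notebook), and nest_asyncio patches the loop so it can be re-entered. The helper name run_coroutine is an assumption for this example and is not part of the project.

```python
import asyncio


def run_coroutine(coro):
    """Run a coroutine to completion, even if an event loop is already running."""
    try:
        loop = asyncio.get_running_loop()
    except RuntimeError:
        # No loop is running (plain script): asyncio.run is sufficient.
        return asyncio.run(coro)

    # A loop is already running (e.g. in a notebook): patch it so that
    # run_until_complete can be called re-entrantly.
    import nest_asyncio

    nest_asyncio.apply(loop)
    return loop.run_until_complete(coro)


if __name__ == "__main__":
    # asyncio.sleep returns its `result` argument once the delay has passed.
    print(run_coroutine(asyncio.sleep(0, result="done")))
```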

Contributing

Contributions are welcome! Please feel free to submit pull requests or open issues for bug fixes, feature requests, or improvements.

License

This project is licensed under the Apache License, Version 2.0.

To enhance the user experience, an attractive theme has been applied to the Gradio interface. You can customize the theme further by modifying the Gradio theme settings in the create_demo function.
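
As a minimal sketch of what such a customization could look like, the example below builds a Gradio theme from a small settings dictionary. The body of create_demo shown here is hypothetical and much simpler than the project's real function. The chosen theme class and hues are assumptions, and the code assumes a Gradio version in which gr.Blocks accepts a theme argument.

```python
# Assumed settings; change the hues to restyle the interface.
THEME_SETTINGS = {"primary_hue": "blue", "secondary_hue": "green"}


def create_demo():
    """Hypothetical stand-in for the project's create_demo function."""
    import gradio as gr  # lazy import; requires gradio to be installed

    theme = gr.themes.Soft(**THEME_SETTINGS)
    with gr.Blocks(theme=theme, title="AI VoiceCraft") as demo:
        gr.Markdown("# AI VoiceCraft: Text-to-Speech Studio")
    return demo


if __name__ == "__main__":
    create_demo().launch()
```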