	---
	title: Voxtral
	emoji: ⚡
	colorFrom: gray
	colorTo: green
	sdk: gradio
	sdk_version: 5.38.0
	app_file: app.py
	pinned: false
	license: apache-2.0
	short_description: Chat and transcribe audio files with AI, powered by Voxtral.
	---
	# Voxtral Pro Interface

	<div align="center">

	![Python](https://img.shields.io/badge/Python-3.9+-blue?logo=python&logoColor=white)
	![Gradio](https://img.shields.io/badge/Gradio-5.38-orange?logo=gradio)
	![License: Apache 2.0](https://img.shields.io/badge/License-Apache_2.0-blue.svg)
	<a href="https://huggingface.co/spaces/hasanbasbunar/Voxtral">![Hugging Face Spaces](https://img.shields.io/badge/🤗%20Hugging%20Face-Spaces-yellow)</a>

	</div>

	<p align="center">
	An advanced, feature-rich Gradio UI to explore the full power of Mistral AI's multimodal model, `voxtral`.
	</p>

	<p align="center">
	<img src="image.png" alt="Voxtral Pro Demo" width="80%">
	</p>

	<p align="center">
	<img src="image-1.png" alt="Voxtral Pro Demo" width="80%">
	</p>

	## 🚀 About The Project

	Voxtral Pro explores and showcases the capabilities of Mistral AI's multimodal model, `voxtral`. Beyond a simple chat interface, it provides a toolkit for working with audio and text: high-quality transcription, multi-turn multimodal conversation, and agent-like tool use.

	This project serves as a practical example of how to build robust, user-friendly, and production-ready applications on top of state-of-the-art foundation models.

	## ✨ Key Features

	* 🎙️ High-Quality Transcription: Transcribe large audio files with exceptional accuracy using the Mistral API.
	* 📄 SRT Subtitle Generation: Automatically generate and export `.srt` subtitle files with precise segment timestamps, perfect for content creators.
	* 💬 Multimodal Chat: Engage in rich, multi-turn conversations combining both text and audio inputs simultaneously.
	* 🤖 Tool Use / Function Calling: Demonstrates the model's ability to call external functions to retrieve information (e.g., getting city data), showcasing its agent-like capabilities.
	* 🔐 Secure API Key Handling: Your Mistral API key is stored securely in your browser's session storage and is never exposed or saved elsewhere.
	* 🎨 Modern UI: A clean, responsive, and aesthetically pleasing interface built with Gradio.
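
The SRT export listed above can be sketched as follows. This is a minimal illustration, not the app's actual code: it assumes each transcription segment is a mapping with `start` and `end` times in seconds and a `text` field, and the helper names `format_timestamp` and `segments_to_srt` are hypothetical.

```python
from typing import Any, Iterable, Mapping


def format_timestamp(seconds: float) -> str:
    """Render a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: Iterable[Mapping[str, Any]]) -> str:
    """Build SRT text from segments shaped like {"start", "end", "text"} (assumed shape)."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(float(seg["start"]))
        end = format_timestamp(float(seg["end"]))
        text = str(seg["text"]).strip()
        # Each cue is: sequence number, time range, text, then a blank line.
        blocks.append(f"{index}\n{start} --> {end}\n{text}\n")
    return "\n".join(blocks)
```

Writing the returned string to a file with a `.srt` extension should give a subtitle file that most video players accept.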

	## 🛠️ Tech Stack

	This project is built with a modern, asynchronous Python stack:

	* Backend: [Python](https://www.python.org/)
	* Web Framework: [Gradio](https://www.gradio.app/)
	* API Client: [httpx](https://www.python-httpx.org/) with `asyncio` for non-blocking API calls.
	* Deployment: [Hugging Face Spaces](https://huggingface.co/spaces)
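
As a rough sketch of what non-blocking API calls with `httpx` and `asyncio` can look like, the example below transcribes several files concurrently. The endpoint URL, the model name, and the function names `send_with_httpx` and `transcribe_many` are assumptions made for illustration and are not taken from this project; check them against the Mistral API documentation.

```python
import asyncio
from typing import Any, Awaitable, Callable

# Assumed endpoint and model name; verify both against the Mistral API docs.
TRANSCRIBE_URL = "https://api.mistral.ai/v1/audio/transcriptions"
MODEL = "voxtral-mini-latest"

# A sender takes (filename, audio bytes, api key) and returns parsed JSON.
Sender = Callable[[str, bytes, str], Awaitable[dict[str, Any]]]


async def send_with_httpx(filename: str, audio: bytes, api_key: str) -> dict[str, Any]:
    """POST one audio file as multipart form data and return the JSON body."""
    import httpx  # third-party; imported lazily so the rest runs without it

    async with httpx.AsyncClient(timeout=120.0) as client:
        response = await client.post(
            TRANSCRIBE_URL,
            headers={"Authorization": f"Bearer {api_key}"},
            files={"file": (filename, audio)},
            data={"model": MODEL},
        )
        response.raise_for_status()
        return response.json()


async def transcribe_many(
    files: dict[str, bytes],
    api_key: str,
    send: Sender = send_with_httpx,
) -> dict[str, dict[str, Any]]:
    """Send all files concurrently; results are keyed by filename."""
    names = list(files)
    # gather() runs the requests concurrently and preserves input order.
    results = await asyncio.gather(*(send(n, files[n], api_key) for n in names))
    return dict(zip(names, results))
```

Passing the sender in as a parameter keeps the concurrency logic testable without network access: a stub coroutine can stand in for `send_with_httpx`.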

	## 🏁 Getting Started

	Follow these instructions to get a local copy up and running.

	### Prerequisites

	* Python 3.9+
	* Git

	### Installation & Configuration

	1. Clone the repository:

	```sh
	git clone https://huggingface.co/spaces/hasanbasbunar/Voxtral && cd Voxtral
	```


	2. Create and activate a virtual environment:
	```sh
	python3 -m venv .venv
	source .venv/bin/activate
	```

	3. Install dependencies:
	```sh
	pip install -r requirements.txt
	```

	4. Configure your API Key:
	Create a file named `.env` in the root of the project and add your Mistral API key:
	```
	MISTRAL_API_KEY="your_api_key_here"
	```
	The application is also designed to let you enter the key directly in the UI if you prefer not to use an `.env` file.
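
One way to combine the two key sources is sketched below. It assumes the `.env` values have already been loaded into the process environment (for example by a loader such as `python-dotenv`; this README does not say which one the app uses), and the function name `resolve_api_key` is hypothetical.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(
    ui_key: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
) -> str:
    """Prefer a key typed into the UI, else fall back to MISTRAL_API_KEY."""
    key = (ui_key or "").strip() or env.get("MISTRAL_API_KEY", "").strip()
    if not key:
        raise ValueError(
            "No Mistral API key found: enter one in the UI or set MISTRAL_API_KEY."
        )
    return key
```

Failing early with a clear message avoids sending a request that the API would reject for missing credentials.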

	### Running the Application

	1. Launch the app:
	```sh
	python app.py
	```
	2. Open your browser and navigate to `http://127.0.0.1:7860`.

	## 🚢 Deployment

	This app is designed to be easily deployed. It is currently live on [Hugging Face Spaces](https://huggingface.co/spaces/hasanbasbunar/Voxtral).

	To deploy your own version, you can use any platform that supports Python applications. For a production environment, pass `debug=False` to `demo.launch()` in `app.py`.

	Example for platforms that use a `PORT` environment variable:
	```python
	# in app.py
	import os  # needed to read the PORT environment variable

	demo.launch(server_port=int(os.environ.get("PORT", 7860)), debug=False)