Whisper Large v3 Turbo German - Faster Whisper

Overview

This repository contains a high-performance German speech recognition model based on OpenAI's Whisper Large v3 Turbo architecture. The model has been optimized using CTranslate2 for faster inference and reduced memory usage, making it ideal for production deployments.

Original Model

This model is based on the work from primeline/whisper-large-v3-turbo-german and has been converted to CTranslate2 format for optimal performance with faster-whisper.

Model Details

  • Architecture: Whisper Large v3 Turbo
  • Language: German (de)
  • Parameters: 809M
  • Format: CTranslate2 optimized
  • License: BigScience OpenRAIL-M (see the License section below)

While this model is optimized for German, it can also transcribe multiple languages supported by Whisper Large v3 Turbo, though accuracy may vary depending on the language.

Performance

The model achieves state-of-the-art performance on German speech recognition tasks with a Word Error Rate (WER) of 2.628% on comprehensive test datasets.

Use Cases

This model is designed for various German speech recognition applications:

  • Real-time Transcription: Live audio transcription for meetings, lectures, and conferences
  • Media Processing: Automatic subtitle generation for German video content
  • Voice Assistants: Speech-to-text conversion for voice-controlled applications
  • Call Center Analytics: Transcription and analysis of customer service calls
  • Accessibility Tools: Converting spoken German to text for hearing-impaired users
  • Document Creation: Voice-to-text dictation for content creation

Installation and Usage

Prerequisites

pip install faster-whisper torch

Basic Usage

from faster_whisper import WhisperModel

# Load the model
model = WhisperModel(
    "TheChola/whisper-large-v3-turbo-german-faster-whisper",
    device="cuda",             # Use GPU for speed
    compute_type="float16"     # Use FP16 for efficiency (can change to "int8" for lower memory)
)

# Transcribe audio file
segments, info = model.transcribe("audio.wav", language="de")

# Print results
for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
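The `start` and `end` fields are plain floats in seconds. For log output or manual subtitle work, a readable `HH:MM:SS.mmm` form is often nicer; below is a small sketch of such a helper (`format_timestamp` is not part of faster-whisper, it only assumes the float timestamps shown above):

```python
def format_timestamp(seconds: float) -> str:
    """Format a float number of seconds as HH:MM:SS.mmm."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1_000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{millis:03d}"

# Drop-in for the print loop above:
# print(f"[{format_timestamp(segment.start)} -> {format_timestamp(segment.end)}] {segment.text}")
```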

Advanced Usage with Options

from faster_whisper import WhisperModel

# Load the German-optimized Whisper large-v3 turbo model from Hugging Face
model = WhisperModel(
    "TheChola/whisper-large-v3-turbo-german-faster-whisper",
    device="cuda",             # Use GPU for speed
    compute_type="float16"     # Use FP16 for efficiency (can change to "int8" for lower memory)
)

# Transcribe with additional options
segments, info = model.transcribe(
    "audio.wav",
    language="de",
    beam_size=5,
    best_of=5,
    temperature=0.0,
    condition_on_previous_text=False,
    vad_filter=True,
    vad_parameters=dict(min_silence_duration_ms=500)
)

print(f"Detected language: {info.language} (probability: {info.language_probability:.2f})")
print(f"Duration: {info.duration:.2f} seconds")

for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")

Model Specifications

  • Input: Audio files (WAV, MP3, FLAC, etc.)
  • Output: German text transcription with timestamps
  • Sampling Rate: 16kHz (automatically resampled if needed)
  • Context Length: 30 seconds per chunk
  • Supported Audio Formats: All formats supported by FFmpeg

Hardware Requirements

Minimum Requirements

  • CPU: 4 cores, 8GB RAM
  • GPU: Optional, but recommended for faster inference

Recommended Requirements

  • CPU: 8+ cores, 16GB+ RAM
  • GPU: NVIDIA GPU with 4GB+ VRAM (RTX 3060 or better)
  • Storage: 2GB free space for model files
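Since the model runs on either class of hardware, the `device`/`compute_type` pair passed to `WhisperModel` can be chosen at runtime. A sketch that falls back to int8 on CPU when no CUDA GPU is visible; it uses torch (from the pip command above) only for the GPU check and degrades gracefully if torch is absent (`pick_device` is a hypothetical helper name):

```python
def pick_device() -> tuple:
    """Return a (device, compute_type) pair for WhisperModel.

    Prefers CUDA with float16; falls back to CPU with int8, which keeps
    memory within the minimum requirements listed above.
    """
    try:
        import torch
        if torch.cuda.is_available():
            return "cuda", "float16"
    except ImportError:
        pass
    return "cpu", "int8"

device, compute_type = pick_device()
# model = WhisperModel(
#     "TheChola/whisper-large-v3-turbo-german-faster-whisper",
#     device=device,
#     compute_type=compute_type,
# )
```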

Performance Benchmarks

| Device        | Batch Size | Real-time Factor (lower is faster) | Memory Usage |
|---------------|------------|------------------------------------|--------------|
| CPU (8 cores) | 1          | 0.3x                               | 2GB          |
| RTX 3060      | 4          | 0.1x                               | 4GB          |
| RTX 4080      | 8          | 0.05x                              | 6GB          |
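The real-time factor is wall-clock processing time divided by audio duration, so 0.1x means an hour of audio transcribes in six minutes. A small sketch for measuring it on your own hardware (`real_time_factor` is a hypothetical helper; `info.duration` comes from `transcribe` as shown earlier, and note that `segments` is a lazy generator, so decoding must be forced before stopping the clock):

```python
import time

def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = wall-clock processing time / audio duration (lower is faster)."""
    return processing_seconds / audio_seconds

# Measuring around a transcription call:
# t0 = time.perf_counter()
# segments, info = model.transcribe("audio.wav", language="de")
# list(segments)  # force the lazy generator so decoding actually runs
# print(f"RTF: {real_time_factor(time.perf_counter() - t0, info.duration):.2f}x")
```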

Model Files

This repository contains the following files:

  • model.bin - Main model weights in CTranslate2 format
  • config.json - Model configuration
  • tokenizer.json - Tokenizer configuration
  • vocab.json - Vocabulary mapping
  • Additional configuration files for preprocessing and generation

Limitations

  • Optimized specifically for the German language
  • Performance may vary with regional German dialects or non-native accents
  • Best results are achieved with clear audio and minimal background noise
  • Audio is processed internally in 30-second chunks; longer files are split automatically

License

This model is released under the BigScience OpenRAIL-M license. See the LICENSE file for more details.

Changelog

v1.0.0

  • Initial release of CTranslate2 optimized model
  • Support for faster-whisper framework
  • Optimized for German speech recognition