Summary

This model card describes a model based on the Whisper tiny architecture that has been trained for speech recognition in German.

Whisper is a powerful speech recognition architecture developed by OpenAI.

Applications

This model can be used in various application areas, including:

  • Transcription of spoken German language
  • Voice commands and voice control
  • Automatic subtitling for German videos
  • Voice-based search queries in German
  • Dictation functions in word processing programs

Evaluations - Word error rate

+-----------------------------------------+-------+-----------+----------------------------+---------------------+
| Model                                   |   All |   Tuda-De |   multilingual librispeech |   common_voice_19_0 |
+=========================================+=======+===========+============================+=====================+
| openai-whisper-large-v3                 |  3.28 |      7.86 |                       2.85 |                3.46 |
+-----------------------------------------+-------+-----------+----------------------------+---------------------+
| openai-whisper-large-v3-turbo           |  3.64 |      8.20 |                       3.19 |                3.85 |
+-----------------------------------------+-------+-----------+----------------------------+---------------------+
| openai-whisper-medium                   |  5.49 |     11.13 |                       5.04 |                5.53 |
+-----------------------------------------+-------+-----------+----------------------------+---------------------+
| primeline-whisper-tiny-german-1224      |  6.26 |      9.62 |                       4.97 |                8.46 |
+-----------------------------------------+-------+-----------+----------------------------+---------------------+
| openai-whisper-small                    |  9.54 |     15.94 |                       8.77 |               10.15 |
+-----------------------------------------+-------+-----------+----------------------------+---------------------+
| openai-whisper-base                     | 18.75 |     33.58 |                      17.15 |               19.74 |
+-----------------------------------------+-------+-----------+----------------------------+---------------------+
| openai-whisper-tiny                     | 28.80 |     47.33 |                      26.47 |               30.76 |
+-----------------------------------------+-------+-----------+----------------------------+---------------------+

+----------+------------+
| Size     | Parameters |
+==========+============+
| tiny     | 39 M       |
+----------+------------+
| base     | 74 M       |
+----------+------------+
| small    | 244 M      |
+----------+------------+
| medium   | 769 M      |
+----------+------------+
| large    | 1550 M     |
+----------+------------+
| large-v2 | 1550 M     |
+----------+------------+

The results were calculated in December 2024 and may change over time with updates to the evaluation corpus.

For the latest results, please check the code and dataset page.

The data and code for the evaluations are available here.
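
For reference, the word error rate metric itself can be computed with the open-source jiwer package. The snippet below is a generic illustration with made-up sentences; it is not the exact evaluation script behind the table above.

# Generic WER illustration (not the exact evaluation code used for the table above)
from jiwer import wer

reference = "heute scheint die sonne"      # ground-truth transcript (made-up example)
hypothesis = "heute scheint die sonnen"    # model output (made-up example)
print(wer(reference, hypothesis))          # 1 substitution over 4 words -> 0.25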

Training data

The training data for this model includes a large amount of spoken German from various sources.

The data was carefully selected and processed to optimize recognition performance.

The dataset comprises approximately 6,000 hours of public, proprietary, and synthetic data.

Training process

The model was trained with the following hyperparameters (a hypothetical configuration sketch follows the list):

  • Batch size: 32768
  • Epochs: 48
  • Learning rate: 1e-4
  • Data augmentation: No
  • Optimizer: AdEMAMix
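
The following is a minimal, hypothetical sketch of how such a run could be configured with Hugging Face Seq2SeqTrainingArguments. The original training script is not shown here, so the batch-size split, output directory, and AdEMAMix integration are assumptions.

# Hypothetical training configuration sketch (not the original training script)
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-tiny-german",   # placeholder path
    per_device_train_batch_size=256,      # assumption: 256 x 128 accumulation = 32768 effective
    gradient_accumulation_steps=128,
    learning_rate=1e-4,
    num_train_epochs=48,
    bf16=True,                            # matches the published BF16 weights
    predict_with_generate=True,
    report_to="none",
)

# AdEMAMix is not a stock torch optimizer; it would be supplied to the Trainer
# as a custom optimizer instance (or via a recent build that ships it),
# not through these arguments.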

How to use

import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline
from datasets import load_dataset

# Use a GPU if available and half precision on GPU for faster inference
device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

model_id = "primeline/whisper-tiny-german-1224"

# Load the model weights and the matching processor (tokenizer + feature extractor)
model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True
)
model.to(device)
processor = AutoProcessor.from_pretrained(model_id)

# Build an automatic-speech-recognition pipeline with 30-second chunking
pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    max_new_tokens=128,
    chunk_length_s=30,
    batch_size=16,
    return_timestamps=True,
    torch_dtype=torch_dtype,
    device=device,
)

# Run the pipeline on a sample audio clip and print the transcription
dataset = load_dataset("distil-whisper/librispeech_long", "clean", split="validation")
sample = dataset[0]["audio"]
result = pipe(sample)
print(result["text"])
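
To transcribe a local German recording instead of the demo dataset, the pipeline also accepts a file path (this requires ffmpeg to be installed; the file name below is only a placeholder):

# Hypothetical usage with a local audio file instead of the demo dataset
result = pipe("aufnahme.wav")   # placeholder path; any ffmpeg-readable audio file works
print(result["text"])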

About us

primeline AI

Your partner for AI infrastructure in Germany

Experience the powerful AI infrastructure that drives your ambitions in Deep Learning, Machine Learning & High-Performance Computing.

Optimized for AI training and inference.

Model author: Florian Zimmermeister

Disclaimer

This model is not a product of the primeLine Group. 

It represents research conducted by [Florian Zimmermeister](https://huggingface.co/flozi00), with computing power sponsored by primeLine. 

The model is published under this account by primeLine, but it is not a commercial product of primeLine Solutions GmbH.

Please be aware that while we have tested and developed this model to the best of our abilities, errors may still occur. 

Use of this model is at your own risk. We do not accept liability for any incorrect outputs generated by this model.