---
license: apache-2.0
language:
- de
library_name: transformers
pipeline_tag: automatic-speech-recognition
---

# distil-whisper-german

This model is a German speech recognition model based on the [distil-whisper](https://github.com/huggingface/distil-whisper) technique.
It has 756M parameters and a size of 1.51 GB in bfloat16 format.

As a follow-up to [Whisper large v3 german](https://huggingface.co/primeline/whisper-large-v3-german), we created a distilled version for faster inference with minimal quality loss.

## Intended uses & limitations

The model is intended for German speech recognition tasks.
It can be used as a local transcription service or as part of a larger speech recognition pipeline.
Although it has only about half the parameters of the large model, transcription quality remains very good and is sufficient for most tasks.
Latency is low enough for real-time applications when using optimization toolkits such as TensorRT. A minimal usage sketch follows below.
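
As a rough illustration of the local-transcription use case, here is a minimal sketch; the file name `meeting.wav` is a placeholder, and the full recommended setup appears under "How to use" below:

```python
# Minimal sketch of a local transcription call; "meeting.wav" is a placeholder.
from transformers import pipeline

# The pipeline downloads the model on first use and accepts local audio paths.
pipe = pipeline(
    "automatic-speech-recognition",
    model="primeline/distil-whisper-large-v3-german",
)
print(pipe("meeting.wav")["text"])
```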

## Dataset

The training dataset is a filtered subset of the [Common Voice](https://huggingface.co/datasets/common_voice) dataset, Multilingual LibriSpeech, and some internal data.
The data was filtered and double-checked for quality and correctness.
We applied some normalization to the text data, especially for casing and punctuation; a sketch of this kind of cleanup follows below.
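
The exact normalization rules are not published; the following is a hypothetical sketch of the kind of casing and punctuation cleanup described, not the actual preprocessing code:

```python
import re

def normalize_transcript(text: str) -> str:
    """Hypothetical normalization: the card only states that casing and
    punctuation were normalized, not the exact rules used."""
    text = re.sub(r"\s+", " ", text.strip())                    # collapse whitespace
    text = text.replace("\u201e", '"').replace("\u201c", '"')   # unify quote marks
    text = re.sub(r"[.!?]+$", "", text) + "."                   # one terminal period
    return text[0].upper() + text[1:]                           # capitalize first letter
```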


## Model family

| Model                          | Parameters | Link                                                                     |
|--------------------------------|------------|--------------------------------------------------------------------------|
| Whisper large v3 german        | 1.54B      | [link](https://huggingface.co/primeline/whisper-large-v3-german)         |
| Whisper large v3 turbo german  | 809M       | [link](https://huggingface.co/primeline/whisper-large-v3-turbo-german)   |
| Distil-whisper large v3 german | 756M       | [link](https://huggingface.co/primeline/distil-whisper-large-v3-german)  |
| Whisper tiny german            | 37.8M      | [link](https://huggingface.co/primeline/whisper-tiny-german)             |

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 3e-05
- total_train_batch_size: 512
- num_epochs: 5.0
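
For reference, a hedged sketch of how these might map onto `Seq2SeqTrainingArguments` in transformers; the per-device batch size / gradient accumulation split and all values marked "assumed" are assumptions, since only the totals above are reported:

```python
from transformers import Seq2SeqTrainingArguments

# Hypothetical mapping of the reported hyperparameters, not the actual
# training configuration.
training_args = Seq2SeqTrainingArguments(
    output_dir="./distil-whisper-large-v3-german",  # placeholder path
    learning_rate=3e-5,
    per_device_train_batch_size=64,   # assumed split: 64 x 8 GPUs = 512 total
    gradient_accumulation_steps=1,    # assumed
    num_train_epochs=5.0,
    bf16=True,                        # matches the published bfloat16 weights
)
```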

### Framework versions

- Transformers 4.39.3
- Pytorch 2.3.0a0+ebedce2
- Datasets 2.18.0
- Tokenizers 0.15.2


### How to use
```python
import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline
from datasets import load_dataset

# Run on GPU in float16 if available, otherwise fall back to CPU/float32.
device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

model_id = "primeline/distil-whisper-large-v3-german"

model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True
)
model.to(device)

# The processor bundles the tokenizer and the log-mel feature extractor.
processor = AutoProcessor.from_pretrained(model_id)

pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    max_new_tokens=128,
    chunk_length_s=30,   # long audio is split into 30-second chunks
    batch_size=16,
    return_timestamps=True,
    torch_dtype=torch_dtype,
    device=device,
)

# Sample audio from a public dataset; replace with your own German recording.
dataset = load_dataset("distil-whisper/librispeech_long", "clean", split="validation")
sample = dataset[0]["audio"]

result = pipe(sample)
print(result["text"])
```
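
Because `return_timestamps=True` is set, the result also contains timestamped segments. Continuing the example above:

```python
# Each chunk pairs a (start, end) timestamp tuple with its transcribed text.
for chunk in result["chunks"]:
    start, end = chunk["timestamp"]
    print(f"[{start} - {end}] {chunk['text']}")
```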



## [About us](https://primeline-ai.com/en/)

[![primeline AI](https://primeline-ai.com/wp-content/uploads/2024/02/pl_ai_bildwortmarke_original.svg)](https://primeline-ai.com/en/)


Your partner for AI infrastructure in Germany

Experience the powerful AI infrastructure that drives your ambitions in Deep Learning, Machine Learning & High-Performance Computing. 

Optimized for AI training and inference.



Model author: [Florian Zimmermeister](https://huggingface.co/flozi00)

**Disclaimer**

> This model is not a product of the primeLine Group.
>
> It represents research conducted by [Florian Zimmermeister](https://huggingface.co/flozi00), with computing power sponsored by primeLine.
>
> The model is published under this account by primeLine, but it is not a commercial product of primeLine Solutions GmbH.
>
> Please be aware that while we have tested and developed this model to the best of our abilities, errors may still occur.
>
> Use of this model is at your own risk. We do not accept liability for any incorrect outputs generated by this model.