File size: 3,112 Bytes
a2e576b a898ff3 a2e576b a898ff3 a2e576b a898ff3 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 |
---
base_model: openai/whisper-medium
library_name: transformers
license: apache-2.0
pipeline_tag: automatic-speech-recognition
tags:
- audio
- automatic-speech-recognition
- whisper
- hf-asr-leaderboard
---
<!-- Provide a quick summary of what the model is/does. -->
Lite-Whisper is a compressed version of OpenAI Whisper with LiteASR. See our [GitHub repository](https://github.com/efeslab/LiteASR) and [paper](https://arxiv.org/abs/2502.20583) for details.
## Benchmark Results
Following is the average word error rate (WER) evaluated on the [ESB datasets](https://huggingface.co/datasets/hf-audio/esb-datasets-test-only-sorted):
| Model | Average WER (↓) | Encoder Size | Decoder Size |
|-------|----------------|--------------|--------------|
| [whisper-tiny](https://huggingface.co/openai/whisper-tiny) | 22.01 | 7.63M | 29.55M |
| [lite-whisper-tiny-acc](https://huggingface.co/efficient-speech/lite-whisper-tiny-acc) | 22.97 | 7.41M | 29.55M |
| [lite-whisper-tiny](https://huggingface.co/efficient-speech/lite-whisper-tiny) | 23.95 | 7.00M | 29.55M |
| [lite-whisper-tiny-fast](https://huggingface.co/efficient-speech/lite-whisper-tiny-fast) | 27.09 | 6.48M | 29.55M |
| | | | |
| [whisper-base](https://huggingface.co/openai/whisper-base) | 17.67 | 19.82M | 52.00M |
| [lite-whisper-base-acc](https://huggingface.co/efficient-speech/lite-whisper-base-acc) | 19.07 | 18.64M | 52.00M |
| [lite-whisper-base](https://huggingface.co/efficient-speech/lite-whisper-base) | 19.71 | 17.44M | 52.00M |
| [lite-whisper-base-fast](https://huggingface.co/efficient-speech/lite-whisper-base-fast) | 23.05 | 16.07M | 52.00M |
| | | | |
| [whisper-small](https://huggingface.co/openai/whisper-small) | 15.89 | 87.00M | 153.58M |
| [lite-whisper-small-acc](https://huggingface.co/efficient-speech/lite-whisper-small-acc) | 15.37 | 76.99M | 153.58M |
| [lite-whisper-small](https://huggingface.co/efficient-speech/lite-whisper-small) | 14.96 | 70.16M | 153.58M |
| [lite-whisper-small-fast](https://huggingface.co/efficient-speech/lite-whisper-small-fast) | 14.92 | 63.11M | 153.58M |
| | | | |
| [whisper-medium](https://huggingface.co/openai/whisper-medium) | 15.12 | 305.68M | 456.64M |
| [lite-whisper-medium-acc](https://huggingface.co/efficient-speech/lite-whisper-medium-acc) | 13.46 | 269.93M | 456.64M |
| [lite-whisper-medium](https://huggingface.co/efficient-speech/lite-whisper-medium) | 14.50 | 239.99M | 456.64M |
| [lite-whisper-medium-fast](https://huggingface.co/efficient-speech/lite-whisper-medium-fast) | 14.52 | 215.31M | 456.64M |
## Citation
If you use LiteASR in your research, please cite the following paper:
```
@misc{kamahori2025liteasrefficientautomaticspeech,
title={LiteASR: Efficient Automatic Speech Recognition with Low-Rank Approximation},
author={Keisuke Kamahori and Jungo Kasai and Noriyuki Kojima and Baris Kasikci},
year={2025},
eprint={2502.20583},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2502.20583},
}
``` |