Tijs Zwinkels committed
Commit · 6ec1f65
1 Parent(s): f412812

Update documentation to include openai-api backend
README.md CHANGED
@@ -31,14 +31,19 @@ Please, cite us. [Bibtex citation](http://www.afnlp.org/conferences/ijcnlp2023/p
 
 ## Installation
 
-1) ``pip install librosa`` -- audio processing library
+1) ``pip install librosa soundfile`` -- audio processing library
 
 2) Whisper backend.
 
-Two alternative backends are integrated. The most recommended one is [faster-whisper](https://github.com/guillaumekln/faster-whisper) with GPU support. Follow their instructions for NVIDIA libraries -- we succeeded with CUDNN 8.5.0 and CUDA 11.7. Install with `pip install faster-whisper`.
+Several alternative backends are integrated. The most recommended one is [faster-whisper](https://github.com/guillaumekln/faster-whisper) with GPU support. Follow their instructions for NVIDIA libraries -- we succeeded with CUDNN 8.5.0 and CUDA 11.7. Install with `pip install faster-whisper`.
 
 Alternative, less restrictive, but slower backend is [whisper-timestamped](https://github.com/linto-ai/whisper-timestamped): `pip install git+https://github.com/linto-ai/whisper-timestamped`
 
+Thirdly, it's also possible to run this software from the [OpenAI Whisper API](https://platform.openai.com/docs/api-reference/audio/createTranscription). This solution is fast and requires no GPU -- a small VM suffices -- but you will need to pay OpenAI for API access. Also note that, since each audio fragment is processed multiple times, the [price](https://openai.com/pricing) will be higher than the pricing page suggests, so keep an eye on costs while using this backend. Setting a higher chunk size will reduce costs significantly.
+Install with: `pip install openai`
+
+For running with the openai-api backend, make sure that your [OpenAI API key](https://platform.openai.com/api-keys) is set in the `OPENAI_API_KEY` environment variable. For example, before running, do `export OPENAI_API_KEY=sk-xxx`, with *sk-xxx* replaced by your API key.
+
 The backend is loaded only when chosen. The unused one does not have to be installed.
 
 3) Optional, not recommended: sentence segmenter (aka sentence tokenizer)
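Note: the new openai-api instructions above describe the request and key handling only in prose. As a quick orientation, here is a minimal sketch of a transcription request through the OpenAI API. This is not the code added in this commit; it assumes the v1+ `openai` Python SDK, and the file name `audio.wav` is illustrative.

```python
# Minimal sketch of an OpenAI Whisper API transcription request.
# Assumptions: openai>=1.0 SDK; "audio.wav" is an illustrative file name.
import os
from openai import OpenAI

# The client reads OPENAI_API_KEY from the environment, as set via
# `export OPENAI_API_KEY=sk-xxx` in the instructions above.
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

with open("audio.wav", "rb") as f:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",               # OpenAI's hosted Whisper model
        file=f,
        response_format="verbose_json",  # returns segments with timestamps
    )
print(transcript.text)
```

With `response_format="verbose_json"`, the response carries segment timestamps, which is the kind of timing information an online/streaming wrapper needs for trimming its audio buffer.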
@@ -69,7 +74,7 @@ In case of installation issues of opus-fast-mosestokenizer, especially on Window
 
 ```
 usage: whisper_online.py [-h] [--min-chunk-size MIN_CHUNK_SIZE] [--model {tiny.en,tiny,base.en,base,small.en,small,medium.en,medium,large-v1,large-v2,large-v3,large}] [--model_cache_dir MODEL_CACHE_DIR] [--model_dir MODEL_DIR] [--lan LAN] [--task {transcribe,translate}]
-                         [--backend {faster-whisper,whisper_timestamped}] [--vad] [--buffer_trimming {sentence,segment}] [--buffer_trimming_sec BUFFER_TRIMMING_SEC] [--start_at START_AT] [--offline] [--comp_unaware]
+                         [--backend {faster-whisper,whisper_timestamped,openai-api}] [--vad] [--buffer_trimming {sentence,segment}] [--buffer_trimming_sec BUFFER_TRIMMING_SEC] [--start_at START_AT] [--offline] [--comp_unaware]
                          audio_path
 
 positional arguments:
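A concrete call with the new backend value, using only flags that appear in the synopsis above (the audio file name is illustrative): `python3 whisper_online.py --backend openai-api --min-chunk-size 1 audio.wav`.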
@@ -89,7 +94,7 @@ options:
                         Source language code, e.g. en,de,cs, or 'auto' for language detection.
   --task {transcribe,translate}
                         Transcribe or translate.
-  --backend {faster-whisper,whisper_timestamped}
+  --backend {faster-whisper,whisper_timestamped,openai-api}
                         Load only this backend for Whisper processing.
   --vad                 Use VAD = voice activity detection, with the default parameters.
   --buffer_trimming {sentence,segment}
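The `--backend` help text ("Load only this backend for Whisper processing") matches the installation note that the backend is loaded only when chosen, so the unused packages need not be installed. Below is a minimal sketch of that lazy-import pattern; `create_asr` is a hypothetical helper name, not the project's actual factory.

```python
# Sketch of backend-gated lazy imports; create_asr() is a hypothetical helper,
# not the actual code in whisper_online.py.
def create_asr(backend: str, modelsize: str = "large-v2"):
    if backend == "faster-whisper":
        from faster_whisper import WhisperModel    # imported only when chosen
        return WhisperModel(modelsize)
    if backend == "whisper_timestamped":
        import whisper_timestamped                 # imported only when chosen
        return whisper_timestamped.load_model(modelsize)
    if backend == "openai-api":
        from openai import OpenAI                  # imported only when chosen
        return OpenAI()                            # key comes from OPENAI_API_KEY
    raise ValueError(f"unknown backend: {backend}")
```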