Update README

Files changed:
- README.md +8 -2
- docs/options.md +20 -12

README.md
````diff
@@ -76,6 +76,12 @@ cores (up to 8):
 python app.py --input_audio_max_duration -1 --auto_parallel True
 ```
 
+### Multiple Files
+
+You can upload multiple files either through the "Upload files" option, or as a playlist on YouTube.
+Each audio file will then be processed in turn, and the resulting SRT/VTT/Transcript will be made available in the "Download" section.
+When more than one file is processed, the UI will also generate an "All_Output" zip file containing all the text output files.
+
 # Docker
 
 To run it in Docker, first install Docker and optionally the NVIDIA Container Toolkit in order to use the GPU.
@@ -109,7 +115,7 @@ You can also pass custom arguments to `app.py` in the Docker container, for instance:
 sudo docker run -d --gpus all -p 7860:7860 \
 --mount type=bind,source=/home/administrator/.cache/whisper,target=/root/.cache/whisper \
 --restart=on-failure:15 registry.gitlab.com/aadnk/whisper-webui:latest \
-app.py --input_audio_max_duration -1 --server_name 0.0.0.0 --…
+app.py --input_audio_max_duration -1 --server_name 0.0.0.0 --auto_parallel True \
 --default_vad silero-vad --default_model_name large
 ```
 
@@ -119,7 +125,7 @@ sudo docker run --gpus all \
 --mount type=bind,source=/home/administrator/.cache/whisper,target=/root/.cache/whisper \
 --mount type=bind,source=${PWD},target=/app/data \
 registry.gitlab.com/aadnk/whisper-webui:latest \
-cli.py --model large --…
+cli.py --model large --auto_parallel True --vad silero-vad \
 --output_dir /app/data /app/data/YOUR-FILE-HERE.mp4
 ```
 
````
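For reference, the flags added in the Docker examples above should also apply to a plain local run of `app.py`. This is only a sketch built from the options that appear in this diff (`--server_name`, `--auto_parallel`, `--default_vad`, `--default_model_name`), not a verified command:

```bash
# Sketch: local (non-Docker) equivalent of the updated Docker example above.
# All flags are taken from the diff; verify against `python app.py --help`.
python app.py --input_audio_max_duration -1 --server_name 0.0.0.0 \
  --auto_parallel True --default_vad silero-vad --default_model_name large
```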
    	
docs/options.md
````diff
@@ -3,18 +3,19 @@ To transcribe or translate an audio file, you can either copy an URL from a website
 supported by YT-DLP will work, including YouTube). Otherwise, upload an audio file (choose "All Files (*.*)" 
 in the file selector to select any file type, including video files) or use the microphone.
 
-For longer audio files (>10 minutes), it is recommended that you select Silero VAD (Voice Activity Detector) in the VAD option.
+For longer audio files (>10 minutes), it is recommended that you select Silero VAD (Voice Activity Detector) in the VAD option, especially if you are using the `large-v1` model. Note that `large-v2` is a lot more forgiving, but you may still want to use a VAD with a slightly higher "VAD - Max Merge Size (s)" (60 seconds or more).
 
 ## Model
 Select the model that Whisper will use to transcribe the audio:
 
-| Size …
-|------…
-| tiny …
-| base …
-| small …
-| medium …
-| large …
+| Size      | Parameters | English-only model | Multilingual model | Required VRAM | Relative speed |
+|-----------|------------|--------------------|--------------------|---------------|----------------|
+| tiny      | 39 M       | tiny.en            | tiny               | ~1 GB         | ~32x           |
+| base      | 74 M       | base.en            | base               | ~1 GB         | ~16x           |
+| small     | 244 M      | small.en           | small              | ~2 GB         | ~6x            |
+| medium    | 769 M      | medium.en          | medium             | ~5 GB         | ~2x            |
+| large     | 1550 M     | N/A                | large              | ~10 GB        | 1x             |
+| large-v2  | 1550 M     | N/A                | large              | ~10 GB        | 1x             |
 
 ## Language
 
@@ -24,10 +25,12 @@ Note that if the selected language and the language in the audio differs, Whisper
 language. For instance, if the audio is in English but you select Japanese, the model may translate the audio to Japanese.
 
 ## Inputs
-The options "URL (YouTube, etc.)", "Upload …
+The options "URL (YouTube, etc.)", "Upload Files" or "Microphone Input" allow you to send an audio input to the model.
 
-…
-the URL.
+### Multiple Files
+Note that the UI will only process either the given URL or the uploaded files (including microphone) - not both.
+
+But you can upload multiple files either through the "Upload files" option, or as a playlist on YouTube. Each audio file will then be processed in turn, and the resulting SRT/VTT/Transcript will be made available in the "Download" section. When more than one file is processed, the UI will also generate an "All_Output" zip file containing all the text output files.
 
 ## Task
 Select the task - either "transcribe" to transcribe the audio to text, or "translate" to translate it to English.
@@ -75,4 +78,9 @@ number of seconds after the line has finished. For instance, if a line ends at 10:00
 10:04, the line's text will be included if the prompt window is 4 seconds or more (10:04 - 10:00 = 4 seconds).
 
 Note that detected lines in gaps between speech sections will not be included in the prompt 
-(if silero-vad or silero-vad-expand-into-gaps) is used.
+(if silero-vad or silero-vad-expand-into-gaps is used).
+
+# Command Line Options
+
+Both `app.py` and `cli.py` also accept command line options, such as the ability to enable parallel execution on multiple
+CPU/GPU cores, the default model name/VAD and so on. Consult the README in the root folder for more information.
````
