faster-whisper-base-ar-quran

This model is a CTranslate2 version of tarteel-ai/whisper-base-ar-quran. The conversion was performed using the following command:

# No need to copy vocab.json and config.json, since they are created automatically during conversion
ct2-transformers-converter --model tarteel-ai/whisper-base-ar-quran --force --output_dir "path/to/output/dir/faster-whisper-base-ar-quran" --quantization float16 --copy_files added_tokens.json normalizer.json preprocessor_config.json special_tokens_map.json tokenizer_config.json

To use the ct2-transformers-converter command, you'll need to install the required dependencies:

pip install ctranslate2 transformers[torch]

For more information about the converter, see the CTranslate2 documentation or the "CTranslate2 Installation" section below.
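
If you prefer to run the conversion from Python rather than the CLI, CTranslate2 also exposes a TransformersConverter class. Below is a minimal sketch mirroring the command above (the output path is a placeholder):

from ctranslate2.converters import TransformersConverter

# Same conversion as the CLI command above, via CTranslate2's Python API.
converter = TransformersConverter(
    "tarteel-ai/whisper-base-ar-quran",
    copy_files=[
        "added_tokens.json",
        "normalizer.json",
        "preprocessor_config.json",
        "special_tokens_map.json",
        "tokenizer_config.json",
    ],
)
converter.convert(
    "path/to/output/dir/faster-whisper-base-ar-quran",  # placeholder output path
    quantization="float16",
    force=True,
)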

This conversion moves the model from OpenAI's vanilla Whisper family to the faster-whisper family, making it compatible with WhisperX, which uses faster-whisper models under the hood for improved performance.

  • For reference: "WhisperX enables significantly faster transcription speeds - up to 70x realtime with large-v2 models while requiring less than 8GB GPU memory."
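
Since the converted weights are a standard faster-whisper model, they can also be loaded directly with the faster-whisper library. A minimal sketch (the device, compute type, and audio path are assumptions):

from faster_whisper import WhisperModel

# Load the converted model straight from the Hugging Face Hub.
model = WhisperModel("OdyAsh/faster-whisper-base-ar-quran", device="cuda", compute_type="float16")

segments, info = model.transcribe("path/to/quran_audio.mp3")  # placeholder audio path
for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")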

Usage Example

Follow the Python Usage 🐍 section in WhisperX's README page here, but change this line:

model = whisperx.load_model("large-v2", device, compute_type=compute_type)

to this line:

model = whisperx.load_model("OdyAsh/faster-whisper-base-ar-quran", device, compute_type=compute_type)

Another usage example: see specific parts of the surah-splitter repo here and here.

Quantization Side Note

You'll see that the command above uses --quantization float16, even though the original tarteel-ai/whisper-base-ar-quran is in float32 precision (source). The float16 conversion was kept for the following reasons:

  • Reduced size: ~141 MB instead of ~290 MB.
  • Negligible performance impact for the intended use case: when this model was tested on sample Quran audio files, the few transcription errors did not affect the overall performance of OdyAsh's WhisperX-based solution that uses this model, since the subsequent steps in that solution's pipeline (e.g., alignment, the reference <-> input matching DP algorithm, etc.) still yielded accurate results.

However, if the transcription results are not satisfactory for your use case, you can always get float32 precision by:

  • Changing the --quantization argument to float32 in the command above, at the cost of a larger model (around 290 MB).
  • Or, at inference time, setting the compute_type argument of whisperx.load_model() to "float32" (see the sketch below).
    • Read this and this for confirmation that the model can be recomputed to float32 precision at runtime.
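
A minimal sketch of the runtime option (the device choice is an assumption):

import whisperx

# CTranslate2 upcasts the stored float16 weights to float32 when loading.
model = whisperx.load_model("OdyAsh/faster-whisper-base-ar-quran", "cuda", compute_type="float32")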

CTranslate2 Installation

In the OdyAsh/faster-whisper-base-ar-quran GitHub repo, you'll find pyproject.toml and uv.lock files, which means you can use uv instead of pip to install the packages required by the ct2-transformers-converter command (if you prefer).

Steps:

  1. Install uv (if not already installed) by following this section in their docs.

  2. In your terminal, navigate to the local directory into which you cloned the OdyAsh/faster-whisper-base-ar-quran GitHub repo.

  3. Install the required packages right away (the uv.lock file is already present in that directory):

uv sync

  4. Verify the installation:

uv run ct2-transformers-converter --help
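
As a quick sanity check from Python as well (a trivial sketch; the ct2-transformers-converter command is provided by the ctranslate2 package):

import ctranslate2

# If this import succeeds, the package that provides ct2-transformers-converter is installed.
print(ctranslate2.__version__)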