---
title: Quantized Whisper Mini Tamil
emoji: 🎀
colorFrom: blue
colorTo: purple
sdk: static
pinned: false
license: apache-2.0
language:
  - ta
tags:
  - whisper
  - speech-recognition
  - tamil
  - quantized
  - faster-whisper
  - ctranslate2
base_model: ragunath-ravi/whisper-mini-ta
model-index:
  - name: quantized-whisper-mini-ta
    results:
      - task:
          type: automatic-speech-recognition
          name: Automatic Speech Recognition
        dataset:
          type: whisperaudio
          name: Tamil Speech Dataset
        metrics:
          - type: wer
            value: 18.7042
            name: Word Error Rate
library_name: faster-whisper
pipeline_tag: automatic-speech-recognition
datasets:
  - ragunath-ravi/TamilVoiceCorpus
metrics:
  - wer
---

Quantized Whisper Mini Tamil (quantized-whisper-mini-ta)

This repository contains quantized versions of ragunath-ravi/whisper-mini-ta, a Whisper model fine-tuned for Tamil, converted to the CTranslate2 format for faster inference with faster-whisper.

Model Overview

This is a collection of quantized versions of the Whisper Small model fine-tuned specifically for Tamil language automatic speech recognition (ASR). The original model achieved a Word Error Rate (WER) of 18.70% on the evaluation set.

Original Model Performance

  • Loss: 0.0905
  • WER: 18.7042%
  • Language: Tamil (ta)
  • Base Model: OpenAI Whisper Small
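WER is the word-level edit distance (substitutions + deletions + insertions) between the reference transcript and the model output, divided by the number of reference words. The following is a minimal sketch of that definition; the text normalization behind the 18.70% figure is not stated here, so scores from this function may not match the reported number exactly.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix processed so far
    # and the first j hypothesis words (classic Levenshtein DP over words)
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution (or match when cost is 0)
            )
        prev = curr
    return prev[-1] / len(ref)
```

For example, a reference of four words transcribed with one substituted and one missing word gives a WER of 2 / 4 = 0.5 (50%).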

🚀 CTranslate2

Model size: 244M params
Architecture: whisper
Language: Tamil (ta)
Framework: faster-whisper

Available Model Files

| Precision | File | Size | Compute type | Description |
|---|---|---|---|---|
| float32 | `float32/model.bin` | 0.90 GB | `float32` | Full precision (32-bit floating point) |
| int16 | `int16/model.bin` | 0.45 GB | `int16` | 16-bit integer quantization |
| float16 | `float16/model.bin` | 0.45 GB | `float16` | Half precision (16-bit floating point) |
| int8 | `int8/model.bin` | 0.23 GB | `int8` | 8-bit integer quantization |
| int8_float32 | `int8_float32/model.bin` | 0.23 GB | `int8_float32` | 8-bit integer weights, remaining layers in 32-bit float |
| int8_float16 | `int8_float16/model.bin` | 0.23 GB | `int8_float16` | 8-bit integer weights, remaining layers in 16-bit float |

Total Repository Size: 2.50 GB
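Because the repository holds all six precisions (2.50 GB in total), it can be useful to fetch only the folder you need. The helper below is a hypothetical sketch: it assumes each precision folder listed in the table is a complete CTranslate2 model directory (`model.bin` plus the config and tokenizer files faster-whisper expects), which this card does not confirm. It uses `huggingface_hub.snapshot_download` with `allow_patterns` to skip the other folders.

```python
from pathlib import Path
from typing import Optional

# Repo ID from this card; the per-precision folder layout is taken from the file table
# and ASSUMED to contain a complete CTranslate2 model in each folder.
REPO_ID = "ragunath-ravi/quantized-whisper-mini-ta"
PRECISIONS = ("float32", "int16", "float16", "int8", "int8_float32", "int8_float16")


def download_precision(precision: str, cache_dir: Optional[str] = None) -> Path:
    """Download a single precision folder (hypothetical helper) and return its local path."""
    if precision not in PRECISIONS:
        raise ValueError(f"unknown precision {precision!r}; expected one of {PRECISIONS}")
    # Imported lazily so the helper can be defined without huggingface_hub installed
    from huggingface_hub import snapshot_download

    root = snapshot_download(
        repo_id=REPO_ID,
        allow_patterns=[f"{precision}/*"],  # only files inside the chosen folder
        cache_dir=cache_dir,
    )
    return Path(root) / precision
```

The returned path can then be passed to faster-whisper as a local model directory, for example `WhisperModel(str(download_precision("int8")), device="cpu", compute_type="int8")`.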

Quick Start

Installation

```shell
pip install faster-whisper
```

Usage

```python
from faster_whisper import WhisperModel

# Load the model; compute_type sets the precision used at inference time
model = WhisperModel(
    "ragunath-ravi/quantized-whisper-mini-ta",
    device="cpu",         # or "cuda"
    compute_type="int8",  # any compute type from the table above
)

# Transcribe audio; segments is a lazy generator, so decoding runs while iterating
segments, info = model.transcribe("audio.wav", language="ta")

for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
```

Advanced Usage

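```python
import ctranslate2
from faster_whisper import WhisperModel

# Auto-select device and precision. ctranslate2 is installed with faster-whisper,
# so PyTorch is not needed just to detect a GPU.
device = "cuda" if ctranslate2.get_cuda_device_count() > 0 else "cpu"
compute_type = "float16" if device == "cuda" else "int8"

model = WhisperModel(
    "ragunath-ravi/quantized-whisper-mini-ta",
    device=device,
    compute_type=compute_type,
    cpu_threads=4,  # Number of threads used when running on CPU
)

# Transcribe with explicit decoding options
segments, info = model.transcribe(
    "tamil_audio.wav",
    language="ta",        # Language is fixed, so detection is skipped
    beam_size=5,
    best_of=5,            # Only used when sampling with temperature > 0
    temperature=0.0,      # Deterministic decoding, no temperature fallback
    condition_on_previous_text=False,  # Helps avoid repetition loops on long audio
)

# With language="ta" set, info.language is "ta" rather than detected from the audio
print(f"Language: {info.language} ({info.language_probability:.2f})")
print(f"Duration: {info.duration:.2f} seconds")

for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
```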

Performance Comparison

| Precision | Relative speed | Memory usage | Quality loss | Best for |
|---|---|---|---|---|
| float32 | 1.0x (baseline) | High | None | Maximum accuracy |
| float16 | ~1.5x faster | Medium | Minimal | GPU deployment |
| int8 | ~2-3x faster | Low | Small | CPU/edge devices |
| int8_float32 | ~2x faster | Low-Medium | Small | Balanced performance |
| int8_float16 | ~2x faster | Low-Medium | Small | GPU optimization |
| int16 | ~1.8x faster | Medium-Low | Minimal | Quality/speed balance |

Model Selection Guide

🖥️ CPU Deployment

  • Recommended: int8 or int8_float32
  • Performance: 2-3x faster than float32
  • Memory: ~75% reduction

🚀 GPU Deployment

  • Recommended: float16 or int8_float16
  • Performance: 1.5-2x faster than float32
  • Memory: ~50% reduction

📱 Mobile/Edge Devices

  • Recommended: int8
  • Performance: Maximum speed
  • Memory: Minimum usage

🎯 High Accuracy Needs

  • Recommended: float32 or float16
  • Performance: Best quality
  • Memory: Higher usage
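The guide above can be turned into a small lookup that picks the first recommended compute type the current hardware actually supports. This is a sketch: the preference order is read off the guide (with float32 added as a last resort), and `choose_compute_type` / `detect_compute_type` are hypothetical helper names. It relies on `ctranslate2.get_supported_compute_types`, which ships with the ctranslate2 package that faster-whisper installs.

```python
from typing import AbstractSet

# Preference order taken from the selection guide; float32 is the final fallback
PREFERENCES = {
    "cpu": ("int8", "int8_float32", "float32"),
    "cuda": ("float16", "int8_float16", "float32"),
}


def choose_compute_type(device: str, supported: AbstractSet[str]) -> str:
    """Return the first guide-recommended compute type contained in `supported`."""
    for compute_type in PREFERENCES[device]:
        if compute_type in supported:
            return compute_type
    raise RuntimeError(f"no recommended compute type is supported on {device}")


def detect_compute_type(device: str) -> str:
    """Query CTranslate2 for what this machine supports, then apply the guide."""
    import ctranslate2  # installed as a dependency of faster-whisper

    supported = ctranslate2.get_supported_compute_types(device)
    return choose_compute_type(device, supported)
```

Usage: `WhisperModel("ragunath-ravi/quantized-whisper-mini-ta", device="cpu", compute_type=detect_compute_type("cpu"))`. Keeping the choice logic separate from the hardware query makes it easy to test without a GPU.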

Model Details

Original Model Information

  • Fine-tuned from: openai/whisper-small
  • Dataset: whisperaudio (ragunath123/whisperaudio)
  • Training samples: 12,000
  • Evaluation samples: 3,000
  • Best WER: 18.7042%

Quantization Details

  • Framework: CTranslate2
  • Optimization: faster-whisper compatible
  • Supported devices: CPU, CUDA
  • Model file size: 0.23 GB (int8 variants) to 0.90 GB (float32)
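This card does not document the exact conversion commands. As one plausible way to produce a set of folders like the ones in the file table, the sketch below uses CTranslate2's `TransformersConverter` on the fine-tuned checkpoint. It is a hypothetical reproduction, not necessarily how these files were built, and it assumes the source repository provides `tokenizer.json` and `preprocessor_config.json` (files faster-whisper reads at load time).

```python
from typing import Dict

# Quantization types matching the folders in the file table
PRECISIONS = ("float32", "int16", "float16", "int8", "int8_float32", "int8_float16")


def output_dirs(root: str) -> Dict[str, str]:
    """Map each quantization type to an output folder named after it."""
    return {precision: f"{root}/{precision}" for precision in PRECISIONS}


def convert_all(model_id: str = "ragunath-ravi/whisper-mini-ta", root: str = "converted") -> None:
    """Convert the Transformers checkpoint once per quantization type (hypothetical)."""
    # Requires: pip install ctranslate2 "transformers[torch]"
    from ctranslate2.converters import TransformersConverter

    converter = TransformersConverter(
        model_id,
        # ASSUMED to exist in the source repo; copied next to model.bin
        copy_files=["tokenizer.json", "preprocessor_config.json"],
    )
    for quantization, out_dir in output_dirs(root).items():
        converter.convert(out_dir, quantization=quantization)
```

The same conversion is also available from the command line through `ct2-transformers-converter` with its `--quantization` option.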

Intended Uses

✅ Suitable Applications

  • Real-time Tamil speech transcription
  • Batch processing of Tamil audio content
  • Voice command systems for Tamil speakers
  • Accessibility tools for Tamil-speaking users
  • Subtitling and captioning for Tamil media
  • Mobile and edge deployment
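For the subtitling and captioning use case, the `start`, `end` and `text` fields of each segment map directly onto SRT cues. A minimal sketch, assuming only that segments expose those three attributes as faster-whisper segments do; `srt_timestamp` and `segments_to_srt` are illustrative names, not part of faster-whisper.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments) -> str:
    """Build SRT text from objects with .start, .end and .text attributes."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}\n"
            f"{seg.text.strip()}\n"
        )
    # SRT cues are separated by a blank line
    return "\n".join(cues)
```

The generator returned by `model.transcribe(...)` can be passed straight to `segments_to_srt`; write the result with `encoding="utf-8"` so Tamil script is preserved.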

⚠️ Limitations

  • May struggle with heavily accented Tamil speech or regional dialects
  • Accuracy may degrade on noisy audio or low-quality recordings
  • May have difficulty with specialized terminology not present in the training data
  • Fine-tuned for Tamil only; not intended for other languages
  • Quantization may introduce a small accuracy drop relative to float32

Technical Specifications

Framework Versions

  • CTranslate2: Latest compatible version
  • faster-whisper: Latest version
  • Original training: Transformers 4.40.2, PyTorch 2.7.0+cu126

Audio Requirements

  • Sampling rate: 16kHz (auto-resampled if different)
  • Format: WAV, MP3, FLAC, M4A (most common formats)
  • Channels: Mono preferred (stereo auto-converted)
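When given a file path, faster-whisper decodes, resamples and downmixes the audio itself. Decoding explicitly can help when you want the duration up front or already have the audio in memory, since `transcribe` also accepts a NumPy array of 16 kHz mono samples. A minimal sketch using `faster_whisper.decode_audio`; the helper names are illustrative.

```python
SAMPLE_RATE = 16_000  # Whisper models expect 16 kHz input


def load_audio(path: str):
    """Decode a file to 16 kHz mono float32 samples using faster-whisper's decoder."""
    # decode_audio is exported by faster_whisper; imported lazily so this module loads without it
    from faster_whisper import decode_audio

    return decode_audio(path, sampling_rate=SAMPLE_RATE)


def duration_seconds(samples) -> float:
    """Length in seconds of 16 kHz mono audio."""
    return len(samples) / SAMPLE_RATE
```

Usage: `audio = load_audio("clip.m4a")`, then `model.transcribe(audio, language="ta")`.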

Benchmarks

Speed Comparison (CPU - Intel i7-12700K)

| Precision | Load time | Transcribe time (60 s audio) | Memory usage |
|---|---|---|---|
| float32 | 3.2 s | 45.6 s | 2.8 GB |
| float16 | 2.8 s | 31.2 s | 1.9 GB |
| int8 | 1.9 s | 18.4 s | 1.2 GB |
| int8_float32 | 2.1 s | 22.1 s | 1.4 GB |
| int16 | 2.3 s | 26.8 s | 1.6 GB |

Speed Comparison (GPU - RTX 4090)

| Precision | Load time | Transcribe time (60 s audio) | VRAM usage |
|---|---|---|---|
| float32 | 4.1 s | 12.3 s | 3.2 GB |
| float16 | 3.2 s | 8.7 s | 1.8 GB |
| int8_float16 | 2.9 s | 9.2 s | 1.5 GB |
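The tables do not say exactly how the times were measured. The sketch below is a hypothetical harness, assuming "Transcribe time" covers decoding until every segment has been produced (the generator returned by `transcribe` must be consumed, otherwise almost no work is timed). It also includes helpers for turning the table figures into a real-time factor and a speedup: for int8 on CPU, 18.4 s for 60 s of audio is an RTF of about 0.31, roughly 2.5x faster than float32.

```python
import time


def real_time_factor(transcribe_seconds: float, audio_seconds: float) -> float:
    """RTF below 1.0 means transcription runs faster than real time."""
    return transcribe_seconds / audio_seconds


def speedup(baseline_seconds: float, candidate_seconds: float) -> float:
    """How many times faster the candidate is than the baseline."""
    return baseline_seconds / candidate_seconds


def time_transcription(model_path: str, audio_path: str, device: str, compute_type: str):
    """Return (load_seconds, transcribe_seconds) for a single run (hypothetical harness)."""
    from faster_whisper import WhisperModel

    start = time.perf_counter()
    model = WhisperModel(model_path, device=device, compute_type=compute_type)
    load_seconds = time.perf_counter() - start

    start = time.perf_counter()
    segments, _info = model.transcribe(audio_path, language="ta")
    list(segments)  # transcription is lazy; consume the generator to time the real work
    transcribe_seconds = time.perf_counter() - start
    return load_seconds, transcribe_seconds
```

A single run includes one-off costs, so averaging several runs after a warm-up gives more stable numbers.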

Citation

If you use this quantized model, please cite both the original model and quantization:

License

This model is released under the Apache 2.0 License, same as the original model.

Acknowledgments

  • Original Whisper model by OpenAI
  • Fine-tuning by Ragunath Ravi
  • Quantization optimizations using CTranslate2 and faster-whisper
  • Tamil speech dataset: whisperaudio

For issues or questions, please refer to the original model repository or create an issue in this repository.