ASR-STT 8-bit Quantized

This is an 8-bit quantized version of Jacaranda-Health/ASR-STT.

Model Details

  • Base Model: Jacaranda-Health/ASR-STT
  • Quantization: 8-bit (via bitsandbytes)
  • Size Reduction: 73.1% smaller than original
  • Original Size: 2913.89 MB
  • Quantized Size: 784.94 MB
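
The reported reduction follows directly from the two sizes above; a quick sanity check in Python:

# Verify the size-reduction figure from the numbers above
original_mb = 2913.89
quantized_mb = 784.94
print(f"{(1 - quantized_mb / original_mb) * 100:.1f}% smaller")  # 73.1% smaller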

Usage

from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, BitsAndBytesConfig
import torch
import librosa

# Load processor
processor = AutoProcessor.from_pretrained("Jacaranda-Health/ASR-STT-8bit")

# Configure quantization
quantization_config = BitsAndBytesConfig(
    load_in_8bit=True,
    llm_int8_threshold=6.0,        # outlier threshold for LLM.int8() mixed-precision decomposition
    llm_int8_has_fp16_weight=False # keep weights in int8 rather than fp16
)

# Load quantized model
model = AutoModelForSpeechSeq2Seq.from_pretrained(
    "eolang/ASR-STT-8bit",
    quantization_config=quantization_config,
    device_map="auto"
)

# Transcription function
def transcribe(filepath):
    audio, sr = librosa.load(filepath, sr=16000)
    inputs = processor(audio, sampling_rate=sr, return_tensors="pt")
    
    # Move inputs to the model's device; use fp16 on GPU to match the
    # quantized model's compute dtype (8-bit weights keep fp16 activations)
    device = next(model.parameters()).device
    inputs = {k: v.to(device) for k, v in inputs.items()}
    if device.type == "cuda":
        inputs = {k: v.half() for k, v in inputs.items()}
    
    with torch.no_grad():
        generated_ids = model.generate(inputs["input_features"])
    
    return processor.batch_decode(generated_ids, skip_special_tokens=True)[0]

# Example usage
transcription = transcribe("path/to/audio.wav")
print(transcription)
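
To confirm the savings on your own hardware, transformers exposes get_memory_footprint() on any loaded model:

# Report the loaded model's weight and buffer footprint in MB
footprint_mb = model.get_memory_footprint() / (1024 ** 2)
print(f"Memory footprint: {footprint_mb:.2f} MB")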

Performance

  • Faster inference due to reduced-precision weights (see the timing sketch below)
  • Lower memory usage: quantized weights are ~73% smaller than the full-precision original
  • Transcription quality is largely preserved relative to the base model
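
To quantify the speedup on your own setup, here is a minimal timing sketch. It assumes the transcribe function above and a local audio file of your own (the name sample.wav is a placeholder):

import time

# Average latency over several runs; replace sample.wav with your own audio
runs = 5
start = time.perf_counter()
for _ in range(runs):
    transcribe("sample.wav")
print(f"Average latency: {(time.perf_counter() - start) / runs:.2f} s per run")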

Requirements

  • transformers
  • torch
  • bitsandbytes
  • librosa
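
All four are available from PyPI:

pip install transformers torch bitsandbytes librosa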