|
---
license: apache-2.0
base_model: Jacaranda-Health/ASR-STT
tags:
- speech-to-text
- automatic-speech-recognition
- quantized
- 4bit
language:
- en
- sw
pipeline_tag: automatic-speech-recognition
---
|
|
|
# ASR-STT 4-bit Quantized
|
|
|
This is a 4-bit quantized version of [Jacaranda-Health/ASR-STT](https://huggingface.co/Jacaranda-Health/ASR-STT), a speech-to-text model for English and Swahili, quantized with bitsandbytes (NF4 with double quantization).
|
|
|
## Model Details
- **Base Model**: Jacaranda-Health/ASR-STT
- **Quantization**: 4-bit NF4 with double quantization (via bitsandbytes)
- **Size Reduction**: 84.6% smaller than the original checkpoint
- **Original Size**: 2913.89 MB
- **Quantized Size**: 448.94 MB
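
The sizes above refer to the serialized checkpoint. As a rough cross-check you can print the in-memory footprint once the model is loaded as in the Usage section below; this is a minimal sketch, and on-disk and in-memory sizes will not match exactly.

```python
# Rough in-memory footprint of the loaded 4-bit model, in MB
# (assumes `model` has already been loaded as shown in the Usage section below)
footprint_mb = model.get_memory_footprint() / (1024 ** 2)
print(f"Model memory footprint: {footprint_mb:.2f} MB")
```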
|
|
|
## Usage

```python
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, BitsAndBytesConfig
import torch
import librosa

# The processor (feature extractor + tokenizer) is unchanged by quantization,
# so it can be loaded from the base model
processor = AutoProcessor.from_pretrained("Jacaranda-Health/ASR-STT")

# 4-bit NF4 quantization with double quantization and float16 compute
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True
)

# Load the quantized model
model = AutoModelForSpeechSeq2Seq.from_pretrained(
    "eolang/ASR-STT-4bit",
    quantization_config=quantization_config,
    device_map="auto"
)

# Transcription helper
def transcribe(filepath):
    # Load audio and resample to the 16 kHz rate the model expects
    audio, sr = librosa.load(filepath, sr=16000)
    inputs = processor(audio, sampling_rate=sr, return_tensors="pt")

    # Move features to the model's device and match the float16 compute dtype
    input_features = inputs["input_features"].to(model.device, dtype=torch.float16)

    with torch.no_grad():
        generated_ids = model.generate(input_features)

    return processor.batch_decode(generated_ids, skip_special_tokens=True)[0]

# Example usage
transcription = transcribe("path/to/audio.wav")
print(transcription)
```
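
The card lists both English and Swahili. If the base model is Whisper-derived, which the `input_features` interface above suggests but this card does not state, you can pin the transcription language instead of relying on auto-detection. A sketch, assuming the processor exposes the standard Whisper `get_decoder_prompt_ids` helper:

```python
# Force Swahili output instead of relying on automatic language detection
# (assumes a Whisper-style processor; `processor` and `model` are loaded as above)
forced_decoder_ids = processor.get_decoder_prompt_ids(language="swahili", task="transcribe")

audio, sr = librosa.load("path/to/audio.wav", sr=16000)
input_features = processor(audio, sampling_rate=sr, return_tensors="pt")["input_features"]
input_features = input_features.to(model.device, dtype=torch.float16)

with torch.no_grad():
    generated_ids = model.generate(input_features, forced_decoder_ids=forced_decoder_ids)

print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])
```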
|
|
|
## Performance
- Faster inference due to reduced-precision compute
- Lower memory usage (weights are roughly 85% smaller)
- Transcription quality maintained relative to the full-precision model
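
To measure these claims on your own hardware, you can time the `transcribe` helper from the Usage section; a quick sketch, where the audio path is a placeholder:

```python
import time

# Time a single transcription with the `transcribe` helper defined in the Usage section
# ("path/to/audio.wav" is a placeholder; point it at a real recording)
start = time.perf_counter()
text = transcribe("path/to/audio.wav")
elapsed = time.perf_counter() - start

print(f"Transcription took {elapsed:.2f} s")
if torch.cuda.is_available():
    print(f"Peak GPU memory: {torch.cuda.max_memory_allocated() / (1024 ** 2):.0f} MB")
```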
|
|
|
## Requirements
- transformers
- torch
- bitsandbytes
- accelerate (needed for `device_map="auto"`)
- librosa