---
license: apache-2.0
base_model: Jacaranda-Health/ASR-STT
tags:
- speech-to-text
- automatic-speech-recognition
- quantized
- 4bit
language:
- en
- sw
pipeline_tag: automatic-speech-recognition
---

# ASR-STT 4-bit Quantized

This is a 4-bit quantized version of [Jacaranda-Health/ASR-STT](https://huggingface.co/Jacaranda-Health/ASR-STT).

## Model Details

- **Base Model**: Jacaranda-Health/ASR-STT
- **Quantization**: 4-bit NF4 via bitsandbytes, with double quantization
- **Size Reduction**: 84.6% smaller than the original
- **Original Size**: 2913.89 MB
- **Quantized Size**: 448.94 MB

## Usage

```python
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, BitsAndBytesConfig
import torch
import librosa

# Load the processor from the base model: the feature extractor and
# tokenizer are unchanged by quantization
processor = AutoProcessor.from_pretrained("Jacaranda-Health/ASR-STT")

# Configure 4-bit quantization
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True
)

# Load the quantized model
model = AutoModelForSpeechSeq2Seq.from_pretrained(
    "eolang/ASR-STT-4bit",
    quantization_config=quantization_config,
    device_map="auto"
)

# Transcription function
def transcribe(filepath):
    # Resample to the 16 kHz rate the model expects
    audio, sr = librosa.load(filepath, sr=16000)
    inputs = processor(audio, sampling_rate=sr, return_tensors="pt")

    # Move the features to the model's device and match the float16 compute
    # dtype (bitsandbytes 4-bit inference typically requires a CUDA GPU)
    input_features = inputs["input_features"].to(model.device, dtype=torch.float16)

    with torch.no_grad():
        generated_ids = model.generate(input_features)

    return processor.batch_decode(generated_ids, skip_special_tokens=True)[0]

# Example usage
transcription = transcribe("path/to/audio.wav")
print(transcription)
```

## Performance

- Faster inference due to reduced-precision compute (see the timing sketch at the end of this card)
- Lower memory usage (see the footprint check at the end of this card)
- Transcription quality is largely maintained, though minor degradation is possible with 4-bit quantization

## Requirements

- transformers
- torch
- bitsandbytes
- librosa
- accelerate (needed for `device_map="auto"`)
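
All of these are available from PyPI:

```bash
pip install transformers torch bitsandbytes librosa accelerate
```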
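
## Verifying the Performance Claims

To check the memory savings on your own hardware, you can query the loaded model directly; `get_memory_footprint()` is part of the standard `transformers` model API and reports the memory used by parameters and buffers. The exact figure may differ slightly from the table above depending on library versions.

```python
# Assumes `model` has been loaded as shown in the Usage section above
footprint_mb = model.get_memory_footprint() / 1024**2
print(f"Quantized model footprint: {footprint_mb:.2f} MB")
```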
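
For a rough latency check, here is a minimal sketch that times the `transcribe()` helper from the Usage section on one of your own audio files (the path is a placeholder). The first call is a warm-up so that CUDA initialization is not counted:

```python
import time

AUDIO_PATH = "path/to/audio.wav"  # placeholder: point this at a real file

transcribe(AUDIO_PATH)  # warm-up run: triggers CUDA init and caching

start = time.perf_counter()
text = transcribe(AUDIO_PATH)
print(f"Transcribed in {time.perf_counter() - start:.2f}s: {text}")
```

For a fair speed comparison, run the same timing against the full-precision base model loaded without `quantization_config`.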