eolang committed on
Commit 57fedf2 · verified · 1 Parent(s): 7bc6dba

Update README.md

Files changed (1):
  1. README.md +80 -2
README.md CHANGED
@@ -1,8 +1,86 @@
---
- library_name: transformers
- tags: []
+ license: apache-2.0
+ base_model: Jacaranda-Health/ASR-STT
+ tags:
+ - speech-to-text
+ - automatic-speech-recognition
+ - quantized
+ - 4bit
+ language:
+ - en
+ - sw
+ pipeline_tag: automatic-speech-recognition
---

+ # ASR-STT 4-bit Quantized
+
+ This is a 4-bit quantized version of [Jacaranda-Health/ASR-STT](https://huggingface.co/Jacaranda-Health/ASR-STT).
+
+ ## Model Details
+ - **Base Model**: Jacaranda-Health/ASR-STT
+ - **Quantization**: 4-bit NF4 with double quantization (bitsandbytes)
+ - **Size Reduction**: 84.6% smaller than the original
+ - **Original Size**: 2913.89 MB
+ - **Quantized Size**: 448.94 MB
+
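+ The commit doesn't record how the checkpoint was produced. A plausible recipe (an assumption, not a confirmed method) is to load the base model with the same bitsandbytes settings used in the Usage section below and re-serialize it:
+
+ ```python
+ from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, BitsAndBytesConfig
+ import torch
+
+ quantization_config = BitsAndBytesConfig(
+     load_in_4bit=True,
+     bnb_4bit_compute_dtype=torch.float16,
+     bnb_4bit_quant_type="nf4",
+     bnb_4bit_use_double_quant=True,
+ )
+
+ # Quantize the base model on the fly while loading
+ model = AutoModelForSpeechSeq2Seq.from_pretrained(
+     "Jacaranda-Health/ASR-STT",
+     quantization_config=quantization_config,
+     device_map="auto",
+ )
+ processor = AutoProcessor.from_pretrained("Jacaranda-Health/ASR-STT")
+
+ # Serializing 4-bit weights requires a recent transformers/bitsandbytes
+ model.save_pretrained("ASR-STT-4bit")
+ processor.save_pretrained("ASR-STT-4bit")
+ ```
+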
+ ## Usage
+
+ ```python
+ from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, BitsAndBytesConfig
+ import torch
+ import librosa
+
+ # Load processor
+ processor = AutoProcessor.from_pretrained("eolang/ASR-STT-4bit")
+
+ # Configure quantization
+ quantization_config = BitsAndBytesConfig(
+     load_in_4bit=True,
+     bnb_4bit_compute_dtype=torch.float16,
+     bnb_4bit_quant_type="nf4",
+     bnb_4bit_use_double_quant=True,
+ )
+
+ # Load quantized model (4-bit bitsandbytes inference requires a CUDA GPU)
+ model = AutoModelForSpeechSeq2Seq.from_pretrained(
+     "eolang/ASR-STT-4bit",
+     quantization_config=quantization_config,
+     device_map="auto",
+ )
+
+ # Transcription function
+ def transcribe(filepath):
+     # Resample to 16 kHz, the rate the processor expects
+     audio, sr = librosa.load(filepath, sr=16000)
+     inputs = processor(audio, sampling_rate=sr, return_tensors="pt")
+
+     # Move the features to the model's device and cast to float16 to match
+     # bnb_4bit_compute_dtype; casting every tensor blindly would corrupt
+     # integer inputs such as attention masks, and float16 on CPU often fails
+     input_features = inputs["input_features"].to(model.device, dtype=torch.float16)
+
+     with torch.no_grad():
+         generated_ids = model.generate(input_features)
+
+     return processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
+
+ # Example usage
+ transcription = transcribe("path/to/audio.wav")
+ print(transcription)
+ ```
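+
+ The model should also work with the high-level `pipeline` API; a minimal sketch (`model_kwargs` forwards the quantization settings to `from_pretrained`, and decoding an audio file path this way relies on ffmpeg being installed):
+
+ ```python
+ from transformers import BitsAndBytesConfig, pipeline
+ import torch
+
+ asr = pipeline(
+     "automatic-speech-recognition",
+     model="eolang/ASR-STT-4bit",
+     device_map="auto",
+     model_kwargs={
+         "quantization_config": BitsAndBytesConfig(
+             load_in_4bit=True,
+             bnb_4bit_compute_dtype=torch.float16,
+             bnb_4bit_quant_type="nf4",
+             bnb_4bit_use_double_quant=True,
+         ),
+     },
+ )
+
+ # The ASR pipeline returns a dict with the transcription under "text"
+ print(asr("path/to/audio.wav")["text"])
+ ```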
+
+ ## Performance
+ - Lower memory usage (roughly 85% smaller weights)
+ - Faster model loading; inference can also be faster when memory-bound
+ - Transcription quality close to the original, though 4-bit quantization can introduce some degradation
+
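+ The stated reduction follows from the sizes above: 1 - 448.94 / 2913.89 ≈ 0.846, i.e. 84.6%. A quick runtime check is `get_memory_footprint()`, a standard `transformers` model method; the sketch below reuses the `model` loaded in the Usage section:
+
+ ```python
+ # In-memory size of the loaded 4-bit weights, in MB (on-disk checkpoint
+ # size and in-memory footprint can differ slightly)
+ print(f"{model.get_memory_footprint() / 1024**2:.2f} MB")
+ ```
+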
+ ## Requirements
+ - transformers
+ - torch
+ - bitsandbytes
+ - librosa
+
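+ All four are on PyPI (`pip install transformers torch bitsandbytes librosa`); note that 4-bit bitsandbytes inference generally requires a CUDA GPU.
+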
# Model Card for Model ID

<!-- Provide a quick summary of what the model is/does. -->