johaness14 committed on
Commit ba540d2 · 1 Parent(s): 39911bf

Update README.md

Files changed (1)
  1. README.md +85 -0
README.md CHANGED
@@ -50,6 +50,91 @@ The following hyperparameters were used during training:
- num_epochs: 85
- mixed_precision_training: Native AMP

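The two hyperparameters visible in this hunk (`num_epochs: 85` and Native AMP mixed precision) map onto a Hugging Face `TrainingArguments` configuration roughly as sketched below; only those two values come from this README, everything else is an illustrative assumption.

```python
from transformers import TrainingArguments

# Only num_train_epochs and fp16 reflect the hyperparameters listed above;
# the output directory and batch size are placeholder assumptions.
training_args = TrainingArguments(
    output_dir="./asr-finetuned",      # hypothetical output directory
    num_train_epochs=85,               # num_epochs: 85
    fp16=True,                         # mixed_precision_training: Native AMP
    per_device_train_batch_size=8,     # assumption, not from this README
)
```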
+ ### How to run (Gradio web demo)
+ ```python
+ import torch
+ import gradio as gr
+ import numpy as np
+ from transformers import pipeline, AutoProcessor, AutoModelForCTC
+
+ device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+
+ # Load the model and processor
+ MODEL_NAME = "<your model ID>"  # replace with the model repo ID
+ processor = AutoProcessor.from_pretrained(MODEL_NAME)
+ model = AutoModelForCTC.from_pretrained(MODEL_NAME)
+
+ # Move the model to the selected device (GPU if available)
+ model.to(device)
+
+ # Build the ASR pipeline from the model and processor
+ transcriber = pipeline(
+     "automatic-speech-recognition",
+     model=model,
+     tokenizer=processor.tokenizer,
+     feature_extractor=processor.feature_extractor,
+     device=device,
+ )
+
+ def transcribe(audio):
+     sr, y = audio
+     # Convert to float32 and peak-normalize to [-1, 1]
+     y = y.astype(np.float32)
+     peak = np.max(np.abs(y))
+     if peak > 0:
+         y /= peak
+     return transcriber({"sampling_rate": sr, "raw": y})["text"]
+
+ demo = gr.Interface(
+     transcribe,
+     gr.Audio(sources=["upload"]),
+     "text",
+ )
+
+ demo.launch(share=True)
+ ```
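Outside the Gradio UI, the same `transcriber` pipeline can also be called directly on an audio file path (transformers decodes it with ffmpeg, so ffmpeg must be installed). A minimal sketch, assuming a hypothetical `sample.wav` in the working directory:

```python
# "sample.wav" is a placeholder file name used only for illustration.
result = transcriber("sample.wav")
print(result["text"])
```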
+
+ ### How to run (local inference on an audio file)
+ ```python
+ import torch
+ import torchaudio
+ from transformers import AutoProcessor, AutoModelForCTC
+
+ device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+
+ # Load the model and processor
+ MODEL_NAME = "<your model ID>"  # replace with the model repo ID
+ processor = AutoProcessor.from_pretrained(MODEL_NAME)
+ model = AutoModelForCTC.from_pretrained(MODEL_NAME)
+
+ # Move the model to the selected device (GPU if available)
+ model.to(device)
+
+ # Load the audio file
+ AUDIO_PATH = "path_to_audio_file.wav"  # replace with the actual path to your audio file
+ audio_input, sample_rate = torchaudio.load(AUDIO_PATH)
+
+ # Ensure the audio is mono (1 channel)
+ if audio_input.shape[0] > 1:
+     audio_input = torch.mean(audio_input, dim=0, keepdim=True)
+
+ # Resample to 16 kHz if necessary
+ if sample_rate != 16000:
+     resampler = torchaudio.transforms.Resample(orig_freq=sample_rate, new_freq=16000)
+     audio_input = resampler(audio_input)
+
+ # Process the audio input
+ input_values = processor(audio_input.squeeze(), sampling_rate=16000, return_tensors="pt").input_values
+
+ # Move the input values to the same device as the model
+ input_values = input_values.to(device)
+
+ # Perform inference
+ with torch.no_grad():
+     logits = model(input_values).logits
+
+ # Decode the logits to text
+ predicted_ids = torch.argmax(logits, dim=-1)
+ transcription = processor.batch_decode(predicted_ids)[0]
+
+ print("Transcription:", transcription)
+ ```
+
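The training results table below tracks WER, so a quick way to sanity-check a transcription from the script above against a known reference is the `jiwer` package (`pip install jiwer`). This is only an illustrative sketch with a made-up reference sentence, not part of the training pipeline:

```python
import jiwer

reference = "the expected ground truth sentence"  # hypothetical reference text
hypothesis = transcription                        # output of the script above

wer = jiwer.wer(reference, hypothesis)
print(f"WER: {wer:.2%}")
```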
### Training results

| Training Loss | Epoch | Step | Validation Loss | Wer |