r-f/wav2vec-english-speech-emotion-recognition

Automatic Speech Recognition

Generated from Trainer

r-f committed on Jan 2

Commit 2c59b3f · verified · 1 Parent(s): e7ae5db

Update README.md

Files changed (1)

README.md +30 -6

@@ -22,12 +22,36 @@ emotions = ['angry' 'disgust' 'fear' 'happy' 'neutral' 'sad' 'surprise']
 It achieves the following results on the evaluation set:
 - Loss: 0.104075
 - Accuracy: 0.97463
-## Model description
-More information needed
-## Intended uses & limitations
-More information needed
-## Training and evaluation data
-More information needed
+## Model Usage
+```bash
+pip install transformers librosa torch
+```
+```python
+from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2ForCTC
+import librosa
+import torch
+feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained("r-f/wav2vec-english-speech-emotion-recognition")
+model = Wav2Vec2ForCTC.from_pretrained("r-f/wav2vec-english-speech-emotion-recognition")
+def predict_emotion(audio_path):
+    audio, rate = librosa.load(audio_path, sr=16000)
+    inputs = feature_extractor(audio, sampling_rate=rate, return_tensors="pt", padding=True)
+    with torch.no_grad():
+        outputs = model(inputs.input_values)
+        predictions = torch.nn.functional.softmax(outputs.logits.mean(dim=1), dim=-1)  # Average over sequence length
+        predicted_label = torch.argmax(predictions, dim=-1)
+        emotion = model.config.id2label[predicted_label.item()]
+    return emotion
+emotion = predict_emotion("example_audio.wav")
+print(f"Predicted emotion: {emotion}")
+# Example output: Predicted emotion: angry
+```
 ## Training procedure
 ### Training hyperparameters
 The following hyperparameters were used during training:
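
Review note: the post-processing inside `predict_emotion` (average the per-frame logits over time, apply softmax, take the argmax, look up the label) can be tried without downloading the model. The following is a minimal sketch in plain Python, not the author's implementation. The label order is an assumption copied from the `emotions` list in the hunk header (the authoritative mapping is `model.config.id2label`), and the logits are made-up numbers rather than real model output.

```python
import math

# Assumed label order, taken from the `emotions` list in the hunk header.
# The real mapping lives in model.config.id2label and may differ.
ID2LABEL = dict(enumerate(
    ["angry", "disgust", "fear", "happy", "neutral", "sad", "surprise"]
))


def mean_pool(frame_logits):
    """Average per-frame logits over the time axis: one logit vector per clip."""
    n_frames = len(frame_logits)
    return [sum(column) / n_frames for column in zip(*frame_logits)]


def softmax(logits):
    """Numerically stable softmax over a flat list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(frame_logits):
    """Return (label, probabilities) for a clip given its per-frame logits."""
    probs = softmax(mean_pool(frame_logits))
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs


# Hypothetical logits for a 3-frame clip (illustrative values only).
frames = [
    [2.0, 0.1, 0.0, 0.3, 0.5, 0.2, 0.1],
    [1.5, 0.0, 0.2, 0.1, 0.9, 0.3, 0.0],
    [2.5, 0.2, 0.1, 0.2, 0.4, 0.1, 0.2],
]
label, probs = classify(frames)
print(f"Predicted emotion: {label}")
```

Because softmax preserves the ordering of its inputs, the argmax of the probabilities equals the argmax of the pooled logits. The softmax step matters only when the probabilities themselves are reported.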