---
datasets:
- narad/ravdess
language:
- en
metrics:
- f1
- accuracy
- recall
- precision
pipeline_tag: audio-classification
---
# Emotion Recognition in English Using RAVDESS and Wav2Vec 2.0
<!-- Provide a quick summary of what the model is/does. -->
This model classifies the emotion expressed in an English speech recording. It was trained on RAVDESS, a dataset of English audio recordings, and recognises six emotions: anger, disgust, fear, happiness, sadness and surprise.
The model adapts the approach of this [Greek emotion recogniser](https://huggingface.co/m3hrdadfi/wav2vec2-xlsr-greek-speech-emotion-recognition/blob/main/README.md), using a pre-trained [Wav2Vec2](https://huggingface.co/jonatasgrosman/wav2vec2-large-xlsr-53-english) English model as its backbone.
## Model Details
### Model Description
<!-- Provide a longer summary of what this model is. -->
- **Adapted from:** [Emotion Recognition in Greek](https://huggingface.co/m3hrdadfi/wav2vec2-xlsr-greek-speech-emotion-recognition/blob/main/README.md)
- **Model type:** NN with CTC
- **Language(s) (NLP):** English
- **Finetuned from model:** [wav2vec2](https://huggingface.co/jonatasgrosman/wav2vec2-large-xlsr-53-english)
## How to Get Started with the Model
Use the code below to get started with the model.
[More Information Needed]
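As a minimal sketch, assuming the checkpoint works with the standard `transformers` audio-classification pipeline (the repository ID and file path below are placeholders, not values from this repository):

```python
from transformers import pipeline

# Placeholder Hub ID -- replace with this model's actual repository name.
classifier = pipeline(
    "audio-classification",
    model="your-username/wav2vec2-ravdess-emotion",
)

# Returns the six emotion labels with their scores for the given recording.
predictions = classifier("path/to/recording.wav")
print(predictions)
```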
## Training Details
### Training Data
<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
The RAVDESS dataset was split into training, validation and test sets in a 60/20/20 ratio.
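For illustration, a 60/20/20 split of this kind can be produced with the `datasets` library; the seed and the assumption that the dataset ships as a single `train` split are illustrative, not the exact procedure used for this model:

```python
from datasets import load_dataset

# Load the RAVDESS dataset from the Hub (assumed to ship as a single split).
ravdess = load_dataset("narad/ravdess", split="train")

# Hold out 40% of the data, then split that portion in half,
# giving a 60/20/20 train/validation/test split.
train_rest = ravdess.train_test_split(test_size=0.4, seed=42)
val_test = train_rest["test"].train_test_split(test_size=0.5, seed=42)

train_ds = train_rest["train"]
val_ds = val_test["train"]
test_ds = val_test["test"]
print(len(train_ds), len(val_ds), len(test_ds))
```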
### Training Procedure
The fine-tuning process was centred on a grid search over four hyper-parameters:
- batch size (4, 8),
- gradient accumulation steps (GAS) (2, 4, 6, 8),
- number of epochs (10, 20) and
- learning rate (1e-3, 1e-4, 1e-5).

Each experiment was repeated 10 times; the resulting grid is sketched below.
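This grid contains 2 × 4 × 2 × 3 = 48 configurations, i.e. 480 runs in total. A minimal sketch of how the grid can be enumerated; the variable names are illustrative and not taken from the training code:

```python
from itertools import product

# Hyper-parameter grid described above: 2 x 4 x 2 x 3 = 48 configurations,
# each repeated 10 times (480 runs in total).
batch_sizes = [4, 8]
grad_accum_steps = [2, 4, 6, 8]
num_epochs = [10, 20]
learning_rates = [1e-3, 1e-4, 1e-5]
n_repeats = 10

configurations = [
    {"batch_size": bs, "gradient_accumulation_steps": gas,
     "epochs": ep, "learning_rate": lr}
    for bs, gas, ep, lr in product(batch_sizes, grad_accum_steps,
                                   num_epochs, learning_rates)
]
print(len(configurations), "configurations,",
      len(configurations) * n_repeats, "runs in total")
```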
## Evaluation
The best-performing hyper-parameter combination was a batch size of 4, 4 gradient accumulation steps, 10 epochs and a learning rate of 1e-4.
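Expressed as `transformers` `TrainingArguments`, this configuration corresponds roughly to the sketch below; the output directory is a placeholder and all remaining arguments are left at their defaults:

```python
from transformers import TrainingArguments

# Best configuration from the grid search: batch size 4,
# 4 gradient accumulation steps, 10 epochs, learning rate 1e-4.
training_args = TrainingArguments(
    output_dir="wav2vec2-ravdess-emotion",  # placeholder output path
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    num_train_epochs=10,
    learning_rate=1e-4,
)
```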
## Testing
The model was retrained on the combined training and validation sets using the best hyper-parameter set. On the test set, accuracy and F1 both averaged 84.84% over the 10 runs (standard deviations of 2 and 2.08, respectively).
## Results
We retained the model with the highest performance across the 10 runs; its per-emotion test-set results are reported below.
| Emotion   | Accuracy (%) | Precision (%) | Recall (%) | F1 (%) |
|-----------|-------------:|--------------:|-----------:|-------:|
| Anger     |              | 96.55         | 87.50      |        |
| Disgust   |              | 90.91         | 93.75      |        |
| Fear      |              | 96.30         | 81.25      |        |
| Happiness |              | 93.10         | 84.38      |        |
| Sadness   |              | 81.58         | 96.88      |        |
| Surprise  |              | 77.78         | 87.50      |        |
| Total     | 88.54        | 89.37         | 88.54      | 88.62  |
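Per-emotion precision and recall of this kind can be computed from the test-set predictions with `scikit-learn`; the label arrays below are tiny placeholders rather than the actual test data:

```python
from sklearn.metrics import classification_report

emotions = ["anger", "disgust", "fear", "happiness", "sadness", "surprise"]

# y_true and y_pred stand in for the test-set labels and the model's
# predictions, encoded as indices into `emotions`.
y_true = [0, 1, 2, 3, 4, 5, 0, 1]
y_pred = [0, 1, 2, 3, 4, 5, 0, 4]

print(classification_report(y_true, y_pred, target_names=emotions, digits=4))
```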
<!-- ## Citation [optional] -->
<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
<!-- **BibTeX:**
[More Information Needed]
**APA:**
[More Information Needed] -->