griko committed · commit 23af702 · verified · 1 parent: 5cf7168

Upload folder using huggingface_hub
README.md ADDED
@@ -0,0 +1,75 @@
---
language: en
license: apache-2.0
datasets:
- ravdess
library_name: speechbrain
tags:
- emotion-classification
- speech-emotion-recognition
- speaker-characteristics
- audio-classification
- voice-analysis
---

# Emotion Classification Model

This model is a 7-class SVM classifier trained on the RAVDESS dataset, using SpeechBrain ECAPA-TDNN embeddings as features.

## Model Details
- Input: audio file (automatically converted to 16 kHz mono)
- Output: predicted emotion, one of seven classes: angry, disgust, fearful, happy, neutral/calm, sad, surprised
- Features:
  - SpeechBrain ECAPA-TDNN embedding (192 dimensions)
- Performance:
  - RAVDESS 5-fold cross-validation: 86.24% accuracy
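The classifier stage (fixed-size embedding in, emotion label out) can be sketched with scikit-learn. The random 192-dimensional vectors below are stand-ins for real ECAPA-TDNN embeddings, and the `SVC` settings are illustrative, not the trained model's hyperparameters:

```python
import numpy as np
from sklearn.svm import SVC

LABELS = ["angry", "disgust", "fearful", "happy", "neutral/calm", "sad", "surprised"]

rng = np.random.default_rng(0)
# Stand-in for ECAPA-TDNN embeddings: 70 utterances x 192 dimensions
X = rng.normal(size=(70, 192))
y = np.repeat(LABELS, 10)  # 10 synthetic utterances per class

clf = SVC(kernel="rbf")  # illustrative hyperparameters, not the released model's
clf.fit(X, y)

# Predict the emotion for one new 192-dim embedding
pred = clf.predict(rng.normal(size=(1, 192)))
print(pred[0] in LABELS)  # True
```

The released model wraps exactly this kind of SVM behind a pipeline that also computes the embedding, so users never handle the 192-dimensional features directly.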

## Installation

You can install the package directly from GitHub:

```bash
pip install git+https://github.com/griko/voice-emotion-classification.git
```

## Usage

```python
from pipelines.emotion_classifier import EmotionClassificationPipeline

# Load the model
classifier = EmotionClassificationPipeline.from_pretrained("griko/emotion_7_cls_svm_ecapa_ravdess")

# Single-file prediction
result = classifier("path/to/audio.wav")
print(result)  # e.g. ['angry']; one of the seven emotion labels

# Batch prediction
results = classifier(["audio1.wav", "audio2.wav"])
print(results)  # e.g. ['angry', 'disgust']
```

## Input Requirements

- Audio files should be in WAV format
- Audio is automatically resampled to 16 kHz if needed
- Audio is converted to mono if needed
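The two conversions above can be sketched in plain NumPy. This is a minimal illustration only: `to_mono` and `resample_linear` are hypothetical helpers, and the naive linear-interpolation resampler stands in for whatever the pipeline actually uses (typically a proper polyphase or torchaudio resampler):

```python
import numpy as np

TARGET_SR = 16_000

def to_mono(samples: np.ndarray) -> np.ndarray:
    """Average channels: (n_channels, n_samples) -> (n_samples,)."""
    if samples.ndim == 2:
        return samples.mean(axis=0)
    return samples

def resample_linear(samples: np.ndarray, orig_sr: int, target_sr: int = TARGET_SR) -> np.ndarray:
    """Naive linear-interpolation resampler (illustration only)."""
    if orig_sr == target_sr:
        return samples
    duration = len(samples) / orig_sr
    n_target = int(round(duration * target_sr))
    t_orig = np.linspace(0.0, duration, num=len(samples), endpoint=False)
    t_target = np.linspace(0.0, duration, num=n_target, endpoint=False)
    return np.interp(t_target, t_orig, samples)

# 1 second of synthetic stereo audio at 44.1 kHz -> 16 kHz mono
stereo = np.random.randn(2, 44_100)
mono16k = resample_linear(to_mono(stereo), 44_100)
print(mono16k.shape)  # (16000,)
```

In practice the pipeline performs these steps internally, so callers can pass audio at any sample rate or channel count.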

## Limitations

- The model was trained on acted speech from the RAVDESS dataset
- Performance may vary with different audio quality or recording conditions

## Citation

If you use this model in your research, please cite:

```bibtex
@misc{koushnir2025vanpyvoiceanalysisframework,
      title={VANPY: Voice Analysis Framework},
      author={Gregory Koushnir and Michael Fire and Galit Fuhrmann Alpert and Dima Kagan},
      year={2025},
      eprint={2502.17579},
      archivePrefix={arXiv},
      primaryClass={cs.SD},
      url={https://arxiv.org/abs/2502.17579},
}
```
config.json ADDED
@@ -0,0 +1 @@
{"labels": ["angry", "disgust", "fearful", "happy", "neutral/calm", "sad", "surprised"], "feature_names": ["0_speechbrain_embedding", "1_speechbrain_embedding", "2_speechbrain_embedding", "3_speechbrain_embedding", "4_speechbrain_embedding", "5_speechbrain_embedding", "6_speechbrain_embedding", "7_speechbrain_embedding", "8_speechbrain_embedding", "9_speechbrain_embedding", "10_speechbrain_embedding", "11_speechbrain_embedding", "12_speechbrain_embedding", "13_speechbrain_embedding", "14_speechbrain_embedding", "15_speechbrain_embedding", "16_speechbrain_embedding", "17_speechbrain_embedding", "18_speechbrain_embedding", "19_speechbrain_embedding", "20_speechbrain_embedding", "21_speechbrain_embedding", "22_speechbrain_embedding", "23_speechbrain_embedding", "24_speechbrain_embedding", "25_speechbrain_embedding", "26_speechbrain_embedding", "27_speechbrain_embedding", "28_speechbrain_embedding", "29_speechbrain_embedding", "30_speechbrain_embedding", "31_speechbrain_embedding", "32_speechbrain_embedding", "33_speechbrain_embedding", "34_speechbrain_embedding", "35_speechbrain_embedding", "36_speechbrain_embedding", "37_speechbrain_embedding", "38_speechbrain_embedding", "39_speechbrain_embedding", "40_speechbrain_embedding", "41_speechbrain_embedding", "42_speechbrain_embedding", "43_speechbrain_embedding", "44_speechbrain_embedding", "45_speechbrain_embedding", "46_speechbrain_embedding", "47_speechbrain_embedding", "48_speechbrain_embedding", "49_speechbrain_embedding", "50_speechbrain_embedding", "51_speechbrain_embedding", "52_speechbrain_embedding", "53_speechbrain_embedding", "54_speechbrain_embedding", "55_speechbrain_embedding", "56_speechbrain_embedding", "57_speechbrain_embedding", "58_speechbrain_embedding", "59_speechbrain_embedding", "60_speechbrain_embedding", "61_speechbrain_embedding", "62_speechbrain_embedding", "63_speechbrain_embedding", "64_speechbrain_embedding", "65_speechbrain_embedding", "66_speechbrain_embedding", 
"67_speechbrain_embedding", "68_speechbrain_embedding", "69_speechbrain_embedding", "70_speechbrain_embedding", "71_speechbrain_embedding", "72_speechbrain_embedding", "73_speechbrain_embedding", "74_speechbrain_embedding", "75_speechbrain_embedding", "76_speechbrain_embedding", "77_speechbrain_embedding", "78_speechbrain_embedding", "79_speechbrain_embedding", "80_speechbrain_embedding", "81_speechbrain_embedding", "82_speechbrain_embedding", "83_speechbrain_embedding", "84_speechbrain_embedding", "85_speechbrain_embedding", "86_speechbrain_embedding", "87_speechbrain_embedding", "88_speechbrain_embedding", "89_speechbrain_embedding", "90_speechbrain_embedding", "91_speechbrain_embedding", "92_speechbrain_embedding", "93_speechbrain_embedding", "94_speechbrain_embedding", "95_speechbrain_embedding", "96_speechbrain_embedding", "97_speechbrain_embedding", "98_speechbrain_embedding", "99_speechbrain_embedding", "100_speechbrain_embedding", "101_speechbrain_embedding", "102_speechbrain_embedding", "103_speechbrain_embedding", "104_speechbrain_embedding", "105_speechbrain_embedding", "106_speechbrain_embedding", "107_speechbrain_embedding", "108_speechbrain_embedding", "109_speechbrain_embedding", "110_speechbrain_embedding", "111_speechbrain_embedding", "112_speechbrain_embedding", "113_speechbrain_embedding", "114_speechbrain_embedding", "115_speechbrain_embedding", "116_speechbrain_embedding", "117_speechbrain_embedding", "118_speechbrain_embedding", "119_speechbrain_embedding", "120_speechbrain_embedding", "121_speechbrain_embedding", "122_speechbrain_embedding", "123_speechbrain_embedding", "124_speechbrain_embedding", "125_speechbrain_embedding", "126_speechbrain_embedding", "127_speechbrain_embedding", "128_speechbrain_embedding", "129_speechbrain_embedding", "130_speechbrain_embedding", "131_speechbrain_embedding", "132_speechbrain_embedding", "133_speechbrain_embedding", "134_speechbrain_embedding", "135_speechbrain_embedding", "136_speechbrain_embedding", 
"137_speechbrain_embedding", "138_speechbrain_embedding", "139_speechbrain_embedding", "140_speechbrain_embedding", "141_speechbrain_embedding", "142_speechbrain_embedding", "143_speechbrain_embedding", "144_speechbrain_embedding", "145_speechbrain_embedding", "146_speechbrain_embedding", "147_speechbrain_embedding", "148_speechbrain_embedding", "149_speechbrain_embedding", "150_speechbrain_embedding", "151_speechbrain_embedding", "152_speechbrain_embedding", "153_speechbrain_embedding", "154_speechbrain_embedding", "155_speechbrain_embedding", "156_speechbrain_embedding", "157_speechbrain_embedding", "158_speechbrain_embedding", "159_speechbrain_embedding", "160_speechbrain_embedding", "161_speechbrain_embedding", "162_speechbrain_embedding", "163_speechbrain_embedding", "164_speechbrain_embedding", "165_speechbrain_embedding", "166_speechbrain_embedding", "167_speechbrain_embedding", "168_speechbrain_embedding", "169_speechbrain_embedding", "170_speechbrain_embedding", "171_speechbrain_embedding", "172_speechbrain_embedding", "173_speechbrain_embedding", "174_speechbrain_embedding", "175_speechbrain_embedding", "176_speechbrain_embedding", "177_speechbrain_embedding", "178_speechbrain_embedding", "179_speechbrain_embedding", "180_speechbrain_embedding", "181_speechbrain_embedding", "182_speechbrain_embedding", "183_speechbrain_embedding", "184_speechbrain_embedding", "185_speechbrain_embedding", "186_speechbrain_embedding", "187_speechbrain_embedding", "188_speechbrain_embedding", "189_speechbrain_embedding", "190_speechbrain_embedding", "191_speechbrain_embedding"]}
ravdess_svm_speechbrain_ecapa_voxceleb_no_processor_cv.pkl ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f34966a5daa13abdb1a47e5be2538443a9281b0b89e763a14a3144700b2a86f2
size 1798472
requirements.txt ADDED
@@ -0,0 +1,7 @@
torch>=2.0.0
torchaudio>=2.0.0
speechbrain>=0.5.15
scikit-learn>=1.0.0
pandas>=1.5.0
soundfile>=0.12.1
joblib>=1.1.0
svm_model.joblib ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:85c2f42d256e7be7223a7ed5a02106b9d5b79b72c79f4aa26c115fe58008d8fa
size 1853959