---
language: en
license: apache-2.0
datasets:
- ravdess
libraries:
- speechbrain
tags:
- emotion-classification
- speech-emotion-recognition
- speaker-characteristics
- audio-classification
- voice-analysis
---

# Emotion Classification Model

This model is a 7-class SVM classifier trained on the RAVDESS dataset, using SpeechBrain ECAPA-TDNN embeddings as input features.

## Model Details

- Input: audio file (automatically converted to 16 kHz mono)
- Output: predicted emotion, one of 7 classes: angry, disgust, fearful, happy, neutral/calm, sad, surprised
- Features: SpeechBrain ECAPA-TDNN embedding (192 dimensions)
- Performance: 86.24% accuracy in 5-fold cross-validation on RAVDESS

## Installation

You can install the package directly from GitHub:

```bash
pip install git+https://github.com/griko/voice-emotion-classification.git
```

## Usage

```python
from pipelines.emotion_classifier import EmotionClassificationPipeline

# Load the model
classifier = EmotionClassificationPipeline.from_pretrained(
    "griko/emotion_7_cls_svm_ecapa_ravdess"
)

# Single-file prediction
result = classifier("path/to/audio.wav")
print(result)  # e.g., ['angry']

# Batch prediction
results = classifier(["audio1.wav", "audio2.wav"])
print(results)  # e.g., ['angry', 'disgust']
```

## Input Requirements

- Audio files should be in WAV format
- Audio is automatically resampled to 16 kHz if needed
- Audio is converted to mono if needed

## Limitations

- The model was trained on acted speech from the RAVDESS dataset
- Performance may vary on different audio qualities or recording conditions

## Citation

If you use this model in your research, please cite:

```bibtex
@misc{koushnir2025vanpyvoiceanalysisframework,
  title={VANPY: Voice Analysis Framework},
  author={Gregory Koushnir and Michael Fire and Galit Fuhrmann Alpert and Dima Kagan},
  year={2025},
  eprint={2502.17579},
  archivePrefix={arXiv},
  primaryClass={cs.SD},
  url={https://arxiv.org/abs/2502.17579},
}
```
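## Appendix: Pre-processing Sketch

The pipeline performs the 16 kHz/mono conversion internally, but if you want to normalize audio yourself before prediction, the same steps can be sketched with plain NumPy. This is an illustrative sketch, not the pipeline's actual implementation: the linear-interpolation resampler below is a simplified stand-in for a proper band-limited resampler and will alias on real audio.

```python
import numpy as np

TARGET_SR = 16000  # sample rate expected by the ECAPA-TDNN embedding model


def to_mono(samples: np.ndarray) -> np.ndarray:
    """Average the channels of a (num_samples, num_channels) array down to mono."""
    if samples.ndim == 2:
        return samples.mean(axis=1)
    return samples


def resample_linear(samples: np.ndarray, orig_sr: int,
                    target_sr: int = TARGET_SR) -> np.ndarray:
    """Naive linear-interpolation resampling; fine for a sketch, not production."""
    if orig_sr == target_sr:
        return samples
    duration = len(samples) / orig_sr
    new_len = int(round(duration * target_sr))
    old_t = np.linspace(0.0, duration, num=len(samples), endpoint=False)
    new_t = np.linspace(0.0, duration, num=new_len, endpoint=False)
    return np.interp(new_t, old_t, samples)


# Example: 1 second of 44.1 kHz stereo noise -> 16 kHz mono
stereo = np.random.randn(44100, 2)
mono_16k = resample_linear(to_mono(stereo), orig_sr=44100)
print(mono_16k.shape)  # (16000,)
```

In practice a library resampler (e.g. torchaudio's or scipy's) is preferable to the linear interpolation shown here.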