Cross-Lingual Cross-Age Group Adaptation for Low-Resource Elderly Speech Emotion Recognition

This model is facebook/wav2vec2-large-xlsr-53 fine-tuned on English and Chinese speech from adult speakers. It was trained on the training sets of CREMA-D, ESD, IEMOCAP, and TESS. When using this model, make sure your speech input is sampled at 16 kHz.
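Audio recorded at other rates (for example 44.1 kHz or 48 kHz) must be resampled before it is passed to the model. The sketch below shows one way to prepare a waveform. It assumes 16-bit PCM WAV input and NumPy, and it assumes the model takes a single-channel waveform like other wav2vec2 checkpoints. The helpers `load_wav_mono` and `resample_linear` are hypothetical and are not part of the training scripts. The resampler uses plain linear interpolation with no anti-aliasing filter, so it is only illustrative. For real use, a proper resampler such as `torchaudio.functional.resample` or `librosa.resample` is the safer choice.

```python
import io
import wave

import numpy as np

TARGET_SR = 16_000  # the model card requires 16 kHz input


def load_wav_mono(source):
    """Hypothetical helper: read a 16-bit PCM WAV (path or file object).

    Returns (float32 samples scaled to [-1, 1], original sample rate).
    """
    with wave.open(source, "rb") as wf:
        sr = wf.getframerate()
        n_channels = wf.getnchannels()
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        raw = wf.readframes(wf.getnframes())
    pcm = np.frombuffer(raw, dtype="<i2").astype(np.float32) / 32768.0
    # Frames are interleaved (L, R, L, R, ...); average channels to get mono.
    pcm = pcm.reshape(-1, n_channels).mean(axis=1)
    return pcm, sr


def resample_linear(x, orig_sr, target_sr=TARGET_SR):
    """Hypothetical helper: naive linear-interpolation resampler.

    No low-pass filter is applied, so downsampling can alias; use a
    dedicated resampler for anything beyond illustration.
    """
    if orig_sr == target_sr:
        return x
    n_out = int(round(len(x) * target_sr / orig_sr))
    t_out = np.arange(n_out) / target_sr  # output sample times in seconds
    t_in = np.arange(len(x)) / orig_sr  # input sample times in seconds
    return np.interp(t_out, t_in, x).astype(np.float32)
```

For a one-second stereo recording at 44.1 kHz, `load_wav_mono` returns 44,100 mono samples, and `resample_linear` turns them into 16,000 samples at 16 kHz.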

The scripts used for training and evaluation can be found here: https://github.com/HLTCHKUST/elderly_ser/tree/main

Evaluation Results

For details (e.g., statistics of the training, validation, and test data), please refer to our paper on arXiv. The paper also reports the model's speech emotion recognition performance on the following evaluation sets: English-All, Chinese-All, English-Elderly, Chinese-Elderly, English-Adults, and Chinese-Adults.

Citation

Our paper will be published at INTERSPEECH 2023. In the meantime, it is available on arXiv. If you find our work useful, please consider citing it as follows:

@misc{cahyawijaya2023crosslingual,
      title={Cross-Lingual Cross-Age Group Adaptation for Low-Resource Elderly Speech Emotion Recognition}, 
      author={Samuel Cahyawijaya and Holy Lovenia and Willy Chung and Rita Frieske and Zihan Liu and Pascale Fung},
      year={2023},
      eprint={2306.14517},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

Dataset used to train CAiRE/SER-wav2vec2-large-xlsr-53-eng-zho-adults