Cross-Lingual Cross-Age Group Adaptation for Low-Resource Elderly Speech Emotion Recognition

This model is facebook/wav2vec2-large-xlsr-53 fine-tuned for speech emotion recognition on English and Chinese data from adult speakers, using the training sets of CREMA-D, ESD, IEMOCAP, and TESS. When using this model, make sure that your speech input is sampled at 16 kHz.
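Since the model expects 16 kHz mono audio, recordings at other rates must be resampled first. Below is a minimal numpy-only sketch of that preprocessing step (in practice a dedicated resampler such as torchaudio's is preferable); the commented `transformers` usage and the classification head it assumes are illustrative, not confirmed by this card.

```python
import numpy as np

TARGET_SR = 16_000  # the sampling rate this model expects

def resample_to_16k(wave: np.ndarray, orig_sr: int) -> np.ndarray:
    """Linearly interpolate a mono waveform to 16 kHz (simple sketch)."""
    if orig_sr == TARGET_SR:
        return wave.astype(np.float32)
    duration = len(wave) / orig_sr
    n_target = int(round(duration * TARGET_SR))
    old_t = np.linspace(0.0, duration, num=len(wave), endpoint=False)
    new_t = np.linspace(0.0, duration, num=n_target, endpoint=False)
    return np.interp(new_t, old_t, wave).astype(np.float32)

# Hypothetical downstream use (requires `transformers`; the exact head
# and label set of this checkpoint may differ):
# from transformers import AutoFeatureExtractor, AutoModelForAudioClassification
# extractor = AutoFeatureExtractor.from_pretrained(
#     "CAiRE/SER-wav2vec2-large-xlsr-53-eng-zho-adults")
# model = AutoModelForAudioClassification.from_pretrained(
#     "CAiRE/SER-wav2vec2-large-xlsr-53-eng-zho-adults")
# inputs = extractor(resample_to_16k(wave, sr), sampling_rate=TARGET_SR,
#                    return_tensors="pt")

# One second of a 440 Hz tone at 44.1 kHz becomes 16 000 samples at 16 kHz.
one_sec_44k = np.sin(2 * np.pi * 440 * np.arange(44_100) / 44_100)
resampled = resample_to_16k(one_sec_44k, 44_100)
print(len(resampled))  # 16000
```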

The scripts used for training and evaluation can be found here: https://github.com/HLTCHKUST/elderly_ser/tree/main

Evaluation Results

For details (e.g., the statistics of the train, validation, and test splits), please refer to our paper on arXiv. It also reports the model's speech emotion recognition performance on six evaluation subsets: English-All, Chinese-All, English-Elderly, Chinese-Elderly, English-Adults, and Chinese-Adults.
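Evaluating on subsets like these amounts to grouping predictions by language and age group and scoring each group separately. A small self-contained sketch of that bookkeeping (the record fields and label strings here are illustrative, not the paper's actual data format):

```python
from collections import defaultdict

def grouped_accuracy(records):
    """Accuracy per (language, age-group) subset, plus a per-language 'All' row.

    Each record is a dict with keys: lang, age, pred, gold.
    """
    hits = defaultdict(int)
    counts = defaultdict(int)
    for r in records:
        # Every record contributes to its specific subset and to lang-All.
        for key in ((r["lang"], r["age"]), (r["lang"], "All")):
            counts[key] += 1
            hits[key] += int(r["pred"] == r["gold"])
    return {k: hits[k] / counts[k] for k in counts}

# Toy predictions, just to show the grouping:
records = [
    {"lang": "English", "age": "Elderly", "pred": "happy", "gold": "happy"},
    {"lang": "English", "age": "Adults",  "pred": "sad",   "gold": "angry"},
    {"lang": "Chinese", "age": "Elderly", "pred": "sad",   "gold": "sad"},
]
acc = grouped_accuracy(records)
print(acc[("English", "All")])      # 0.5
print(acc[("Chinese", "Elderly")])  # 1.0
```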

Citation

Our paper will be published at INTERSPEECH 2023. In the meantime, you can find our paper on arXiv. If you find our work useful, please consider citing our paper as follows:

@misc{cahyawijaya2023crosslingual,
      title={Cross-Lingual Cross-Age Group Adaptation for Low-Resource Elderly Speech Emotion Recognition}, 
      author={Samuel Cahyawijaya and Holy Lovenia and Willy Chung and Rita Frieske and Zihan Liu and Pascale Fung},
      year={2023},
      eprint={2306.14517},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
