Wav2Vec2 XLS-R Adult/Child Speech Classifier

Wav2Vec2 XLS-R Adult/Child Speech Classifier is an audio classification model based on the XLS-R architecture. This model is a fine-tuned version of wav2vec2-xls-r-300m on a private adult/child speech classification dataset.

This model was trained with the Hugging Face Transformers library on PyTorch. All training was done on a Tesla P100 GPU provided by Kaggle. Training metrics were logged with TensorBoard.

Model

Model                           #Params  Arch.  Training/Validation Data
wav2vec2-xls-r-adult-child-cls  300M     XLS-R  Adult/Child Speech Classification Dataset

Evaluation Results

The model achieves the following results on the evaluation set. These figures match the epoch 4 checkpoint in the training results below.

Dataset                            Loss    Accuracy  F1
Adult/Child Speech Classification  0.1851  94.69%    0.9508
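The two reported metrics can be illustrated with a small pure-Python sketch. The card does not say how F1 was averaged or which class counts as positive. This sketch therefore assumes binary F1 with the positive class encoded as label `1`. The helper names `accuracy` and `binary_f1` and the toy labels are hypothetical, not part of the original evaluation code.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the reference labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def binary_f1(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """F1 for a single (assumed) positive class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    if tp == 0:
        # No true positives: precision or recall is 0 (or undefined), so report 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

For example, with hypothetical labels `y_true = [1, 1, 1, 0, 0, 1]` and `y_pred = [1, 1, 0, 0, 1, 1]`, `accuracy` returns 4/6 (about 0.667) and `binary_f1` returns 0.75, since precision and recall are both 3/4.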

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 3e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 5
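The batch-size and scheduler settings above interact, and the following pure-Python sketch makes that concrete. It computes the effective batch size and the learning rate at a given optimizer step under a linear schedule with 10% warmup. This is an illustration, not the original training code. It assumes 1,915 total optimizer steps (5 epochs × 383 steps, taken from the training results), warmup steps rounded up with `math.ceil`, and the same warmup-then-decay formula used by Transformers' `get_linear_schedule_with_warmup`. The function name `linear_schedule_lr` is hypothetical.

```python
import math

# Hyperparameters from the list above.
LEARNING_RATE = 3e-5
TRAIN_BATCH_SIZE = 8
GRAD_ACCUM_STEPS = 4
WARMUP_RATIO = 0.1
NUM_EPOCHS = 5
STEPS_PER_EPOCH = 383  # from the "Step" column of the training results

# Each optimizer step accumulates gradients over GRAD_ACCUM_STEPS batches.
effective_batch_size = TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS  # 32

total_steps = STEPS_PER_EPOCH * NUM_EPOCHS  # 1915
# Assumption: the warmup step count is rounded up from warmup_ratio * total_steps.
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)  # 192


def linear_schedule_lr(step: int) -> float:
    """Learning rate at an optimizer step: linear warmup from 0, then linear decay to 0."""
    if step < warmup_steps:
        return LEARNING_RATE * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return LEARNING_RATE * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    print(f"effective batch size: {effective_batch_size}")
    print(f"total steps: {total_steps}, warmup steps: {warmup_steps}")
    for step in (0, 96, warmup_steps, 1000, total_steps):
        print(step, f"{linear_schedule_lr(step):.2e}")
```

Under these assumptions, the learning rate peaks at 3e-05 at step 192 and decays linearly to 0 by step 1915.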

Training results

Training Loss  Epoch  Step  Validation Loss  Accuracy  F1
0.2906         1.0    383   0.1856           0.9372    0.9421
0.1749         2.0    766   0.1925           0.9418    0.9465
0.1681         3.0    1149  0.1893           0.9414    0.9459
0.1295         4.0    1532  0.1851           0.9469    0.9508
0.2031         5.0    1915  0.1944           0.9423    0.9460

Disclaimer

Consider the biases in the pre-training datasets, which may carry over into this model's results.

Authors

Wav2Vec2 XLS-R Adult/Child Speech Classifier was trained and evaluated by Wilson Wongso. All computation and development were done on Kaggle.

Framework versions

  • Transformers 4.17.0.dev0
  • PyTorch 1.10.2+cu102
  • Datasets 1.18.3
  • Tokenizers 0.11.0
Model size: 316M params (Safetensors checkpoint, tensor type F32)