Help with labeling

#12
by AngryBear9019 - opened

First of all, I would like to thank you for the work you put into this model and, overall, for everything you've done. Again, a big thank you.

I have a question about labeling gender. Is there a simple way to label silent audio with a default, something like male or female? Right now, silent audio is labeled as female. Can I change that, or even label all silent audio as none?

Any help from anyone? Please...

Hello, thank you for your kind words and patience!

Regarding your question about labeling gender for silent audio, you're correct that the current model doesn't distinguish silent audio from gendered speech. There are a couple of strategies you could consider to handle this:

  1. Fine-Tuning: One approach is to fine-tune the model by adding a new classification head. This head would output three classes: male, female, and silence. This method would involve retraining the model with a dataset that includes examples of silent audio labeled appropriately.

  2. Voice Activity Detection (VAD): Another effective approach is to integrate a Voice Activity Detection model to first identify and segregate silent segments. Tools like the VAD model from NeMo (https://github.com/NVIDIA/NeMo/blob/main/tutorials/asr/Voice_Activity_Detection.ipynb) are excellent for this purpose. You can preprocess your audio data to flag silent sections and then either label them as 'none' or exclude them from gender labeling.
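As a minimal sketch of the preprocessing idea in option 2, here is a simple energy-based silence check (not the NeMo VAD itself, which is a trained model); the function name, frame size, and RMS threshold are all hypothetical choices you would tune for your data:

```python
import numpy as np

def label_if_silent(waveform, sample_rate, frame_ms=30, rms_threshold=1e-4):
    """Return 'none' if the clip looks silent, otherwise None so the
    gender classifier handles it. Purely energy-based; the threshold
    is an assumption and should be calibrated on real recordings."""
    frame_len = int(sample_rate * frame_ms / 1000)
    n_frames = len(waveform) // frame_len
    if n_frames == 0:
        return "none"  # clip shorter than one frame: treat as silent
    frames = waveform[: n_frames * frame_len].reshape(n_frames, frame_len)
    rms = np.sqrt((frames ** 2).mean(axis=1))  # per-frame RMS energy
    # Silent only if no frame's energy exceeds the threshold
    return "none" if (rms < rms_threshold).all() else None

sr = 16000
silence = np.zeros(sr)  # one second of digital silence
tone = 0.1 * np.sin(2 * np.pi * 440 * np.arange(sr) / sr)  # 440 Hz tone
print(label_if_silent(silence, sr))  # -> none
print(label_if_silent(tone, sr))     # -> None (send to the classifier)
```

You would run this before the gender model: clips returning 'none' get that label directly, and only the rest are classified. A proper VAD such as the NeMo one linked above is more robust to low-level background noise than a fixed energy threshold.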

Hello, and thank you for your answer. I will probably take the VAD path, as I don't know how to train the model or how it is properly done.
I would like to thank you again for this great model and the work you put into it.

alefiury changed discussion status to closed
