r-f/wav2vec-english-speech-emotion-recognition

aryanfar2025

Apr 29

The predicted_label is tensor([32]). Is the predicted label really 32?!

18 days ago

I see that as well. I converted the code with ChatGPT to the following, and it outputs labels from 0 to 6.

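import librosa
import torch
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2ForSequenceClassification

# Load the feature extractor and the sequence-classification model from the Hub
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained("r-f/wav2vec-english-speech-emotion-recognition")
model = Wav2Vec2ForSequenceClassification.from_pretrained("r-f/wav2vec-english-speech-emotion-recognition")

# Load the clip, resampled to 16 kHz
audio_path = "OAF_youth_angry.wav"
audio, rate = librosa.load(audio_path, sr=16000)
inputs = feature_extractor(audio, sampling_rate=rate, return_tensors="pt", padding=True)

# Forward pass without gradient tracking
with torch.no_grad():
    outputs = model(inputs.input_values)
print(outputs)
logits = outputs.logits
print(logits)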
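
In case it helps, here is a minimal sketch of turning one row of logits into a label index. It is plain Python so it runs without downloading the model. The helper names (softmax, predict_label), the example logits, and num_labels=7 are my own assumptions for illustration; 7 only reflects the 0 to 6 range mentioned above and is not confirmed by the model config.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_label(logits, num_labels=7):
    """Return (index, probability) of the highest-scoring class.

    num_labels=7 is an assumption based on the 0-6 range in this thread.
    """
    if len(logits) != num_labels:
        # A row with a different width cannot be a 7-way class score vector.
        raise ValueError(f"expected {num_labels} logits, got {len(logits)}")
    probs = softmax(logits)
    index = max(range(len(probs)), key=probs.__getitem__)
    return index, probs[index]


# Made-up logits for a single clip, for illustration only.
example_logits = [0.1, 2.3, -1.0, 0.4, 0.0, -0.5, 1.1]
label, confidence = predict_label(example_logits)
print(label, round(confidence, 3))
```

With the real tensors, torch.argmax(logits, dim=-1) picks the same index. The length check is there because an index like 32 can only come out if the row being reduced has more than 7 entries, so printing logits.shape before taking the argmax is a quick way to see what is going on.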

markosmuche

18 days ago

But I don't know why the wrong model was put here. I don't think the model gives the correct output. If it doesn't, I plan to retrain it.

predicted label is 32