predicted label is 32
The predicted_label is tensor([32]), i.e. class 32?!
I get that as well. I converted the code with ChatGPT to the following, and it outputs labels from 0 to 6.
import librosa
import torch
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2ForSequenceClassification

feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained("r-f/wav2vec-english-speech-emotion-recognition")
model = Wav2Vec2ForSequenceClassification.from_pretrained("r-f/wav2vec-english-speech-emotion-recognition")
audio_path = "OAF_youth_angry.wav"
audio, rate = librosa.load(audio_path, sr=16000)  # resample to the 16 kHz rate wav2vec2 expects
inputs = feature_extractor(audio, sampling_rate=rate, return_tensors="pt", padding=True)
with torch.no_grad():  # inference only, no gradients needed
    outputs = model(inputs.input_values)
print(outputs)
logits = outputs.logits  # shape (batch, num_labels)
print(logits)
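To turn the printed logits into a single emotion, you can take the index of the largest logit and look it up in the checkpoint's `model.config.id2label`. A minimal sketch, assuming the config carries real label names; `decode_prediction` and `main` are hypothetical names, and the softmax is only there to report a confidence alongside the label.

```python
import math


def decode_prediction(logits_row, id2label):
    """Turn one row of classifier logits into (label_name, probability).

    logits_row: plain list of floats, e.g. outputs.logits[0].tolist()
    id2label:   dict mapping class index -> name, e.g. model.config.id2label
    """
    # Numerically stable softmax: subtract the max logit before exponentiating.
    peak = max(logits_row)
    exps = [math.exp(x - peak) for x in logits_row]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Index of the highest-scoring class (same result as argmax on this row).
    best = max(range(len(probs)), key=probs.__getitem__)
    # Fall back to the raw index if the config has no name for it.
    return id2label.get(best, str(best)), probs[best]


def main():
    # Heavy dependencies are imported here so the helper above works without them.
    import librosa
    import torch
    from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2ForSequenceClassification

    name = "r-f/wav2vec-english-speech-emotion-recognition"
    feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained(name)
    model = Wav2Vec2ForSequenceClassification.from_pretrained(name)

    audio, rate = librosa.load("OAF_youth_angry.wav", sr=16000)
    inputs = feature_extractor(audio, sampling_rate=rate, return_tensors="pt", padding=True)
    with torch.no_grad():
        logits = model(inputs.input_values).logits

    label, prob = decode_prediction(logits[0].tolist(), model.config.id2label)
    print(f"{label} ({prob:.2%})")


if __name__ == "__main__":
    main()
```

The helper takes plain Python floats so it can be checked without torch; with the snippet above you would call it on `logits[0].tolist()`.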
But I don't know why the wrong model was uploaded here. I doubt the model gives correct output; if it doesn't, I plan to retrain it.
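Before retraining, it may be worth checking whether the checkpoint's classification head is consistent with what the code observed: a predicted index of 32 cannot be a valid class if `id2label` has only a handful of entries. A rough sketch; `head_problems` is a hypothetical helper, and treating `LABEL_n` names as a warning sign is a heuristic based on the placeholder names transformers assigns by default, not a definitive test of whether the head was trained.

```python
def head_problems(num_logits, id2label, predicted_index=None):
    """Return human-readable inconsistencies between a classifier's output and its config.

    num_logits:      size of the last logits dimension, e.g. logits.shape[-1]
    id2label:        the checkpoint's model.config.id2label
    predicted_index: optional argmax you observed, e.g. int(logits.argmax(-1))
    """
    problems = []
    # The head's output width should equal the number of named classes.
    if num_logits != len(id2label):
        problems.append(
            f"model produced {num_logits} logits but id2label has {len(id2label)} entries"
        )
    # A prediction outside the label map points to the wrong model or head.
    if predicted_index is not None and predicted_index not in id2label:
        problems.append(f"predicted index {predicted_index} has no entry in id2label")
    # Heuristic: default placeholder names suggest the labels were never set.
    placeholders = sorted(i for i, name in id2label.items() if str(name).startswith("LABEL_"))
    if placeholders:
        problems.append(f"indices {placeholders} only have placeholder names (LABEL_n)")
    return problems
```

With the code above, you would pass `logits.shape[-1]`, `model.config.id2label`, and `int(logits.argmax(-1))`; an empty list means only that the config and output agree, not that the predictions are accurate.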