balacoon/mhubert-147 · Question Regarding Processing 48kHz Audio with mhubert-147

Thank you for your excellent work on encapsulating the code for extracting discrete audio unit sequences with mhubert-147. I have been reviewing the sample code you provided, and I have a question regarding the required audio format for processing.

Does the input audio have to be 16 kHz and in int16 format? My original audio is at 48 kHz. I’ve noticed that librosa loads audio as float32 normalized to the range [-1, 1] by default, while the soundfile library does not support resampling. I also observed that when I resample the same 48 kHz audio to 16 kHz with librosa and with pydub, the resulting discrete audio unit sequences differ.
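For illustration, here is a minimal sketch of the conversion in question. It assumes the model expects 16 kHz mono int16 samples, which is exactly what I am asking about and not something I have confirmed. It uses scipy's `resample_poly` as a stand-in resampler instead of librosa or pydub, and the helper name `to_16k_int16` and the synthetic 440 Hz tone are hypothetical placeholders for a real recording. The last lines resample the same signal with a different anti-aliasing window to show how two resamplers can produce slightly different samples, which may be why the downstream unit sequences differ.

```python
import numpy as np
from scipy.signal import resample_poly


def to_16k_int16(audio_48k: np.ndarray) -> np.ndarray:
    """Downsample float audio in [-1, 1] from 48 kHz to 16 kHz, then quantize to int16.

    Hypothetical helper; the int16 target format is an assumption.
    """
    # 48000 / 16000 = 3, so polyphase resampling with up=1, down=3.
    audio_16k = resample_poly(audio_48k, up=1, down=3)
    # Clip before scaling so filter overshoot cannot wrap around in int16.
    audio_16k = np.clip(audio_16k, -1.0, 1.0)
    return np.round(audio_16k * 32767.0).astype(np.int16)


# Synthetic stand-in for one second of 48 kHz audio (440 Hz tone).
sr_in = 48000
t = np.arange(sr_in) / sr_in
audio_48k = (0.5 * np.sin(2 * np.pi * 440.0 * t)).astype(np.float32)

audio_16k = to_16k_int16(audio_48k)

# Same input, different anti-aliasing window: the samples need not match exactly.
alt = resample_poly(audio_48k, up=1, down=3, window=("kaiser", 12.0))
alt_int16 = np.round(np.clip(alt, -1.0, 1.0) * 32767.0).astype(np.int16)
max_diff = int(np.max(np.abs(audio_16k.astype(np.int32) - alt_int16.astype(np.int32))))
print("samples:", len(audio_16k), "dtype:", audio_16k.dtype, "max abs diff:", max_diff)
```

If this is the right general approach, using one resampler consistently for all files would at least keep the extracted units comparable with each other.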

Could you please advise on the correct approach to handle 48kHz audio? Any recommendations or best practices for converting or resampling the audio to ensure consistency with the expected output would be greatly appreciated.

Thank you for your time and assistance.