---
license: cc-by-4.0
---

## Balacoon Discrete Vocoder

This discrete vocoder consists of two components:

* Analysis: converts audio into audio tokens, represented as four parallel codebooks with 2,048 entries each.
* Synthesis: converts audio tokens back into audio.

The vocoder operates on 24 kHz audio at 50 frames per second. It is designed as a middle ground between the high bitrate of EnCodec and lower-bitrate alternatives such as Mimi (12.5 frames per second) or WavTokenizer (which uses a single codebook).

## How to Use the Vocoder

```python
import torch
import soundfile as sf
from huggingface_hub import hf_hub_download

device = torch.device("cuda")

# download the TorchScript models and load them onto the target device
encoder_path = hf_hub_download(repo_id="balacoon/vq4_50fps_24khz_vocoder", filename="analysis.jit")
decoder_path = hf_hub_download(repo_id="balacoon/vq4_50fps_24khz_vocoder", filename="synthesis.jit")
encoder = torch.jit.load(encoder_path, map_location=device)
decoder = torch.jit.load(decoder_path, map_location=device)

# read a mono 24 kHz audio file as int16 samples
path = "audio.wav"  # placeholder: replace with the path to your own audio file
orig_audio_npy, sr = sf.read(path, dtype="int16")
assert sr == 24000, "the vocoder expects 24 kHz audio"
assert orig_audio_npy.ndim == 1, "the vocoder expects mono audio"
orig_audio = torch.tensor(orig_audio_npy).to(device).unsqueeze(0)  # batch x samples

with torch.inference_mode():
    # extract audio tokens from the audio
    tokens = encoder(orig_audio)  # batch x frames x 4
    # synthesize audio from the audio tokens
    resynthesized_audio = decoder(tokens)  # batch x samples
```

See the performance of the codec on the `vocoder` leaderboard: [TTSLeaderboard](https://huggingface.co/spaces/balacoon/TTSLeaderboard)
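The token rate follows directly from the numbers above: each token indexes one of 2,048 = 2^11 codebook entries, so it carries 11 bits, and four codebooks at 50 frames per second give 4 × 11 × 50 = 2,200 bits per second (about 2.2 kbps) before any entropy coding. The plain-Python sketch below does this arithmetic and estimates how many token frames a clip produces. The helper names are hypothetical, and rounding a partial final frame up is an assumption: the actual encoder's padding behavior is not documented here.

```python
import math


def bitrate_bps(num_codebooks: int, codebook_size: int, frame_rate: float) -> float:
    """Raw token bitrate: codebooks x bits per token x frames per second."""
    bits_per_token = math.log2(codebook_size)  # 2048 entries -> 11 bits
    return num_codebooks * bits_per_token * frame_rate


def samples_per_frame(sample_rate: int, frame_rate: int) -> int:
    """Audio samples covered by one token frame, e.g. 24000 / 50 = 480."""
    assert sample_rate % frame_rate == 0, "expected an integer hop size"
    return sample_rate // frame_rate


def approx_num_frames(num_samples: int, sample_rate: int, frame_rate: int) -> int:
    """Estimated frame count; rounding up a partial frame is an assumption."""
    return math.ceil(num_samples / samples_per_frame(sample_rate, frame_rate))


if __name__ == "__main__":
    bps = bitrate_bps(num_codebooks=4, codebook_size=2048, frame_rate=50)
    print(f"{bps:.0f} bits/s = {bps / 1000:.1f} kbps")  # 2200 bits/s = 2.2 kbps

    frames = approx_num_frames(10 * 24000, sample_rate=24000, frame_rate=50)
    print(f"10 s clip: {frames} frames, {frames * 4} tokens")  # 500 frames, 2000 tokens
```

Counting frames this way is useful when budgeting sequence lengths for a downstream model: a 10-second clip maps to roughly 500 frames of 4 tokens each.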