A different audio tokenizer?
Hello! I just tried the model. Seems good. Training works.
BUT: I would love to see a version with a more straightforward audio tokenizer like WavTokenizer or Mimi (from Kyutai, the one Sesame uses) instead of SNAC.
Thanks!
I will release models trained with different codecs in a month. WavTokenizer is nice, but its quality isn't great. Mimi seems good. However, I prefer better and faster codec architectures.
Got it, thanks for your response! Would like to see codecs that have one flow of tokens, instead of three like in SNAC :)
Also, please release a 48 kHz version and make it possible to fine-tune the codec! Thanks!
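For context on the "one flow instead of three" point, here is a rough sketch of how three hierarchical code streams can be interleaved into a single token stream and split back again. It assumes the three levels run at a 1:2:4 rate ratio (7 tokens per coarse frame); the function names and the ordering inside each frame are hypothetical choices for illustration, not necessarily the layout this model uses.

```python
from typing import List, Tuple

TOKENS_PER_FRAME = 7  # assumed 1 coarse + 2 mid + 4 fine codes per coarse frame


def flatten_levels(coarse: List[int], mid: List[int], fine: List[int]) -> List[int]:
    """Interleave three code streams (assumed 1:2:4 ratio) into one flat list."""
    if len(mid) != 2 * len(coarse) or len(fine) != 4 * len(coarse):
        raise ValueError("expected stream lengths in a 1:2:4 ratio")
    flat: List[int] = []
    for i, c in enumerate(coarse):
        # Hypothetical frame order: coarse, mid0, fine0, fine1, mid1, fine2, fine3
        flat.append(c)
        flat.append(mid[2 * i])
        flat.extend(fine[4 * i : 4 * i + 2])
        flat.append(mid[2 * i + 1])
        flat.extend(fine[4 * i + 2 : 4 * i + 4])
    return flat


def unflatten_levels(flat: List[int]) -> Tuple[List[int], List[int], List[int]]:
    """Invert flatten_levels: split a flat stream back into three levels."""
    if len(flat) % TOKENS_PER_FRAME != 0:
        raise ValueError("flat stream length must be a multiple of 7")
    coarse: List[int] = []
    mid: List[int] = []
    fine: List[int] = []
    for k in range(0, len(flat), TOKENS_PER_FRAME):
        c, m0, f0, f1, m1, f2, f3 = flat[k : k + TOKENS_PER_FRAME]
        coarse.append(c)
        mid.extend([m0, m1])
        fine.extend([f0, f1, f2, f3])
    return coarse, mid, fine
```

With a single-codebook codec this bookkeeping step would not be needed, which is the appeal of the request above.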
I'm developing the CodecHub library to measure the performance of different Audio Codec models.
https://github.com/Vyvo-Labs/CodecHub
For example, this codec looks good, but I don't know its quality yet. That's why I want to test many codec models and choose a faster one.
https://github.com/zhai-lw/SQCodec
I haven't trained a 48 kHz model. Gathering a suitable dataset for this might be difficult. The Emilia dataset is in 24 kHz format. I could upsample it, but for 150k hours of data this would take too long and be very costly.
Nah, it's fine. Even if your dataset is 24 kHz, just make sure the codec itself is 48 kHz, so people can fine-tune up to 48 kHz :)
Solved