This is a streamlined version of WavTokenizer-large-speech-75token that exposes the model through separate encoder and decoder components.

Model size reduced from 1.75 GB to ~330 MB by keeping only the components needed for inference
Split interface: 82 MB encoder and 248 MB decoder

The model is split into:

encoder/: Handles audio encoding
decoder/: Handles decoding and synthesis
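
To confirm that a local copy matches the sizes quoted above (about 82 MB for the encoder and 248 MB for the decoder), the on-disk size of each component directory can be summed. The following is a minimal sketch using only the Python standard library. The helper names dir_size_mb and report_sizes are hypothetical and not part of this model's interface, and the sketch assumes the download contains encoder/ and decoder/ subdirectories as listed above.

```python
from pathlib import Path


def dir_size_mb(path):
    """Total size of all regular files under `path`, in megabytes (10**6 bytes)."""
    root = Path(path)
    total_bytes = sum(f.stat().st_size for f in root.rglob("*") if f.is_file())
    return total_bytes / 1e6


def report_sizes(model_root):
    """Map each component directory that exists under `model_root` to its size in MB.

    Assumption: the components live in `encoder/` and `decoder/`
    subdirectories, as described in this model card.
    """
    root = Path(model_root)
    return {
        name: dir_size_mb(root / name)
        for name in ("encoder", "decoder")
        if (root / name).is_dir()
    }
```

Calling report_sizes on the directory where the model was downloaded returns a dictionary keyed by component name. Values close to 82 and 248 would be consistent with the figures above, although the exact numbers depend on which files the download includes.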
