qiangchunyu/SecoustiCodec

speech-processing


qiangchunyu committed 8 days ago

Commit 10ab439 · verified · 1 Parent(s): ea1cfe3

Create README.md

Files changed (1)

README.md +51 -0

README.md ADDED

	@@ -0,0 +1,51 @@

+---
+language: en
+tags:
+- audio
+- speech-processing
+- speech-codec
+- low-bitrate
+- streaming
+- tts
+- cross-modal
+license: apache-2.0
+---
+# SecoustiCodec: Cross-Modal Aligned Streaming Single-Codecbook Speech Codec
+## Resources
+- [📄 Research Paper](https://arxiv.org/abs/2508.02849)
+- [💻 Source Code](https://github.com/QiangChunyu/SecoustiCodec)
+- [🤗 Demo Page](https://qiangchunyu.github.io/SecoustiCodec_Page/)
+## Model Overview
+SecoustiCodec is a **low-bitrate streaming speech codec** that reconstructs speech at ultra-low bitrates (0.27-1 kbps) using a single codebook. The model introduces several innovations:
+- 🧠 **Cross-modal alignment**: Aligns text and speech in a joint multimodal frame-level space
+- 🔍 **Semantic-paralinguistic disentanglement**: Separates linguistic content from speaker characteristics
+- ⚡ **Streaming support**: Real-time processing capabilities
+- 📊 **Efficient quantization**: A VAE (variational autoencoder) + FSQ (finite scalar quantization) approach addresses token distribution problems
+- 🎯 **Acoustic-constrained optimization**: Ensures stable convergence
+## Architecture Overview
+![Model Architecture](https://qiangchunyu.github.io/SecoustiCodec_Page/model.png)
+## Acknowledgments
+- We used [HiFiGAN](https://github.com/jik876/hifi-gan) for efficient waveform generation.
+- We referred to [MIMICodec](https://huggingface.co/kyutai/mimi) when implementing this model.
+## Citation
+```bibtex
+@article{qiang2025secousticodec,
+  title={SecoustiCodec: Cross-Modal Aligned Streaming Single-Codecbook Speech Codec},
+  author={Qiang, Chunyu and Wang, Haoyu and Gong, Cheng and Wang, Tianrui and Fu, Ruibo and Wang, Tao and Chen, Ruilong and Yi, Jiangyan and Wen, Zhengqi and Zhang, Chen and Wang, Longbiao and Dang, Jianwu and Tao, Jianhua},
+  journal={arXiv preprint arXiv:2508.02849},
+  year={2025}
+}
+```
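
The "Efficient quantization" bullet in the README names FSQ (finite scalar quantization) without showing what it does. The sketch below is a simplified, generic illustration of the idea and not the SecoustiCodec implementation. Each latent channel is squashed into a bounded range and rounded to a small fixed number of levels, and the per-channel digits are combined into one token, so a single implicit codebook results. The level counts `[8, 5, 5, 5]`, the 12.5 Hz frame rate, and all function names are assumptions chosen for illustration. The actual configuration is described in the paper and source code linked in the README.

```python
import math

# Hypothetical per-channel level counts (illustrative only, not taken from
# SecoustiCodec). The implied single codebook has 8 * 5 * 5 * 5 = 1000 entries.
LEVELS = [8, 5, 5, 5]


def fsq_quantize(frame, levels=LEVELS):
    """Map one latent frame (one float per channel) to integer digits.

    Simplified FSQ: squash each value into [0, 1] with tanh, then snap it
    to the nearest of `n_levels` evenly spaced grid points.
    """
    digits = []
    for value, n_levels in zip(frame, levels):
        unit = 0.5 * (math.tanh(value) + 1.0)        # bounded to [0, 1]
        digits.append(round(unit * (n_levels - 1)))  # integer in [0, n_levels - 1]
    return digits


def digits_to_token(digits, levels=LEVELS):
    """Combine per-channel digits into one codebook index (mixed radix)."""
    token, base = 0, 1
    for digit, n_levels in zip(digits, levels):
        token += digit * base
        base *= n_levels
    return token


def bitrate_bps(levels, frame_rate_hz):
    """Bits per second for one token per frame from a single codebook."""
    return frame_rate_hz * math.log2(math.prod(levels))


frame = [0.3, -1.2, 2.0, 0.0]          # made-up latent values for one frame
token = digits_to_token(fsq_quantize(frame))
rate = bitrate_bps(LEVELS, 12.5)       # 12.5 Hz is a hypothetical frame rate
```

With one token per frame, the bitrate is the frame rate times the base-2 logarithm of the codebook size, which is why a single-codebook design at a low frame rate can stay well under 1 kbps.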