open-universe: Generative Speech Enhancement with Score-based Diffusion and Adversarial Training
This repository contains the configurations and weights for the UNIVERSE++ and UNIVERSE models implemented in open-universe.
The models were trained on the Voicebank-DEMAND dataset at 16 kHz.
The performance on the test split of Voicebank-DEMAND is given in the following table.
model | si-sdr | pesq-wb | stoi-ext | lsd | lps | OVRL | SIG | BAK |
---|---|---|---|---|---|---|---|---|
UNIVERSE++ | 18.624 | 3.017 | 0.864 | 4.867 | 0.937 | 3.200 | 3.489 | 4.040 |
UNIVERSE | 17.600 | 2.830 | 0.844 | 6.318 | 0.920 | 3.157 | 3.457 | 4.013 |
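To make the gap between the two rows easier to read, the per-metric differences can be computed directly from the table. The snippet below is only an illustrative sketch: the `scores` dictionary and the `metric_deltas` helper are hypothetical names introduced here, and the numbers are copied from the table above. Note that lsd (log-spectral distance) is a distance, so a negative difference is an improvement there.

```python
# Scores copied from the table above (Voicebank-DEMAND test split).
METRICS = ["si-sdr", "pesq-wb", "stoi-ext", "lsd", "lps", "OVRL", "SIG", "BAK"]

scores = {
    "UNIVERSE++": dict(zip(METRICS, [18.624, 3.017, 0.864, 4.867, 0.937, 3.200, 3.489, 4.040])),
    "UNIVERSE": dict(zip(METRICS, [17.600, 2.830, 0.844, 6.318, 0.920, 3.157, 3.457, 4.013])),
}


def metric_deltas(new, old):
    """Return new - old for every metric, rounded to the table's 3 decimals."""
    return {name: round(new[name] - old[name], 3) for name in new}


# Hypothetical helper usage: difference of UNIVERSE++ relative to UNIVERSE.
deltas = metric_deltas(scores["UNIVERSE++"], scores["UNIVERSE"])
for name, delta in deltas.items():
    print(f"{name:>8}: {delta:+.3f}")
```

With these numbers, UNIVERSE++ is ahead on every metric in the table, with the largest absolute changes in si-sdr and lsd.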
Usage
Start by installing open-universe.
We use conda to simplify the installation.
git clone https://github.com/line/open-universe.git
cd open-universe
conda env create -f environment.yaml
conda activate open-universe
python -m pip install .
Then the models can be used as follows.
# UNIVERSE++ (default model)
python -m open_universe.bin.enhance <input/folder> <output/folder> \
--model line-corporation/open-universe:plusplus
# UNIVERSE
python -m open_universe.bin.enhance <input/folder> <output/folder> \
--model line-corporation/open-universe:original
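When the enhancement step is part of a larger Python pipeline, the same command line can be launched with `subprocess`. This is a minimal sketch that only wraps the two commands shown above; the names `MODELS`, `build_enhance_command`, and `enhance_folder` are hypothetical helpers introduced here and are not part of open-universe.

```python
import subprocess
import sys

# Model identifiers exactly as given in the commands above.
MODELS = {
    "plusplus": "line-corporation/open-universe:plusplus",  # UNIVERSE++
    "original": "line-corporation/open-universe:original",  # UNIVERSE
}


def build_enhance_command(input_dir, output_dir, variant="plusplus"):
    """Build the argument list for the enhancement command (hypothetical helper)."""
    return [
        sys.executable, "-m", "open_universe.bin.enhance",
        str(input_dir), str(output_dir),
        "--model", MODELS[variant],
    ]


def enhance_folder(input_dir, output_dir, variant="plusplus"):
    """Run the command and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_enhance_command(input_dir, output_dir, variant), check=True)


# Print the command that would be run for placeholder folder names.
print(" ".join(build_enhance_command("noisy", "enhanced", "original")))
```

Calling `enhance_folder("noisy", "enhanced")` would then run UNIVERSE++ on the placeholder `noisy` folder, assuming the `open-universe` environment from the installation step is active.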
Referencing open-universe and UNIVERSE++
If you use these models in your work, please consider citing the following paper.
@inproceedings{universepp,
author={Scheibler, Robin and Fujita, Yusuke and Shirahata, Yuma and Komatsu, Tatsuya},
title={Universal Score-based Speech Enhancement with High Content Preservation},
booktitle={Proc. Interspeech 2024},
month=sep,
year=2024
}
Referencing UNIVERSE
@misc{universe,
author={Serr\`a, Joan and Pascual, Santiago and Pons, Jordi and Araz, Oguz R. and Scaini, Davide},
title={Universal Speech Enhancement with Score-based Diffusion},
howpublished={arXiv:2206.03065},
month=jun,
year=2022
}
Evaluation results
- DNSMOS SIG on DEMAND (self-reported): 3.493
- DNSMOS BAK on DEMAND (self-reported): 4.042
- DNSMOS OVRL on DEMAND (self-reported): 3.205
- PESQ on DEMAND (self-reported): 3.017
- SI-SDR on DEMAND (self-reported): 18.629