Container Template for SoundsRight Subnet Miners
This repository contains a containerized version of SGMSE+ and serves as a tutorial for miners on how to format their models for Bittensor's SoundsRight Subnet. The branches DENOISING_16000HZ and DEREVERBERATION_16000HZ contain SGMSE+ fitted with the appropriate checkpoints for denoising and dereverberation tasks at 16 kHz, respectively.
This container has only been tested with Ubuntu 24.04 and CUDA 12.6. It may run on other configurations, but this is not guaranteed.
To run the container, first configure the NVIDIA Container Toolkit and generate a CDI specification. Follow the instructions to install the NVIDIA Container Toolkit with Apt.
Next, follow the instructions for generating a CDI specification.
Verify that the CDI specification was generated correctly with:
$ nvidia-ctk cdi list
You should see this in your output:
nvidia.com/gpu=all
nvidia.com/gpu=0
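The check above can also be scripted. The following is a minimal sketch, not part of this repository: `has_gpu_all` is a hypothetical helper name, and it only looks for the `nvidia.com/gpu=all` entry in whatever device list it receives on standard input.

```shell
# Hypothetical helper (not shipped with this repository).
# Reads a CDI device list from stdin and succeeds only if it
# contains the nvidia.com/gpu=all entry used by the podman commands.
has_gpu_all() {
    grep -qF 'nvidia.com/gpu=all'
}

# Typical use on the host (requires the NVIDIA Container Toolkit):
#   nvidia-ctk cdi list | has_gpu_all && echo "CDI specification looks OK"
```

If the helper fails, regenerate the CDI specification before starting the container.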
If you are running podman as root, build and start the container with:
podman build -t modelapi . && podman run -d --device nvidia.com/gpu=all --user root --name modelapi -p 6500:6500 modelapi
Access logs with:
podman logs -f modelapi
If you are running the container rootless, a few more changes are needed.
First, edit /etc/nvidia-container-runtime/config.toml and set the following parameters:
[nvidia-container-cli]
no-cgroups = true
[nvidia-container-runtime]
debug = "/tmp/nvidia-container-runtime.log"
Alternatively, the following command sets no-cgroups without editing the file by hand:
$ sudo nvidia-ctk config --set nvidia-container-cli.no-cgroups --in-place
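To confirm that the setting took effect, the configuration file can be inspected before starting the container. This is a minimal sketch under stated assumptions: `no_cgroups_enabled` is a hypothetical helper name, and it only checks that an uncommented `no-cgroups = true` line exists somewhere in the file, not that it sits under the `[nvidia-container-cli]` section.

```shell
# Hypothetical helper (not shipped with this repository).
# Succeeds if the given TOML file contains an uncommented
# "no-cgroups = true" line. It does not verify the section name.
no_cgroups_enabled() {
    grep -Eq '^[[:space:]]*no-cgroups[[:space:]]*=[[:space:]]*true' "$1"
}

# Typical use:
#   no_cgroups_enabled /etc/nvidia-container-runtime/config.toml \
#       && echo "no-cgroups is enabled"
```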
Run the container with:
podman build -t modelapi . && podman run -d --device nvidia.com/gpu=all --volume /usr/local/cuda-12.6:/usr/local/cuda-12.6 --user 10002:10002 --name modelapi -p 6500:6500 modelapi
Access logs with:
podman logs -f modelapi
Running the container will spin up an API with the following endpoints:
/status/: Communicates API status
/prepare/: Download model checkpoint and initialize model
/upload-audio/: Upload audio files, save to noisy audio directory
/enhance/: Initialize model, enhance audio files, save to enhanced audio directory
/download-enhanced/: Download enhanced audio files
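The endpoints are meant to be called in the order listed. The sketch below walks through that sequence with curl. It rests on several assumptions that this README does not confirm and that should be checked against the API source in the repository: the HTTP method of each endpoint, the multipart field name `files`, and the output file name `enhanced_download`. The helper names `api_url` and `run_pipeline` are hypothetical.

```shell
# Host and port the client connects to. 127.0.0.1 assumes the client runs
# on the same machine as the container, with port 6500 published.
API_HOST="${API_HOST:-127.0.0.1}"
API_PORT="${API_PORT:-6500}"

# Build the full URL for an endpoint path such as /status/.
api_url() {
    printf 'http://%s:%s%s\n' "$API_HOST" "$API_PORT" "$1"
}

# Hypothetical end-to-end run. ASSUMED, not confirmed by this README:
# GET vs. POST for each endpoint, the form field name "files",
# and the name and format of the downloaded file.
run_pipeline() {
    noisy_file=$1
    curl -fsS "$(api_url /status/)" &&
    curl -fsS -X POST "$(api_url /prepare/)" &&
    curl -fsS -X POST -F "files=@${noisy_file}" "$(api_url /upload-audio/)" &&
    curl -fsS -X POST "$(api_url /enhance/)" &&
    curl -fsS -o enhanced_download "$(api_url /download-enhanced/)"
}

# Typical use:
#   run_pipeline ./noisy_sample.wav
```

Each step runs only if the previous one succeeded, so a failure in /prepare/ stops the sequence before any audio is uploaded.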
By default, the API uses host 0.0.0.0 and port 6500.
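Because the container is started in the background, the API may not be reachable immediately. A minimal sketch of a readiness check follows: `wait_until` is a hypothetical generic retry helper, and polling /status/ with a plain GET request is an assumption about that endpoint.

```shell
# Hypothetical helper (not shipped with this repository).
# Usage: wait_until TRIES DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND up to TRIES times, sleeping DELAY_SECONDS between
# attempts, and succeeds as soon as COMMAND succeeds.
wait_until() {
    tries=$1
    delay=$2
    shift 2
    i=0
    while [ "$i" -lt "$tries" ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        i=$((i + 1))
        # Skip the sleep after the final failed attempt.
        if [ "$i" -lt "$tries" ]; then
            sleep "$delay"
        fi
    done
    return 1
}

# Typical use (assumes /status/ answers a plain GET on the published port):
#   wait_until 30 2 curl -fsS http://127.0.0.1:6500/status/ \
#       && echo "API is up"
```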
References
Welker, Simon; Richter, Julius; Gerkmann, Timo
Speech Enhancement with Score-Based Generative Models in the Complex STFT Domain.
Proceedings of Interspeech 2022, 2022, pp. 2928–2932.
DOI: 10.21437/Interspeech.2022-10653

Richter, Julius; Welker, Simon; Lemercier, Jean-Marie; Lay, Bunlong; Gerkmann, Timo
Speech Enhancement and Dereverberation with Diffusion-based Generative Models.
IEEE/ACM Transactions on Audio, Speech, and Language Processing, Vol. 31, 2023, pp. 2351–2364.
DOI: 10.1109/TASLP.2023.3285241

Richter, Julius; Wu, Yi-Chiao; Krenn, Steven; Welker, Simon; Lay, Bunlong; Watanabe, Shinji; Richard, Alexander; Gerkmann, Timo
EARS: An Anechoic Fullband Speech Dataset Benchmarked for Speech Enhancement and Dereverberation.
Proceedings of ISCA Interspeech, 2024, pp. 4873–4877.