Container Template for SoundsRight Subnet Miners

This repository contains a containerized version of SGMSE+ and serves as a tutorial for miners to format their models for Bittensor's SoundsRight Subnet. The branches DENOISING_16000HZ and DEREVERBERATION_16000HZ contain SGMSE+ fitted with the appropriate checkpoints for denoising and dereverberation tasks at 16 kHz, respectively.

This container has only been tested with Ubuntu 24.04 and CUDA 12.6. It may run on other configurations, but this is not guaranteed.

To run the container, first configure the NVIDIA Container Toolkit and generate a CDI specification. Follow the instructions to install the NVIDIA Container Toolkit with Apt.

Next, follow the instructions for generating a CDI specification.

Verify that the CDI specification was generated correctly with:

$ nvidia-ctk cdi list

You should see this in your output:

nvidia.com/gpu=all
nvidia.com/gpu=0
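
This check can also be scripted. The following is a minimal sketch, assuming the device names appear in the `nvidia-ctk cdi list` output exactly as shown above. The helper names `missing_cdi_devices` and `check_host` are hypothetical and are not part of this repository.

```python
import subprocess

# Device names taken from the expected output shown above.
EXPECTED_DEVICES = ("nvidia.com/gpu=all", "nvidia.com/gpu=0")


def missing_cdi_devices(listing, expected=EXPECTED_DEVICES):
    """Return the expected CDI device names that are absent from `listing`."""
    # Split on whitespace so each device name is matched as a whole token,
    # regardless of any log prefix printed on the same line.
    tokens = set(listing.split())
    return [name for name in expected if name not in tokens]


def check_host():
    """Run `nvidia-ctk cdi list` on this machine and return missing devices."""
    result = subprocess.run(
        ["nvidia-ctk", "cdi", "list"],
        capture_output=True,
        text=True,
        check=True,
    )
    # Inspect both streams, since log lines and device names may be split.
    return missing_cdi_devices(result.stdout + "\n" + result.stderr)
```

An empty list from `check_host()` means both device names were found. On a machine with more than one GPU, additional names may be listed; this sketch only checks for the two shown above.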

If you are running Podman as root, build and start the container with:

podman build -t modelapi . && podman run -d --device nvidia.com/gpu=all --user root --name modelapi -p 6500:6500 modelapi

Access logs with:

podman logs -f modelapi

If you are running the container rootless, there are a few more changes to make:

First, modify /etc/nvidia-container-runtime/config.toml and set the following parameters:

[nvidia-container-cli]
no-cgroups = true

[nvidia-container-runtime]
debug = "/tmp/nvidia-container-runtime.log"

The no-cgroups option can also be set with the following command instead of editing the file by hand:

$ sudo nvidia-ctk config --set nvidia-container-cli.no-cgroups --in-place
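
To confirm that the rootless setting took effect, the configuration file can be parsed and inspected. This is a minimal sketch, assuming Python 3.11 or later (for `tomllib`) or the third-party `tomli` backport. The function name `rootless_settings_ok` is hypothetical and is not part of this repository.

```python
try:
    import tomllib  # standard library from Python 3.11 onward
except ModuleNotFoundError:
    import tomli as tomllib  # third-party backport with the same API


def rootless_settings_ok(config_text):
    """Return True if no-cgroups is true under [nvidia-container-cli]."""
    config = tomllib.loads(config_text)
    cli_section = config.get("nvidia-container-cli", {})
    # TOML booleans are parsed as Python bools; a missing key gives None.
    return cli_section.get("no-cgroups") is True


if __name__ == "__main__":
    # Path taken from the step above.
    with open("/etc/nvidia-container-runtime/config.toml", encoding="utf-8") as f:
        print(rootless_settings_ok(f.read()))
```

The check only looks at `no-cgroups`. The `debug` path is not validated.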

Then build and run the container with:

podman build -t modelapi . && podman run -d --device nvidia.com/gpu=all --volume /usr/local/cuda-12.6:/usr/local/cuda-12.6 --user 10002:10002 --name modelapi -p 6500:6500 modelapi

Access logs with:

podman logs -f modelapi

Running the container will spin up an API with the following endpoints:

/status/ : Reports the API status
/prepare/ : Downloads the model checkpoint and initializes the model
/upload-audio/ : Uploads audio files and saves them to the noisy audio directory
/enhance/ : Initializes the model, enhances the audio files, and saves them to the enhanced audio directory
/download-enhanced/ : Downloads the enhanced audio files

By default, the API listens on host 0.0.0.0 and port 6500.
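
As a quick way to confirm that the API is reachable from the host, the sketch below builds URLs for the endpoints listed above and polls /status/. It rests on two assumptions that are not stated in this README: that /status/ accepts a GET request, and that the published port is reachable at 127.0.0.1 on the host. The helper names `endpoint_url` and `api_is_up` are hypothetical.

```python
import urllib.request

# Endpoint paths as listed above.
ENDPOINTS = (
    "/status/",
    "/prepare/",
    "/upload-audio/",
    "/enhance/",
    "/download-enhanced/",
)


def endpoint_url(path, host="127.0.0.1", port=6500):
    """Build the full URL for one of the listed API endpoints."""
    if path not in ENDPOINTS:
        raise ValueError(f"unknown endpoint: {path}")
    return f"http://{host}:{port}{path}"


def api_is_up(host="127.0.0.1", port=6500, timeout=5.0):
    """Return True if /status/ answers a GET request with HTTP 200.

    Assumption: /status/ accepts GET; the HTTP methods are not documented here.
    """
    url = endpoint_url("/status/", host, port)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError, HTTPError, and socket timeouts are all OSError subclasses.
        return False
```

The request methods and payload formats for the remaining endpoints are not described here, so the sketch stops at the status check; consult the API source in this repository before calling them.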

References

Welker, Simon; Richter, Julius; Gerkmann, Timo
Speech Enhancement with Score-Based Generative Models in the Complex STFT Domain.
Proceedings of Interspeech 2022, 2022, pp. 2928–2932.
DOI: 10.21437/Interspeech.2022-10653

Richter, Julius; Welker, Simon; Lemercier, Jean-Marie; Lay, Bunlong; Gerkmann, Timo
Speech Enhancement and Dereverberation with Diffusion-based Generative Models.
IEEE/ACM Transactions on Audio, Speech, and Language Processing, Vol. 31, 2023, pp. 2351–2364.
DOI: 10.1109/TASLP.2023.3285241

Richter, Julius; Wu, Yi-Chiao; Krenn, Steven; Welker, Simon; Lay, Bunlong; Watanabe, Shinji; Richard, Alexander; Gerkmann, Timo
EARS: An Anechoic Fullband Speech Dataset Benchmarked for Speech Enhancement and Dereverberation.
Proceedings of ISCA Interspeech, 2024, pp. 4873–4877.