|
# AudioCraft objective metrics |
|
|
|
In addition to training losses, AudioCraft provides a set of objective metrics |
|
for audio synthesis and audio generation. As these metrics may require |
|
extra dependencies and can be costly to compute, they are often disabled by default.
|
This section provides guidance for setting up and using these metrics in |
|
the AudioCraft training pipelines. |
|
|
|
## Available metrics |
|
|
|
### Audio synthesis quality metrics |
|
|
|
#### SI-SNR |
|
|
|
We provide an implementation of the Scale-Invariant Signal-to-Noise Ratio in PyTorch. |
|
This metric requires no extra dependencies. Please activate it at the
evaluation stage with the appropriate flag:
|
|
|
```shell |
|
dora run <...> evaluate.metrics.sisnr=true |
|
``` |
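
For intuition, SI-SNR projects the estimate onto the reference signal and measures the energy ratio (in dB) between that projection and the residual. The following is a minimal standalone PyTorch sketch of that computation, not the exact implementation used in AudioCraft:

```python
import torch

def si_snr(estimate: torch.Tensor, reference: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Scale-Invariant SNR in dB for waveforms of shape [..., time]."""
    # Remove the mean so the metric is invariant to DC offset.
    estimate = estimate - estimate.mean(dim=-1, keepdim=True)
    reference = reference - reference.mean(dim=-1, keepdim=True)
    # Project the estimate onto the reference to obtain the scaled target.
    dot = (estimate * reference).sum(dim=-1, keepdim=True)
    target = dot / (reference.pow(2).sum(dim=-1, keepdim=True) + eps) * reference
    noise = estimate - target
    ratio = target.pow(2).sum(dim=-1) / (noise.pow(2).sum(dim=-1) + eps)
    return 10 * torch.log10(ratio + eps)
```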
|
|
|
#### ViSQOL |
|
|
|
We provide a Python wrapper around the ViSQOL [official implementation](https://github.com/google/visqol) |
|
to conveniently run ViSQOL within the training pipelines. |
|
|
|
One must specify the path to the ViSQOL installation through the configuration in order |
|
to enable ViSQOL computations in AudioCraft: |
|
|
|
```shell |
|
# the first parameter activates the visqol computation while the second specifies
# the path to the visqol installation to be used by our python wrapper
|
dora run <...> evaluate.metrics.visqol=true metrics.visqol.bin=<path_to_visqol> |
|
``` |
|
|
|
See an example grid: [Compression with ViSQOL](../audiocraft/grids/compression/encodec_musicgen_32khz.py) |
|
|
|
To learn more about ViSQOL and how to build the ViSQOL binary using Bazel, please refer to the
instructions available in the [open source repository](https://github.com/google/visqol).
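
For illustration, here is a hypothetical sketch of how a Python wrapper can invoke such a binary and parse the MOS-LQO score; the flag names and output format follow the ViSQOL README and may differ across builds:

```python
import re
import subprocess

def visqol_mos(visqol_bin: str, reference_wav: str, degraded_wav: str) -> float:
    """Run a ViSQOL binary on a (reference, degraded) pair and return the MOS-LQO score."""
    result = subprocess.run(
        [visqol_bin, "--reference_file", reference_wav, "--degraded_file", degraded_wav],
        capture_output=True, text=True, check=True)
    # ViSQOL prints a line such as "MOS-LQO: 4.123" on stdout.
    match = re.search(r"MOS-LQO:\s*([0-9.]+)", result.stdout)
    if match is None:
        raise RuntimeError(f"Could not parse ViSQOL output:\n{result.stdout}")
    return float(match.group(1))
```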
|
|
|
### Audio generation metrics |
|
|
|
#### Frechet Audio Distance |
|
|
|
Similarly to ViSQOL, we use a Python wrapper around the Frechet Audio Distance |
|
[official implementation](https://github.com/google-research/google-research/tree/master/frechet_audio_distance) |
|
in TensorFlow. |
|
|
|
Note that we had to make several changes to the original code in order to make it work.
|
Please refer to the [FrechetAudioDistanceMetric](../audiocraft/metrics/fad.py) class documentation |
|
for more details. We do not plan to provide further support in obtaining a working setup for the |
|
Frechet Audio Distance at this stage. |
|
|
|
```shell |
|
# the first parameter activates the fad metric computation while the second specifies
# the path to the fad library to be used by our python wrapper
|
dora run <...> evaluate.metrics.fad=true metrics.fad.bin=<path_to_google_research_repository> |
|
``` |
|
|
|
See an example grid: [Evaluation with FAD](../audiocraft/grids/musicgen/musicgen_pretrained_32khz_eval.py) |
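
For context, the metric fits a multivariate Gaussian to classifier embeddings (VGGish in the official implementation) of the reference set and of the generated set, and computes the Fréchet distance between the two. Below is a minimal NumPy sketch of that closed form, assuming the embeddings have already been extracted:

```python
import numpy as np
from scipy import linalg

def frechet_audio_distance(emb_ref: np.ndarray, emb_gen: np.ndarray) -> float:
    """FAD between two embedding sets of shape [n_samples, dim]."""
    mu_ref, mu_gen = emb_ref.mean(axis=0), emb_gen.mean(axis=0)
    cov_ref = np.cov(emb_ref, rowvar=False)
    cov_gen = np.cov(emb_gen, rowvar=False)
    # Matrix square root of the product of the covariances.
    covmean, _ = linalg.sqrtm(cov_ref @ cov_gen, disp=False)
    covmean = covmean.real  # discard tiny imaginary parts due to numerical error
    diff = mu_ref - mu_gen
    return float(diff @ diff + np.trace(cov_ref + cov_gen - 2.0 * covmean))
```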
|
|
|
#### Kullback-Leibler Divergence |
|
|
|
We provide a PyTorch implementation of the Kullback-Leibler Divergence computed over the label
probabilities produced by a state-of-the-art audio classifier, namely the
[PaSST classifier](https://github.com/kkoutini/PaSST).
|
|
|
In order to use the KLD metric over PaSST, you must install the PaSST library as an extra dependency: |
|
```shell |
|
pip install 'git+https://github.com/kkoutini/[email protected]#egg=hear21passt' |
|
``` |
|
|
|
You can then enable the metric by activating the corresponding flag:
|
|
|
```shell |
|
# the kld metric can be extended with additional audio classifiers, selected through the configuration
|
dora run <...> evaluate.metrics.kld=true metrics.kld.model=passt |
|
``` |
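
As a reference for what the metric computes, below is a minimal PyTorch sketch of a KL divergence between the label distributions a classifier assigns to reference and generated audio; the direction of the KL shown here is an illustrative choice, not necessarily the exact one used in AudioCraft:

```python
import torch
import torch.nn.functional as F

def label_kl_divergence(logits_ref: torch.Tensor, logits_gen: torch.Tensor) -> torch.Tensor:
    """KL(p_ref || p_gen) over classifier label distributions of shape [batch, n_classes]."""
    # F.kl_div expects log-probabilities as input and probabilities as target.
    log_p_gen = F.log_softmax(logits_gen, dim=-1)
    p_ref = F.softmax(logits_ref, dim=-1)
    return F.kl_div(log_p_gen, p_ref, reduction="batchmean")
```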
|
|
|
#### Text consistency |
|
|
|
We provide a text consistency metric, similar to the MuLan Cycle Consistency from
[MusicLM](https://arxiv.org/pdf/2301.11325.pdf) or the CLAP score used in
[Make-An-Audio](https://arxiv.org/pdf/2301.12661v1.pdf).
More specifically, we provide a PyTorch implementation of a text consistency metric
relying on a pre-trained [Contrastive Language-Audio Pretraining (CLAP)](https://github.com/LAION-AI/CLAP) model.
|
|
|
Please install the CLAP library as an extra dependency prior to using the metric: |
|
```shell |
|
pip install laion_clap |
|
``` |
|
|
|
You can then enable the metric by activating the corresponding flag:
|
|
|
```shell |
|
# the text consistency metric can be extended with additional audio-text models, selected through the configuration
|
dora run ... evaluate.metrics.text_consistency=true metrics.text_consistency.model=clap |
|
``` |
|
|
|
Note that the text consistency metric based on CLAP will require the CLAP checkpoint to be |
|
provided in the configuration. |
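
For illustration, here is a minimal sketch of a CLAP-based text consistency score computed with the `laion_clap` package; the API calls follow the laion_clap README, and the checkpoint path and file names are placeholders:

```python
import torch
import laion_clap

# Load CLAP and a checkpoint (placeholder path; see the configuration note above).
model = laion_clap.CLAP_Module(enable_fusion=False)
model.load_ckpt("/path/to/clap_checkpoint.pt")

# Embed the generated audio and the text prompt in the shared CLAP space.
audio_emb = model.get_audio_embedding_from_filelist(x=["generated.wav"], use_tensor=True)
text_emb = model.get_text_embedding(["a funky bass line with drums"], use_tensor=True)

# Text consistency as the cosine similarity between the two embeddings.
score = torch.nn.functional.cosine_similarity(audio_emb, text_emb, dim=-1)
print(score.item())
```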
|
|
|
#### Chroma cosine similarity |
|
|
|
Finally, as introduced in MusicGen, we provide a Chroma Cosine Similarity metric in PyTorch. |
|
This metric requires no extra dependencies. Please activate it at the
evaluation stage with the appropriate flag:
|
|
|
```shell |
|
dora run ... evaluate.metrics.chroma_cosine=true |
|
``` |
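
As a rough illustration of the underlying idea, the sketch below computes a mean framewise cosine similarity between the chromagrams of a reference and a generated file using librosa; AudioCraft's actual implementation is in PyTorch and may differ in details:

```python
import librosa
import numpy as np

def chroma_cosine_similarity(ref_wav: str, gen_wav: str, sr: int = 32000) -> float:
    """Mean framewise cosine similarity between the chromagrams of two audio files."""
    ref, _ = librosa.load(ref_wav, sr=sr)
    gen, _ = librosa.load(gen_wav, sr=sr)
    n = min(len(ref), len(gen))  # compare over the common duration
    chroma_ref = librosa.feature.chroma_stft(y=ref[:n], sr=sr)  # shape [12, frames]
    chroma_gen = librosa.feature.chroma_stft(y=gen[:n], sr=sr)
    # L2-normalize each frame, then take the per-frame dot product.
    chroma_ref /= np.linalg.norm(chroma_ref, axis=0, keepdims=True) + 1e-8
    chroma_gen /= np.linalg.norm(chroma_gen, axis=0, keepdims=True) + 1e-8
    return float((chroma_ref * chroma_gen).sum(axis=0).mean())
```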
|
|
|
#### Comparing against reconstructed audio |
|
|
|
For all the audio generation metrics above, we offer the option to compute the metric on the ground-truth
audio reconstructed through EnCodec, instead of on the generated sample, using the flag `<metric>.use_gt=true`.
|
|
|
## Example usage |
|
|
|
You will find example configurations for the different metrics introduced above in:
* The [MusicGen default solver](../config/solver/musicgen/default.yaml) for all audio generation metrics
* The [compression default solver](../config/solver/compression/default.yaml) for all audio synthesis metrics
|
|
|
Similarly, we provide different examples in our grids: |
|
* [Evaluation with ViSQOL](../audiocraft/grids/compression/encodec_musicgen_32khz.py) |
|
* [Evaluation with FAD and others](../audiocraft/grids/musicgen/musicgen_pretrained_32khz_eval.py) |
|
|