Audio Separator Tests

This directory contains tests for the audio-separator project.

Audio Validation

The integration tests validate output audio files by comparing their waveform and spectrogram images against reference images. This helps ensure that the audio separation results remain consistent across runs and code changes.

How It Works

Reference waveform and spectrogram images are generated from expected output files
During test execution, the same images are generated for the actual output files
The images are compared using the Structural Similarity Index Measure (SSIM) to measure perceptual similarity
If the images differ significantly, the test fails, indicating a change in the audio output

Image Comparison with SSIM

The tests use Structural Similarity Index Measure (SSIM) to compare images, which is more robust than pixel-by-pixel comparison:

SSIM considers structural information in the images
It's more resilient to small spatial shifts or offsets
It better matches human perception of image similarity
It works well across different environments (local machines vs CI servers)
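To make the idea concrete, here is a minimal sketch of SSIM computed over a whole image as a single window. It is an illustration only, not the code the tests use: the function name global_ssim and the flat-list image format are assumptions, and real implementations typically average SSIM over many small local windows rather than one global window.

```python
from statistics import fmean


def global_ssim(img_a, img_b, data_range=255.0):
    """Single-window SSIM for two equally sized grayscale images.

    Each image is a flat sequence of pixel intensities. This simplified
    version treats the whole image as one window (hypothetical helper).
    """
    if len(img_a) != len(img_b) or not img_a:
        raise ValueError("images must be non-empty and the same size")

    # Standard stabilising constants (K1 = 0.01, K2 = 0.03).
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2

    # Means, variances and covariance of the two images.
    mu_a, mu_b = fmean(img_a), fmean(img_b)
    var_a = fmean((p - mu_a) ** 2 for p in img_a)
    var_b = fmean((p - mu_b) ** 2 for p in img_b)
    cov = fmean((a - mu_a) * (b - mu_b) for a, b in zip(img_a, img_b))

    # Luminance term multiplied by the combined contrast/structure term.
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / (
        (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    )
```

Identical images score 1.0, and the score drops as brightness, contrast, or structure diverge. Note that the score can fall below zero for strongly anti-correlated images.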

The comparison uses a minimum similarity threshold (0.0-1.0):

Higher values (closer to 1.0) require images to be more similar
Lower values (closer to 0.0) are more permissive
A value of 0.99 requires an SSIM score of at least 0.99 between the images
A value of 0.0 would accept almost any pair of images as a match

The default threshold is 0.999, which is quite strict. Individual models can override it with a model-specific threshold.

Model-Specific Thresholds

Some models inherently produce slightly different outputs on different runs, even with the same input. To accommodate these models, you can set model-specific thresholds in the MODEL_SIMILARITY_THRESHOLDS dictionary:

MODEL_SIMILARITY_THRESHOLDS = {
    "htdemucs_6s.yaml": 0.990,  # Demucs models need a lower threshold
    # Add other models that need custom thresholds here
}

This allows you to maintain a high threshold for most models while being more flexible with models that naturally exhibit more variation.
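The lookup can be sketched as follows. This is an assumption about how the threshold might be resolved, not a copy of the test code: DEFAULT_SIMILARITY_THRESHOLD, threshold_for, passes, and the filename "some_other_model.onnx" are hypothetical names used for illustration.

```python
# Hypothetical constant name; 0.999 is the default described above.
DEFAULT_SIMILARITY_THRESHOLD = 0.999

MODEL_SIMILARITY_THRESHOLDS = {
    "htdemucs_6s.yaml": 0.990,  # Demucs models need a lower threshold
}


def threshold_for(model_filename):
    """Return the model's custom threshold, or the default if none is set."""
    return MODEL_SIMILARITY_THRESHOLDS.get(model_filename, DEFAULT_SIMILARITY_THRESHOLD)


def passes(similarity, model_filename):
    """True when the measured similarity meets the model's threshold."""
    return similarity >= threshold_for(model_filename)


# A score of 0.995 is acceptable for the Demucs model but not under the default.
print(passes(0.995, "htdemucs_6s.yaml"))       # True
print(passes(0.995, "some_other_model.onnx"))  # False
```

Falling back to the default with dict.get keeps the dictionary short: only models that actually need a looser threshold have to be listed.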

Generating Reference Images

To generate or update the reference images, use the script provided:

python tests/integration/generate_reference_images.py

This script will create waveform and spectrogram images for all expected output files and store them in the tests/inputs/reference directory.

Skipping Validation

If you need to skip the audio validation (e.g., when you're intentionally changing the output), you can set the environment variable SKIP_AUDIO_VALIDATION=1:

SKIP_AUDIO_VALIDATION=1 pytest tests/integration/test_cli_integration.py
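For orientation, a check of this kind is usually a one-line environment lookup. The sketch below assumes the variable is compared against the string "1"; the helper name should_skip_audio_validation is hypothetical and the actual test code may parse the value differently.

```python
import os


def should_skip_audio_validation(environ=os.environ):
    """Return True when SKIP_AUDIO_VALIDATION is set to "1" (assumed convention)."""
    return environ.get("SKIP_AUDIO_VALIDATION") == "1"
```

Passing the environment as a parameter makes the helper easy to exercise in a unit test without modifying the real process environment.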

Adding New Models

When adding a new model to the tests:

Add the model and its expected output files to the MODEL_PARAMS list in test_cli_integration.py
Run the integration test to generate the output files
Run the generate_reference_images.py script to create the reference images
Run the tests again to validate the output files
If necessary, add a custom similarity threshold for the new model in MODEL_SIMILARITY_THRESHOLDS

Debugging

To see detailed validation results, run pytest with the -sv flags (-s disables output capturing, -v enables verbose output):

pytest tests/integration/test_cli_integration.py -sv

This will show the similarity scores for each comparison and whether they passed or failed.

Running Tests

To run all tests:

pytest

To run specific tests:

pytest tests/unit/
pytest tests/integration/

To run with coverage:

pytest --cov=audio_separator