# -*- coding: utf-8 -*-
"""Stem_Separation_Spleeter.ipynb
Automatically generated by Colab.
Original file is located at
https://colab.research.google.com/drive/1ZanGdGmndmOSa0q2arjO1hT5p72n56rm
# Audio Stem Separation with Spleeter
Audio stem separation is the task of isolating individual components (stems) from a mixed audio signal, such as vocals, bass, drums, and other instrumental parts. This technology has various applications, including music production, remixing, karaoke, and forensic audio analysis. Spleeter, developed by Deezer Research, is a powerful open-source tool that leverages deep learning to perform this separation efficiently and effectively.
This notebook provides a step-by-step guide to:
1. Install the Spleeter library
2. Separate an audio file into different stems (vocals, bass, drums, and other)
3. Visualize the separated audio sources using waveforms and spectrograms
Additionally, the notebook demonstrates how to use Spleeter from the command line.
The code is designed to be adaptable for both Google Colab and local Python environments, with clear instructions for running the script in different settings.
View Spleeter on GitHub: https://github.com/deezer/spleeter
## Instructions to run the code
Let's begin by installing Spleeter.
I had problems running this in my local Python 3.11 environment because my numpy and librosa versions were incompatible. If you run into the same problem, I would recommend using Colab instead or switching to an older Python version (for example 3.8).
"""
!pip install spleeter
!pip install librosa
!pip install "matplotlib>=3.5"
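"""Optionally, you can verify which versions pip actually installed. This is only a small sanity check related to the numpy/librosa incompatibility mentioned above; the package list below is my assumption about which packages matter here, not an official compatibility check."""
from importlib import metadata

# Print the resolved versions so incompatibilities are easier to spot.
for pkg in ('spleeter', 'numpy', 'librosa', 'matplotlib'):
    try:
        print(pkg, metadata.version(pkg))
    except metadata.PackageNotFoundError:
        print(pkg, 'not installed')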
"""Next, we import all the required packages."""
from spleeter.separator import Separator
from spleeter.audio.adapter import AudioAdapter
from IPython.display import Audio, display
import librosa.display
import matplotlib.pyplot as plt
import numpy as np
import os
"""Before running the next cell, please insert a .mp3 / .wav file into this directory and paste the name of it in the variable *INPUT_FILE* in the code.
If you are using COLAB, you can find this option by selecting files at the verical bar on the left.
With the current settings, the model separates vocals, bass, drums and accomplishment from the input-file.
However, you can change this by setting *SPLEETER_MODEL* to one of the following:
- spleeter:2stems
- separates only Vocals and accomplishment
- spleeter:4stems
- separates vocals, bass, drums and accomplishment
- spleeter:5stems
- separates vocals, bass, drums, piano and accomplishment
Lets start by separating our first stems.
Running this cell might take some time, depending on your setup, the audio file and the selected model
"""
# insert your file name here
INPUT_FILE = 'example_for_demo.mp3'
INPUT_FILENAME = os.path.splitext(INPUT_FILE)[0]  # base name without extension (matches Spleeter's output folder name)
OUTPUT_DIR = 'outputs'
SUPPORTED_EXTENSIONS = ('.wav', '.mp3')
SAMPLE_RATE = 44100
SPLEETER_MODEL = 'spleeter:5stems' # You might want to change this to 'spleeter:2stems' or 'spleeter:4stems' for different models
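# Optional sanity check: fail early if the input file is missing or does not use
# one of the supported extensions defined above.
if not os.path.isfile(INPUT_FILE):
    raise FileNotFoundError(f"'{INPUT_FILE}' not found - please place it in this directory first.")
if not INPUT_FILE.lower().endswith(SUPPORTED_EXTENSIONS):
    raise ValueError(f"'{INPUT_FILE}' must have one of these extensions: {SUPPORTED_EXTENSIONS}")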
# Initialize separator
separator = Separator(SPLEETER_MODEL)
# Load audio
audio_loader = AudioAdapter.default()
waveform, _ = audio_loader.load(INPUT_FILE, sample_rate=SAMPLE_RATE)
# Perform the separation and save the output
print("Separating audio sources...")
# prediction = separator.separate(waveform) # separating the audio, without saving it
separator.separate_to_file(INPUT_FILE, OUTPUT_DIR)
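"""As noted in the commented-out line above, `separator.separate(waveform)` returns the separated stems in memory instead of writing them to disk. The sketch below shows one way this result could be inspected and saved manually; it assumes the return value is a dict mapping stem names to numpy arrays and that `AudioAdapter.save(path, data, sample_rate)` is available, so adjust it if your Spleeter version behaves differently."""
# prediction = separator.separate(waveform)
# for stem_name, stem_waveform in prediction.items():
#     print(stem_name, stem_waveform.shape)  # e.g. (num_samples, 2) for a stereo file
#     audio_loader.save(f'{OUTPUT_DIR}/{INPUT_FILENAME}/{stem_name}_from_memory.wav',
#                       stem_waveform, SAMPLE_RATE)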
"""## Displaying Audio, Spectrogram and Waveform
Let's create two helper functions that plot the waveform and the spectrogram of an audio signal.
By using different librosa methods, we can easily create such plots.
Finally, we create a function that calls both of them and additionally displays the audio, so that it can be played directly in this notebook.
"""
# Visualization of Waveforms and Spectrograms
# Plot waveform
def plot_waveform(waveform, sr, title='Waveform'):
    plt.figure(figsize=(15, 2))
    librosa.display.waveshow(waveform, sr=sr)
    plt.title(title)
    plt.tight_layout()
    plt.show()

# Plot spectrogram
def plot_spectrogram(signal, sr, title='Spectrogram'):
    D = librosa.amplitude_to_db(np.abs(librosa.stft(signal)), ref=np.max)
    plt.figure(figsize=(15, 2))
    librosa.display.specshow(D, sr=sr, x_axis='time', y_axis='log')
    plt.title(title)
    plt.colorbar(format='%+2.0f dB')
    plt.tight_layout()
    plt.show()

# Display the audio player, waveform and spectrogram for a given audio file
def display_audio_and_plots(fileName):
    display(Audio(fileName))
    print(fileName)
    y, sr = librosa.load(fileName)
    plot_waveform(y, sr, f'Waveform of {fileName}')
    plot_spectrogram(y, sr, f'Spectrogram of {fileName}')
    print('---' * 50)
"""Now we can use this method and display all the relevant formats for the original song and each separated stem.
Here you should be able to notice some characteristics (depending on the song that you picked):
- The bass should be located at the lower frequencies of the spectrogram.
- The spectrogram of the drums often consists of vertical lines, which means that the hits span many frequencies at the same time and are very short (which is typical for drums).
- Vocals often show up as horizontal lines in the spectrogram (depending on the genre), because their notes are usually held longer than those of other instruments.
In the following cell, you can insert the path of any wav/mp3 file and display the different plots.
You might have to comment out some of the display_audio_and_plots() calls, depending on the model that you've chosen (or use the loop sketched after the following cell).
"""
display_audio_and_plots(INPUT_FILE)
# if you are running this in a local environment, you can use the following code to display the audio
display_audio_and_plots(f'{OUTPUT_DIR}/{INPUT_FILENAME}/other.wav')
display_audio_and_plots(f'{OUTPUT_DIR}/{INPUT_FILENAME}/vocals.wav')
display_audio_and_plots(f'{OUTPUT_DIR}/{INPUT_FILENAME}/bass.wav')
display_audio_and_plots(f'{OUTPUT_DIR}/{INPUT_FILENAME}/drums.wav')
display_audio_and_plots(f'{OUTPUT_DIR}/{INPUT_FILENAME}/piano.wav')
# if you are running this in Colab, you can use the following code to display the audio
# display_audio_and_plots(f'/content/{OUTPUT_DIR}/{INPUT_FILENAME}/other.wav')
# display_audio_and_plots(f'/content/{OUTPUT_DIR}/{INPUT_FILENAME}/vocals.wav')
# display_audio_and_plots(f'/content/{OUTPUT_DIR}/{INPUT_FILENAME}/bass.wav')
# display_audio_and_plots(f'/content/{OUTPUT_DIR}/{INPUT_FILENAME}/drums.wav')
# display_audio_and_plots(f'/content/{OUTPUT_DIR}/{INPUT_FILENAME}/piano.wav')
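"""Instead of commenting calls in and out by hand, you can also loop over the stems that the selected model is expected to produce. The mapping below is an assumption based on the model descriptions earlier in this notebook, not something queried from Spleeter itself."""
# Expected stem names per model (assumed from the model descriptions above)
STEMS_PER_MODEL = {
    'spleeter:2stems': ['vocals', 'accompaniment'],
    'spleeter:4stems': ['vocals', 'drums', 'bass', 'other'],
    'spleeter:5stems': ['vocals', 'drums', 'bass', 'piano', 'other'],
}

for stem in STEMS_PER_MODEL[SPLEETER_MODEL]:
    display_audio_and_plots(f'{OUTPUT_DIR}/{INPUT_FILENAME}/{stem}.wav')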
"""## Running Spleeter on Command Line
The following cell demonstrates how to use Spleeter on the command line. You can execute the cell here in this notebook, or copy the command into your terminal. Note that if you run it directly on the command line, you have to remove the "!" before the command.
Again, you can choose between different models here: either 2, 4 or 5 stems can be separated.
To run Spleeter on the command line, you have to provide an output directory (here: audio_output), an audio file (here: example_for_demo.mp3) and optionally a model via the -p option (spleeter:2stems is the default).
"""
# 2 stems
!spleeter separate -o audio_output example_for_demo.mp3
# 4 stems
# !spleeter separate -o audio_output -p spleeter:4stems example_for_demo.mp3
# 5 stems
# !spleeter separate -o audio_output -p spleeter:5stems example_for_demo.mp3
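"""Spleeter's command line accepts a few additional options, for example to change the output codec or to only process part of the file. The flags shown below (-c for codec, -d for duration in seconds) are based on my understanding of the CLI and may vary between versions, so run the help command first if they are not accepted."""
# Show all available options of the separate command
# !spleeter separate --help
# Example with the assumed flags: write mp3 stems and only process the first 60 seconds
# !spleeter separate -o audio_output -p spleeter:4stems -c mp3 -d 60 example_for_demo.mp3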