arXiv:2303.03005

Scaling strategies for on-device low-complexity source separation with Conv-Tasnet

Published on Mar 6, 2023
Abstract

Recently, several highly effective neural approaches for single-channel speech separation have been presented in the literature. However, due to the size and complexity of these models, their use on low-resource devices, e.g., hearing aids and earphones, remains a challenge, and no established solutions are available yet. Although approaches based on either pruning or compressing neural models have been proposed, designing a model architecture suitable for a given application domain often requires heuristic procedures that are not easily portable to different low-resource platforms. Given the modular nature of the well-known Conv-Tasnet speech separation architecture, in this paper we consider three parameters that directly control the overall size of the model, namely: the number of residual blocks, the number of repetitions of the separation blocks, and the number of channels in the depth-wise convolutions, and we experimentally evaluate how they affect speech separation performance. In particular, experiments carried out on the Libri2Mix dataset show that the number of dilated 1D-Conv blocks is the most critical parameter and that using extra dilation in the residual blocks helps reduce the performance drop.
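To make the three scaling knobs concrete, below is a minimal PyTorch sketch of the Conv-Tasnet temporal convolutional separator. The parameter names (`num_blocks` for the number of dilated 1D-Conv blocks per repeat, `num_repeats` for the repetitions of the separation stack, `hidden_channels` for the depth-wise convolution width) and the `extra_dilation` flag are illustrative assumptions, not the authors' reference implementation; in particular, the exact form of the paper's "extra-dilation" is a hypothetical reading.

```python
# Illustrative sketch, assuming a standard Conv-Tasnet TCN layout:
# R repeats of X residual blocks, each block holding a depth-wise
# dilated convolution with H channels.
import torch
import torch.nn as nn


class DilatedConvBlock(nn.Module):
    """One residual block: 1x1 conv -> depth-wise dilated conv -> 1x1 conv."""

    def __init__(self, bn_channels: int, hidden_channels: int, dilation: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(bn_channels, hidden_channels, kernel_size=1),
            nn.PReLU(),
            # Depth-wise convolution: hidden_channels (H in the paper)
            # controls its width; padding keeps the sequence length fixed.
            nn.Conv1d(
                hidden_channels, hidden_channels, kernel_size=3,
                dilation=dilation, padding=dilation, groups=hidden_channels,
            ),
            nn.PReLU(),
            nn.Conv1d(hidden_channels, bn_channels, kernel_size=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.net(x)  # residual connection


class TCNSeparator(nn.Module):
    """R repeats of X dilated residual blocks with dilations 1, 2, 4, ..."""

    def __init__(self, bn_channels=128, hidden_channels=512,
                 num_blocks=8, num_repeats=3, extra_dilation=False):
        super().__init__()
        blocks = []
        for _ in range(num_repeats):        # R: repetitions of the stack
            for x in range(num_blocks):     # X: dilated 1D-Conv blocks
                # Hypothetical "extra-dilation": grow the dilation faster so a
                # shallower stack keeps a comparable receptive field.
                dilation = 2 ** (x + 1) if extra_dilation else 2 ** x
                blocks.append(
                    DilatedConvBlock(bn_channels, hidden_channels, dilation)
                )
        self.tcn = nn.Sequential(*blocks)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.tcn(x)


# Shrinking num_blocks (X) reduces size and latency the most; per the
# abstract, it is also the most performance-critical parameter.
model = TCNSeparator(num_blocks=4, num_repeats=2, hidden_channels=256,
                     extra_dilation=True)
print(sum(p.numel() for p in model.parameters()))  # rough parameter count
```

Under these assumptions, scaling down any of the three knobs trades separation quality for on-device footprint; the paper's finding is that extra dilation partially compensates for the receptive field lost when X is reduced.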
