arxiv:2010.13886

MarbleNet: Deep 1D Time-Channel Separable Convolutional Neural Network for Voice Activity Detection

Published on Oct 26, 2020

Abstract

MarbleNet, an end-to-end voice activity detection model using a deep residual network with 1D time-channel separable convolutions, achieves similar performance to state-of-the-art models with fewer parameters.

AI-generated summary

We present MarbleNet, an end-to-end neural network for Voice Activity Detection (VAD). MarbleNet is a deep residual network composed of blocks of 1D time-channel separable convolution, batch-normalization, ReLU and dropout layers. Compared to a state-of-the-art VAD model, MarbleNet achieves similar performance with roughly one-tenth the parameter cost. We further conduct extensive ablation studies on different training methods and parameter choices to study the robustness of MarbleNet in real-world VAD tasks.
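
To make the parameter argument concrete, the sketch below counts weights for a single layer, assuming that a 1D time-channel separable convolution factors into a depthwise convolution over time followed by a pointwise (kernel size 1) convolution across channels. The function names and the channel and kernel sizes are illustrative assumptions, not values from the paper, and biases are ignored. This is per-layer arithmetic only; it does not reproduce the paper's comparison against a state-of-the-art VAD model.

```python
def standard_conv1d_params(c_in: int, c_out: int, kernel: int) -> int:
    """Weights in a regular 1D convolution: every output channel
    has one kernel of length `kernel` per input channel."""
    return c_in * c_out * kernel


def separable_conv1d_params(c_in: int, c_out: int, kernel: int) -> int:
    """Weights in a time-channel separable 1D convolution (assumed
    depthwise + pointwise factorization, biases ignored)."""
    depthwise = c_in * kernel   # one temporal kernel per input channel
    pointwise = c_in * c_out    # 1x1 convolution that mixes channels
    return depthwise + pointwise


# Hypothetical layer sizes chosen only for illustration.
c_in, c_out, kernel = 64, 64, 13

standard = standard_conv1d_params(c_in, c_out, kernel)
separable = separable_conv1d_params(c_in, c_out, kernel)

print(standard)   # 53248
print(separable)  # 4928
print(f"separable layer uses {separable / standard:.1%} of the weights")
```

For wide layers the pointwise term dominates, so the saving per layer approaches a factor equal to the kernel length.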
