An improved noise-robust voice activity detector based on hidden semi-Markov models

Authors:
Yuan Liang; Xianglong Liu; Yihua Lou; Baosong Shan
Affiliations:
State Key Laboratory of Software Development Environment, Beihang University, China (Yuan Liang, Xianglong Liu, Yihua Lou); School of Mathematics and Systems Science, Beihang University, China (Baosong Shan)
Venue:
Pattern Recognition Letters
Year:
2011

References:

Assessment for automatic speech recognition II: NOISEX-92: a database and an experiment to study the effect of additive noise on speech recognition systems. Speech Communication (special issue on speech processing in adverse conditions).

Speaker identification and verification using Gaussian mixture speaker models. Speech Communication.

Voice activity detection based on a family of parametric distributions. Pattern Recognition Letters.

Noise robust voice activity detection based on switching Kalman filter. IEICE Transactions on Information and Systems.

Noise-robust voice activity detector based on hidden semi-Markov models. Proceedings of the 20th International Conference on Pattern Recognition (ICPR 2010).

Voice activity detection based on multiple statistical models. IEEE Transactions on Signal Processing, Part I.

Statistical voice activity detection using low-variance spectrum estimation and an adaptive threshold. IEEE Transactions on Audio, Speech, and Language Processing.

Evaluation of objective quality measures for speech enhancement. IEEE Transactions on Audio, Speech, and Language Processing.

ITU-T Recommendation G.729 Annex B: a silence compression scheme for use with G.729 optimized for V.70 digital simultaneous voice and data applications. IEEE Communications Magazine.

Variable duration hidden Markov model and morphological segmentation for handwritten word recognition. IEEE Transactions on Image Processing.

Abstract

To improve the performance of voice activity detectors (VADs) in noisy environments, this paper concentrates on three aspects critical to noise robustness: speech features, feature distributions, and temporal dependence. Based on statistics collected on the TIMIT and NOIZEUS corpora, Mel-frequency cepstral coefficients (MFCCs) are selected as speech features; Gaussian mixture distributions (GMDs) are applied to relate observations in the MFCC domain to both the speech and non-speech states; and Weibull and Gamma distributions are used to explicitly model the durations of noise and speech, respectively. To integrate these aspects into the VAD, the hidden semi-Markov model (HSMM), a generalization of the hidden Markov model (HMM), is first introduced. The VAD decision is then made by a likelihood ratio test (LRT) that incorporates prior knowledge of the states and modified forward variables of the HSMM, and a recursive scheme is designed to compute these modified forward variables efficiently. Finally, a series of experiments demonstrates (1) the positive effect of the different robustness-related schemes adopted in the proposed VAD, and (2) better performance than the standard ITU-T G.729B, Adaptive Multi-Rate VAD phase 2 (AMR2), Advanced Front-End (AFE), an HMM-based VAD, and a VAD using a Laplacian-Gaussian model (LD-GD-based VAD).
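The abstract only outlines how GMD emission likelihoods, Weibull/Gamma duration models, HSMM forward variables and an LRT fit together. The Python sketch below is one possible reading of that pipeline, not the authors' implementation. Assumptions: a scalar feature stands in for MFCC vectors; all GMM parameters, duration parameters, `prior` and `threshold` are hypothetical; the continuous duration pdfs are discretized by sampling at integer frame counts and renormalizing; and the paper's "modified forward variables" are replaced by a standard explicit-duration (elapsed-time) HSMM forward filter whose normalized state posteriors feed a likelihood-ratio decision. The helper names (`weibull_pmf`, `gamma_pmf`, `gmm_loglik`, `hsmm_filter`, `lrt_decisions`) are hypothetical.

```python
import math

import numpy as np

NON_SPEECH, SPEECH = 0, 1  # hypothetical state indices


def weibull_pmf(shape, scale, max_dur):
    """Duration pmf over 1..max_dur frames: Weibull pdf sampled at integers, renormalized."""
    d = np.arange(1, max_dur + 1, dtype=float)
    pdf = (shape / scale) * (d / scale) ** (shape - 1) * np.exp(-((d / scale) ** shape))
    return pdf / pdf.sum()


def gamma_pmf(shape, scale, max_dur):
    """Duration pmf over 1..max_dur frames: Gamma pdf sampled at integers, renormalized."""
    d = np.arange(1, max_dur + 1, dtype=float)
    log_pdf = (shape - 1) * np.log(d) - d / scale - math.lgamma(shape) - shape * math.log(scale)
    pdf = np.exp(log_pdf)
    return pdf / pdf.sum()


def gmm_loglik(x, weights, means, variances):
    """Per-frame log-likelihood of scalar features x under a 1-D Gaussian mixture."""
    w, mu, var = (np.asarray(a, dtype=float) for a in (weights, means, variances))
    x = np.asarray(x, dtype=float)[:, None]  # shape (T, 1) broadcasts against (K,)
    comp = np.log(w) - 0.5 * np.log(2 * np.pi * var) - 0.5 * (x - mu) ** 2 / var
    top = comp.max(axis=1, keepdims=True)  # log-sum-exp over mixture components
    return (top + np.log(np.exp(comp - top).sum(axis=1, keepdims=True))).ravel()


def hsmm_filter(loglik, dur_pmf, prior):
    """Causal explicit-duration HSMM forward pass for two alternating states.

    alpha[j, d-1] is proportional to P(o_1..o_t, state j at t, elapsed duration d).
    With no self-transitions, a segment that ends in one state starts the other.
    Returns post[t, j] = P(state j at t | o_1..o_t).
    """
    T, J = loglik.shape
    D = dur_pmf.shape[1]
    # surv[:, k] = S(k+1) = P(duration >= k+1); S(D+1) = 0 is appended.
    surv = np.concatenate([np.cumsum(dur_pmf[:, ::-1], axis=1)[:, ::-1], np.zeros((J, 1))], axis=1)
    ok = surv[:, :-1] > 0
    cont = np.divide(surv[:, 1:], surv[:, :-1], out=np.zeros((J, D)), where=ok)  # S(d+1)/S(d)
    end = np.divide(dur_pmf, surv[:, :-1], out=np.zeros((J, D)), where=ok)  # p(d)/S(d)
    alpha = np.zeros((J, D))
    post = np.zeros((T, J))
    for t in range(T):
        b = np.exp(loglik[t] - loglik[t].max())  # per-frame scaling cancels on normalization
        new = np.zeros((J, D))
        if t == 0:
            new[:, 0] = prior  # every state starts a fresh segment at t = 0
        else:
            new[:, 1:] = alpha[:, :-1] * cont[:, :-1]  # segment continues: d -> d+1
            ended = (alpha * end).sum(axis=1)  # mass of segments that ended at t-1
            new[:, 0] = ended[::-1]  # two states: the other state begins
        new *= b[:, None]
        alpha = new / new.sum()  # normalize to avoid underflow
        post[t] = alpha.sum(axis=1)
    return post


def lrt_decisions(post, threshold=1.0):
    """Speech where P(speech | o_1..o_t) / P(non-speech | o_1..o_t) exceeds threshold."""
    return post[:, SPEECH] > threshold * post[:, NON_SPEECH]


if __name__ == "__main__":
    # Hypothetical scalar feature: 10 noise frames, 10 speech frames, 10 noise frames.
    x = np.r_[np.zeros(10), np.full(10, 5.0), np.zeros(10)]
    loglik = np.stack([
        gmm_loglik(x, [0.5, 0.5], [-0.5, 0.5], [1.0, 1.0]),  # non-speech GMM
        gmm_loglik(x, [0.5, 0.5], [4.5, 5.5], [1.0, 1.0]),  # speech GMM
    ], axis=1)
    dur = np.stack([weibull_pmf(1.5, 8.0, 40), gamma_pmf(2.0, 4.0, 40)])  # noise, speech
    post = hsmm_filter(loglik, dur, prior=[0.9, 0.1])
    print("".join("S" if s else "-" for s in lrt_decisions(post)))
```

Normalizing the forward variables at every frame keeps memory fixed at two states times the maximum duration and makes the decision at frame t depend only on o_1..o_t, which suits an online VAD. The duration models enter through the hazard p(d)/S(d): a segment becomes more or less likely to end depending on how long it has already lasted, which a plain HMM with geometric durations cannot express. Raising `threshold` trades missed speech for fewer false alarms.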