To improve the performance of voice activity detectors (VADs) in noisy environments, this paper concentrates on three critical aspects related to noise robustness: speech features, feature distributions, and temporal dependence. Based on statistics gathered on the TIMIT and NOIZEUS corpora, Mel-frequency cepstral coefficients (MFCCs) are selected as speech features, Gaussian mixture distributions (GMDs) are used to associate observations in the MFCC domain with both the speech and non-speech states, and Weibull and Gamma distributions are used to explicitly model the durations of noise and speech, respectively. To integrate these aspects into a VAD, the hidden semi-Markov model (HSMM), a generalization of the hidden Markov model (HMM), is first introduced. The VAD decision is then made by a likelihood ratio test (LRT) that incorporates prior knowledge of the states and modified forward variables of the HSMM, and a recursive procedure is designed to compute these modified forward variables efficiently. Finally, a series of experiments demonstrates (1) the positive effect of each robustness-related scheme adopted in the proposed VAD and (2) better performance than the standard ITU-T G.729B, Adaptive Multi-Rate VAD option 2 (AMR2), and Advanced Front-End (AFE) VADs, an HMM-based VAD, and a VAD using a Laplacian-Gaussian model (LD-GD-based VAD).
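The abstract states that Gaussian mixture distributions link MFCC observations to the speech and non-speech states, but it gives no model details. The Python sketch below makes two assumptions. First, the mixtures use diagonal covariances. Second, the MFCC vectors are already computed, so feature extraction is not shown. The names `gmm_loglik` and `frame_llr`, the `(weights, means, variances)` layout, and the toy parameters are all hypothetical. The sketch computes the per-frame log-likelihood ratio between a speech GMM and a non-speech GMM, which is the basic quantity a likelihood ratio test starts from.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance (variances `var`)."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )

def gmm_loglik(x, weights, means, variances):
    """log p(x) under a diagonal-covariance GMM, combined with log-sum-exp."""
    terms = [math.log(w) + log_gauss_diag(x, m, v)
             for w, m, v in zip(weights, means, variances)]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))

def frame_llr(x, speech_gmm, noise_gmm):
    """Per-frame log p(x | speech) - log p(x | non-speech).
    Each model is a hypothetical (weights, means, variances) tuple."""
    return gmm_loglik(x, *speech_gmm) - gmm_loglik(x, *noise_gmm)

# Toy 2-dimensional "MFCC" models (made-up parameters, not trained values).
speech = ([0.6, 0.4], [[3.0, 1.0], [5.0, -1.0]], [[1.0, 1.0], [2.0, 1.0]])
noise = ([1.0], [[0.0, 0.0]], [[1.0, 1.0]])

print(frame_llr([3.0, 1.0], speech, noise) > 0)  # True: frame near a speech component
print(frame_llr([0.0, 0.0], speech, noise) < 0)  # True: frame at the noise mean
```

Combining mixture components with log-sum-exp avoids underflow when individual component densities are very small. This is common for higher-dimensional MFCC vectors.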
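An HMM's self-transitions imply geometric state durations. Explicit duration modelling replaces these with an arbitrary distribution per state. The following sketch turns a Weibull distribution (noise) and a Gamma distribution (speech) into probability mass functions over integer frame counts 1..D, which is the form an HSMM needs. The discretization schemes, the truncation length `max_dur`, and all parameter values are assumptions for illustration, not values from the paper.

```python
import math

def weibull_duration_pmf(shape, scale, max_dur):
    """P(d) for d = 1..max_dur from a Weibull(shape, scale) duration model,
    discretized as CDF differences F(d) - F(d-1) and renormalized after truncation."""
    def cdf(x):
        return 1.0 - math.exp(-((x / scale) ** shape))
    p = [cdf(d) - cdf(d - 1) for d in range(1, max_dur + 1)]
    total = sum(p)
    return [v / total for v in p]

def gamma_duration_pmf(shape, scale, max_dur):
    """P(d) for d = 1..max_dur from a Gamma(shape, scale) density evaluated
    at integer durations and renormalized (a simple discretization)."""
    def log_pdf(x):
        return ((shape - 1.0) * math.log(x) - x / scale
                - math.lgamma(shape) - shape * math.log(scale))
    p = [math.exp(log_pdf(d)) for d in range(1, max_dur + 1)]
    total = sum(p)
    return [v / total for v in p]

# Hypothetical parameters in frames; real values would be fitted to labelled data.
noise_dur = weibull_duration_pmf(shape=1.2, scale=30.0, max_dur=300)
speech_dur = gamma_duration_pmf(shape=5.0, scale=4.0, max_dur=300)
print(speech_dur.index(max(speech_dur)) + 1)  # 16: most likely speech duration in frames
```

Truncating at `max_dur` bounds the cost of the HSMM recursion. That cost grows linearly with the maximum segment length.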
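The abstract does not spell out the modified forward variables or their recursion. The following is therefore only a sketch of one standard way to get an online, duration-aware decision from a two-state explicit-duration HSMM. It assumes that noise and speech segments strictly alternate. `alpha[j][t]` is the usual segment-end forward variable. The frame-level quantity `occ[j]` replaces the duration pmf with its survival function P(duration ≥ d), which accounts for a segment that is still running at frame t. The decision compares log P(o_1..t, speech at t) with log P(o_1..t, noise at t) against a threshold `log_eta`. All names are hypothetical, and this is not claimed to be the authors' method.

```python
import math

NEG_INF = -math.inf

def logsumexp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    top = max(xs)
    if top == NEG_INF:
        return NEG_INF
    return top + math.log(sum(math.exp(x - top) for x in xs))

def hsmm_vad(log_b, log_pi, log_dur, log_eta=0.0):
    """Online two-state explicit-duration HSMM VAD (state 0 = noise, 1 = speech).
    A sketch, not the paper's exact recursion.

    log_b[t][j]   : finite log emission likelihood of frame t under state j
    log_pi[j]     : log initial-state probability
    log_dur[j][k] : log P(segment duration = k + 1 frames), k = 0..D-1
    Returns per-frame log-likelihood ratios and speech/non-speech decisions.
    """
    T, D = len(log_b), len(log_dur[0])
    # log_surv[j][k] = log P(duration >= k + 1): a segment that has lasted
    # k + 1 frames so far may still be running.
    log_surv = []
    for j in range(2):
        surv, acc = [NEG_INF] * D, NEG_INF
        for k in range(D - 1, -1, -1):
            acc = logsumexp([acc, log_dur[j][k]])
            surv[k] = acc
        log_surv.append(surv)
    # Prefix sums of log emissions, so a segment's log likelihood is O(1).
    cum = [[0.0] * (T + 1) for _ in range(2)]
    for j in range(2):
        for t in range(T):
            cum[j][t + 1] = cum[j][t] + log_b[t][j]
    start = [[NEG_INF] * T for _ in range(2)]  # log P(o_<t, segment of j starts at t)
    alpha = [[NEG_INF] * T for _ in range(2)]  # log P(o_<=t, segment of j ends at t)
    llrs, decisions = [], []
    for t in range(T):
        for j in range(2):
            # Alternating states: j can only be entered from the other state.
            start[j][t] = log_pi[j] if t == 0 else alpha[1 - j][t - 1]
        occ = []
        for j in range(2):
            ends, stays = [], []
            for d in range(1, min(D, t + 1) + 1):
                seg = start[j][t - d + 1] + cum[j][t + 1] - cum[j][t - d + 1]
                ends.append(seg + log_dur[j][d - 1])    # segment ends exactly at t
                stays.append(seg + log_surv[j][d - 1])  # segment still covers t
            alpha[j][t] = logsumexp(ends)
            occ.append(logsumexp(stays))  # log P(o_<=t, state j at frame t)
        llr = occ[1] - occ[0]
        llrs.append(llr)
        decisions.append(llr > log_eta)
    return llrs, decisions

# Toy log emissions: two noise-like frames followed by two speech-like frames.
log_b = [[0.0, -5.0], [0.0, -5.0], [-5.0, 0.0], [-5.0, 0.0]]
uniform = [math.log(1.0 / 3.0)] * 3  # hypothetical: durations 1..3 equally likely
llrs, decisions = hsmm_vad(log_b, [math.log(0.5)] * 2, [uniform, uniform])
print(decisions)  # [False, False, True, True]
```

Prefix sums of the log emissions make each segment likelihood an O(1) lookup, so each frame costs O(D) per state. Working in the log domain keeps long utterances from underflowing.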