An improved noise-robust voice activity detector based on hidden semi-Markov models

  • Authors:
  • Yuan Liang;Xianglong Liu;Yihua Lou;Baosong Shan

  • Affiliations:
  • State Key Laboratory of Software Development Environment, Beihang University, China;State Key Laboratory of Software Development Environment, Beihang University, China;State Key Laboratory of Software Development Environment, Beihang University, China;School of Mathematics and Systems Science, Beihang University, China

  • Venue:
  • Pattern Recognition Letters
  • Year:
  • 2011

Quantified Score

Hi-index 0.10

Visualization

Abstract

To improve the performance of voice activity detector (VAD) in noisy environments, this paper concentrates on three critical aspects related to noise robustness including speech features, feature distributions and temporal dependence. Based on the statistic on TIMIT and NOIZEUS, Mel-frequency cepstrum coefficients (MFCCs) are selected as speech features, Gaussian Mixture distributions (GMD) are applied to associate the observations in MFCC domain with both speech and non-speech states, and Weibull and Gamma distributions are used to explicitly model noise and speech durations, respectively. To integrate these aspects into VAD, the hidden semi-Markov model (HSMM) as a generalized hidden Markov model (HMM) is introduced first. Then the VAD decision is made according to the likelihood ratio test (LRT) incorporating state prior knowledge and modified forward variables of HSMM. We design a recursive way to efficiently calculate modified forward variables. Finally a series of experiments demonstrate: (1) the positive effect of different robustness-related schemes adopted in the proposed VAD; (2) better performance against the standard ITU-T G.729B, Adaptive MultiRate VAD phase 2 (AMR2), Advanced Front-end (AFE), HMM-based VAD and VAD using Laplacian-Gaussian model (LD-GD based VAD).