Voice activity detection based on adjustable linear prediction and GARCH models

Authors:
Hiroko Kato Solvang; Kentaro Ishizuka; Masakiyo Fujimoto
Affiliations:
Department of Genetics, Institute for Cancer Research, Norwegian Radium Hospital, Rikshospitalet University Hospital, Montebello, 0310 Oslo, Norway and Department of Biostatistics, Institute of Ba ...; NTT Communication Science Laboratories, NTT Corporation, 2-4 Hikaridai, Seika-cho, Soraku-gun, Kyoto 619-0237, Japan; NTT Communication Science Laboratories, NTT Corporation, 2-4 Hikaridai, Seika-cho, Soraku-gun, Kyoto 619-0237, Japan
Venue:
Speech Communication
Year:
2008



Abstract

We propose a method for voice activity detection (VAD) that employs a class of autoregressive-generalized autoregressive conditional heteroskedasticity (AR-GARCH) models. To account for the correlation in speech signals, we represent the AR part of the AR-GARCH model in state-space form to obtain an appropriate linear prediction error series. By applying the GARCH model to this residual, we estimate the conditional variance sequences corresponding to the voice activity parts. Voice activity is then detected by applying an appropriate threshold to the conditional variance sequences. To confirm the performance of the proposed VAD method, we conduct experiments using speech signals with real airport and street background noise at signal-to-noise ratios (SNRs) of 10, 5 and 0 dB. Using receiver operating characteristic (ROC) curves and equal error rates (EERs), we compare our results with those of standardized VAD algorithms (ITU-T G.729B, ETSI ES 202 050, and ETSI EN 301 708) and of recently developed methods (VAD based on long-term spectral divergence, likelihood ratio tests, and higher-order statistics). For signals with background noise at an SNR of 0 dB, the experimental results show a significant performance improvement over the standardized VAD algorithms and an improvement of more than 10% over the recently developed VAD methods.
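The abstract describes a three-stage pipeline: linear prediction to remove the correlation in the signal, a GARCH model fitted to the prediction error, and a threshold on the resulting conditional variance. The Python sketch below illustrates that pipeline under assumptions that do not come from the paper. The AR coefficients are fitted once by ordinary least squares, a plain substitute for the authors' adjustable state-space formulation. The GARCH(1,1) parameters come from a coarse Gaussian quasi-likelihood grid search with variance targeting. The decision rule is a hypothetical 6 dB margin above a 10th-percentile floor of the frame-averaged conditional variance. All function names (`ar_residual`, `garch11_variance`, `fit_garch11`, `detect_activity`) are illustrative.

```python
import numpy as np


def ar_residual(x, order=10):
    """Linear prediction error from a least-squares AR(order) fit.
    Assumption: a fixed global fit stands in for the paper's state-space AR."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Column k-1 holds x[t-k] for t = order, ..., n-1
    X = np.column_stack([x[order - k:n - k] for k in range(1, order + 1)])
    y = x[order:]
    a, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ a  # e[t] = x[t] - sum_k a_k * x[t-k]


def garch11_variance(e, omega, alpha, beta):
    """GARCH(1,1) recursion: s2[t] = omega + alpha*e[t-1]^2 + beta*s2[t-1]."""
    e = np.asarray(e, dtype=float)
    s2 = np.empty_like(e)
    s2[0] = np.var(e)  # start from the sample variance of the residual
    for t in range(1, len(e)):
        s2[t] = omega + alpha * e[t - 1] ** 2 + beta * s2[t - 1]
    return s2


def fit_garch11(e, grid_points=10):
    """Coarse Gaussian quasi-ML grid search over (alpha, beta).
    omega is set by variance targeting so the implied unconditional
    variance omega / (1 - alpha - beta) equals var(e)."""
    e = np.asarray(e, dtype=float)
    v = np.var(e)
    grid = np.linspace(0.05, 0.9, grid_points)
    best, best_nll = None, np.inf
    for alpha in grid:
        for beta in grid:
            if alpha + beta >= 0.999:  # keep the process covariance-stationary
                continue
            omega = v * (1.0 - alpha - beta)
            s2 = garch11_variance(e, omega, alpha, beta)
            nll = 0.5 * np.sum(np.log(s2) + e ** 2 / s2)  # Gaussian NLL up to a constant
            if nll < best_nll:
                best, best_nll = (omega, alpha, beta), nll
    return best


def detect_activity(x, order=10, frame=160, threshold_db=6.0):
    """Frame-level activity decisions from the GARCH conditional variance.
    Hypothetical rule: flag frames more than threshold_db above a noise floor
    taken as the 10th percentile of frame-averaged variance. Frames are
    indexed on the residual, which starts `order` samples into x."""
    e = ar_residual(x, order)
    omega, alpha, beta = fit_garch11(e)
    s2 = garch11_variance(e, omega, alpha, beta)
    n_frames = len(s2) // frame
    frame_var = s2[:n_frames * frame].reshape(n_frames, frame).mean(axis=1)
    floor = np.percentile(frame_var, 10)
    return 10.0 * np.log10(frame_var / floor) > threshold_db


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = 0.1 * rng.standard_normal(8000)        # low-level background noise
    x[3200:4800] += rng.standard_normal(1600)  # louder "active" segment
    print(np.flatnonzero(detect_activity(x)))   # indices of frames flagged active
```

On the synthetic signal in the `__main__` block, the flagged frames should cluster around the louder middle segment. The global least-squares fit is only the simplest stand-in. The paper's state-space representation, which the title calls adjustable linear prediction, is not reproduced here. The threshold rule is likewise a placeholder for the threshold the authors establish.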