Noise Robust Voice Activity Detection Based on Switching Kalman Filter

Authors:
Masakiyo Fujimoto; Kentaro Ishizuka
Venue:
IEICE Transactions on Information and Systems
Year:
2008

Abstract

This paper addresses the problem of voice activity detection (VAD) in noisy environments. The proposed VAD method follows a statistical model approach and estimates the statistical models sequentially, without a priori knowledge of the noise. Specifically, the method constructs a clean speech/silence state transition model beforehand and, as the signal is observed, sequentially adapts this model to the noisy environment by using a switching Kalman filter. We carried out two evaluations. In the first, the proposed method significantly outperformed conventional methods in terms of VAD accuracy in simulated noise environments. In the second, we evaluated the proposed method on CENSREC-1-C, a VAD evaluation framework; the results revealed that the proposed method significantly outperforms the CENSREC-1-C baseline in terms of VAD accuracy in real environments. In addition, we confirmed that the proposed method helps to improve the accuracy of concatenated speech recognition in real environments.
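The abstract does not give the model equations, so the Python block below is only a minimal sketch of the general idea under stated assumptions, not the authors' implementation. It assumes one scalar log-power feature per frame, one Gaussian clean model per state (silence, speech), a random-walk model for the noise log-power, and the log-domain mixing observation y ≈ log(exp(x) + exp(n)), linearized around the clean-model mean and the current noise estimate (one extended Kalman filter step per state). The per-state results are merged with a first-order generalized pseudo-Bayesian (GPB1) collapse, which is one common approximation for running a switching Kalman filter. The function name `switching_kf_vad` and all parameter values are hypothetical.

```python
import math


def log_add(a: float, b: float) -> float:
    """Numerically stable log(exp(a) + exp(b))."""
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def log_gauss(y: float, mean: float, var: float) -> float:
    """Log density of N(y; mean, var)."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (y - mean) ** 2 / var)


def switching_kf_vad(obs, clean_means=(-10.0, 5.0), clean_vars=(1.0, 4.0),
                     trans=((0.95, 0.05), (0.05, 0.95)),
                     n0=-3.0, p0=4.0, q=0.01, r=0.01, threshold=0.5):
    """Hypothetical GPB1 switching-EKF VAD sketch (state 0 = silence, 1 = speech).

    All defaults are illustrative assumptions, not values from the paper.
    trans[j][s] = P(state s at frame t | state j at frame t-1).
    """
    n, p = n0, p0            # Gaussian estimate of the noise log-power
    probs = [0.5, 0.5]       # posterior state probabilities
    decisions, noise_track = [], []
    for y in obs:
        p_pred = p + q       # random-walk noise: n_t = n_{t-1} + w, w ~ N(0, q)
        log_w, n_post, p_post = [], [], []
        for s in (0, 1):
            mu, var = clean_means[s], clean_vars[s]
            m = log_add(mu, n)           # predicted observation log(e^mu + e^n)
            g_n = math.exp(n - m)        # dm/dn, always in (0, 1)
            g_x = 1.0 - g_n              # dm/dx
            S = g_x ** 2 * var + g_n ** 2 * p_pred + r   # innovation variance
            k = p_pred * g_n / S         # Kalman gain for the noise state
            n_post.append(n + k * (y - m))
            p_post.append((1.0 - k * g_n) * p_pred)
            prior = trans[0][s] * probs[0] + trans[1][s] * probs[1]
            log_w.append(math.log(prior) + log_gauss(y, m, S))
        # Normalize state posteriors in the log domain to avoid underflow
        norm = log_add(log_w[0], log_w[1])
        probs = [math.exp(lw - norm) for lw in log_w]
        # GPB1 collapse: moment-match the two-component mixture to one Gaussian
        n = probs[0] * n_post[0] + probs[1] * n_post[1]
        p = sum(probs[s] * (p_post[s] + (n_post[s] - n) ** 2) for s in (0, 1))
        decisions.append(probs[1] > threshold)
        noise_track.append(n)
    return decisions, noise_track
```

On a synthetic sequence such as `[0.0] * 20 + [5.0] * 10 + [0.0] * 10`, with the noise estimate deliberately initialized at -3, the filter pulls the noise estimate toward the true level of 0 during the leading silence, labels the middle block as speech, and leaves the noise estimate almost unchanged there, because under the speech hypothesis the observation is nearly insensitive to the noise (g_n is close to 0).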