Speech waveform compression using robust adaptive voice activity detection for nonstationary noise

  • Authors:
  • Waheeduddin Q. Syed;Hsiao-Chun Wu

  • Affiliations:
  • Communications and Signal Processing Laboratory, Department of Electrical and Computer Engineering, Louisiana State University, Baton Rouge, LA;Communications and Signal Processing Laboratory, Department of Electrical and Computer Engineering, Louisiana State University, Baton Rouge, LA

  • Venue:
  • EURASIP Journal on Audio, Speech, and Music Processing - Atypical Speech
  • Year:
  • 2008

Abstract

Voice activity detection (VAD) is crucial in all kinds of speech applications. However, almost all existing VAD algorithms suffer from the nonstationarity of both speech and noise. To combat this difficulty, we propose a new voice activity detector based on Mel-energy features and an adaptive threshold that depends on signal-to-noise ratio (SNR) estimates. In this paper, we first show that a Bayes classifier using Mel-energy features is more robust than one using Fourier spectral features in various noise environments. We then design an algorithm that combines a dynamic Mel-energy estimator with the SNR-dependent adaptive threshold. In addition, a realignment scheme is incorporated to correct sparse and spurious noise estimates. Numerous simulations are carried out to evaluate the performance of the proposed VAD method, and comparisons are made with two representative existing schemes: the VAD using the likelihood ratio test with Fourier spectral energy features, and the VAD based on enhanced time-frequency parameters. Three types of noise were added artificially for our experiments: white noise (stationary), babble noise (nonstationary), and vehicular noise (nonstationary). The corresponding receiver operating characteristic (ROC) curves show that the proposed VAD algorithm significantly outperforms the other existing methods. Finally, we demonstrate one of its major applications, speech waveform compression using the new robust VAD scheme, and quantify its effectiveness in terms of compression efficiency.
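The abstract names the ingredients of the detector (frame-wise Mel-energy features, a running noise estimate, and a threshold that adapts to an SNR estimate) but not their exact formulas. The Python sketch below wires these ingredients together under stated assumptions and is not the authors' method. The decision statistic (mean per-band Mel-energy SNR in dB), the linear threshold rule, the assumption that the first `n_init` frames are noise-only, every parameter name and value (`thr_lo`, `thr_hi`, `snr_hi`, `beta`, `gamma`, `n_init`), and the `compression_ratio` metric are all hypothetical. The sketch does not implement the paper's Bayes classifier or its realignment scheme, and it uses a naive DFT with a rectangular window so that it needs only the standard library.

```python
import cmath
import math
import random


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_fft, fs, n_mels):
    """Triangular filters over one-sided DFT bins 0..n_fft//2, equally spaced in Mel."""
    top = hz_to_mel(fs / 2.0)
    edges = [mel_to_hz(top * k / (n_mels + 1)) for k in range(n_mels + 2)]
    bin_hz = [k * fs / n_fft for k in range(n_fft // 2 + 1)]
    bank = []
    for m in range(1, n_mels + 1):
        lo, mid, hi = edges[m - 1], edges[m], edges[m + 1]
        weights = []
        for f in bin_hz:
            if lo < f <= mid:
                weights.append((f - lo) / (mid - lo))   # rising slope
            elif mid < f < hi:
                weights.append((hi - f) / (hi - mid))   # falling slope
            else:
                weights.append(0.0)
        bank.append(weights)
    return bank


def mel_energies(frame, bank):
    """Mel-band energies of one frame (naive DFT, rectangular window)."""
    n = len(frame)
    power = [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) ** 2
             for k in range(n // 2 + 1)]
    # Small floor avoids log(0) / division by zero for empty bands.
    return [sum(w * p for w, p in zip(weights, power)) + 1e-12 for weights in bank]


def adaptive_mel_vad(signal, fs=8000, frame_len=128, n_mels=20, n_init=5,
                     beta=0.9, gamma=0.9, thr_lo=3.0, thr_hi=10.0, snr_hi=30.0):
    """Hypothetical Mel-energy VAD with an SNR-dependent threshold; returns one bool per frame."""
    bank = mel_filterbank(frame_len, fs, n_mels)
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, frame_len)]
    feats = [mel_energies(f, bank) for f in frames]
    # Assumption: the first n_init frames contain noise only.
    noise = [sum(f[b] for f in feats[:n_init]) / n_init for b in range(n_mels)]
    snr_est = 0.0
    decisions = []
    for e in feats:
        # Mean per-band a-posteriori SNR in dB (a simple stand-in statistic).
        stat = sum(10.0 * math.log10(e[b] / noise[b]) for b in range(n_mels)) / n_mels
        # Threshold rises linearly with the running SNR estimate, clamped to [thr_lo, thr_hi].
        t = min(max(snr_est / snr_hi, 0.0), 1.0)
        threshold = thr_lo + t * (thr_hi - thr_lo)
        is_speech = stat > threshold
        if is_speech:
            snr_est = gamma * snr_est + (1.0 - gamma) * stat  # track speech-frame SNR
        else:
            noise = [beta * n + (1.0 - beta) * x for n, x in zip(noise, e)]  # update noise
        decisions.append(is_speech)
    return decisions


def compression_ratio(decisions):
    """Hypothetical metric: frames in / frames kept (speech only), ignoring flag overhead."""
    kept = sum(decisions)
    return len(decisions) / kept if kept else float("inf")


if __name__ == "__main__":
    rng = random.Random(0)
    fs, n = 8000, 128

    def noise_segment():
        return [rng.gauss(0.0, 0.01) for _ in range(10 * n)]

    # Synthetic "voiced" segment: 24 harmonics of 125 Hz plus the same background noise.
    voiced = [sum(0.1 * math.cos(2 * math.pi * 125 * h * i / fs) for h in range(1, 25))
              + rng.gauss(0.0, 0.01) for i in range(10 * n)]
    d = adaptive_mel_vad(noise_segment() + voiced + noise_segment(), fs=fs, frame_len=n)
    print([int(x) for x in d], compression_ratio(d))
```

Raising the threshold as the tracked SNR grows, and freezing the noise estimate while speech is declared, are common design choices for nonstationary noise; the paper's actual threshold rule and realignment step may differ. In the demo, the harmonic frames should be flagged as speech and the noise-only frames should not, so roughly one third of the frames would be retained for compression.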