Mathematical foundations of nonlinear, non-Gaussian, and time-varying digital speech signal processing

  • Authors: Max A. Little
  • Affiliations: Massachusetts Institute of Technology, Media Lab, Cambridge, MA
  • Venue: NOLISP'11 Proceedings of the 5th international conference on Advances in nonlinear speech processing
  • Year: 2011

Abstract

Classical digital speech signal processing assumes linearity, time-invariance, and Gaussian random variables (LTI-Gaussian theory). In this article, we address the suitability of these mathematical assumptions for realistic speech signals with respect to the biophysics of voice production, finding that the LTI-Gaussian approach has some important accuracy and computational efficiency shortcomings in both theory and practice. Next, we explore the consequences of relaxing the assumptions of time-invariance and Gaussianity, which admits certain potentially useful techniques, including wavelet and sparse representations in computational harmonic analysis, but rules out Fourier analysis and convolution, which could be a disadvantage. Then, we focus on methods that retain time-invariance alone, which admits techniques from nonlinear time series analysis and Markov chains, both of which have shown promise in biomedical applications. We highlight recent examples of non-LTI-Gaussian digital speech signal processing in the literature, and draw conclusions for future prospects in this area.