A comparative study of signal representations and classification techniques for speech recognition

  • Authors:
  • Hong C. Leung; Benjamin Chigier; James R. Glass

  • Affiliations:
  • Artificial Intelligence Laboratory, NYNEX Science and Technology, Inc., White Plains, New York; Artificial Intelligence Laboratory, NYNEX Science and Technology, Inc., White Plains, New York; Laboratory for Computer Science, Massachusetts Institute of Technology, Cambridge, Massachusetts

  • Venue:
  • ICASSP '93: Proceedings of the 1993 IEEE International Conference on Acoustics, Speech, and Signal Processing: Speech Processing - Volume II
  • Year:
  • 1993

Abstract

In this paper, we investigate the interactions of two important sets of techniques in speech recognition: signal representation and classification. In addition, in order to quantify the effect of the telephone network, we perform our experiments on both wide-band and telephone-quality speech. The spectral and cepstral signal processing techniques we study fall into a few major categories, based on Fourier analysis, linear prediction, and auditory processing. The classification techniques we examine are the Gaussian classifier, mixtures of Gaussians, and the multi-layer perceptron (MLP). Our results indicate that the MLP consistently produces lower error rates than the other two classifiers. When averaged across all three classifiers, the Bark auditory spectral coefficients (BASC) produce the lowest phonetic classification error rates. When evaluated in our stochastic segment framework using the MLP, BASC also produces the lowest word error rate.
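
As an illustration of the Bark-scale spectral representation discussed in the abstract, the sketch below groups FFT power-spectrum bins into critical bands on the Bark scale and takes log band energies. The band count, window, and rectangular band shapes here are assumptions made for illustration; this is not the authors' exact BASC front end.

```python
import numpy as np

def hz_to_bark(f_hz):
    """Zwicker & Terhardt approximation of the Bark frequency scale."""
    return 13.0 * np.arctan(0.00076 * f_hz) + 3.5 * np.arctan((f_hz / 7500.0) ** 2)

def bark_spectral_coefficients(frame, sample_rate=16000, n_fft=512, n_bands=18):
    """Log energies of rectangular critical bands on the Bark scale.

    Illustrative stand-in for a Bark auditory spectral front end; the
    number of bands, the Hamming window, and the rectangular band edges
    are assumed parameters, not those used in the paper.
    """
    # Windowed power spectrum of one short-time frame.
    windowed = frame * np.hamming(len(frame))
    power = np.abs(np.fft.rfft(windowed, n_fft)) ** 2
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)

    # Divide the frequency axis into equal-width bands on the Bark scale.
    bark = hz_to_bark(freqs)
    edges = np.linspace(bark[0], bark[-1], n_bands + 1)

    coeffs = np.empty(n_bands)
    for b in range(n_bands):
        in_band = (bark >= edges[b]) & (bark < edges[b + 1])
        coeffs[b] = np.log(power[in_band].sum() + 1e-10)  # floor avoids log(0)
    return coeffs

# Example: coefficients for a 25 ms frame of synthetic speech-like noise.
frame = np.random.randn(400)
print(bark_spectral_coefficients(frame))
```

A vector of such coefficients per frame (or per segment, in a stochastic segment framework) would then be passed to a classifier such as a Gaussian model, a mixture of Gaussians, or an MLP for phonetic classification.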