A comparative study of signal representations and classification techniques for speech recognition

Authors:
Hong C. Leung;Benjamin Chigier;James R. Glass
Affiliations:
Artificial Intelligence Laboratory, NYNEX Science and Technology, Inc., White Plains, New York;Artificial Intelligence Laboratory, NYNEX Science and Technology, Inc., White Plains, New York;Laboratory for Computer Science, Massachusetts Institute of Technology, Cambridge, Massachusetts
Venue:
ICASSP'93 Proceedings of the 1993 IEEE international conference on Acoustics, speech, and signal processing: speech processing - Volume II
Year:
1993



Abstract

In this paper, we investigate the interactions of two important sets of techniques in speech recognition: signal representation and classification. To quantify the effect of the telephone network, we perform our experiments on both wide-band and telephone-quality speech. The spectral and cepstral signal processing techniques we study fall into three major categories, based on Fourier analysis, linear prediction, and auditory processing. The classification techniques that we examine are the Gaussian classifier, the mixture-of-Gaussians classifier, and the multi-layer perceptron (MLP). Our results indicate that the MLP consistently produces lower error rates than the other two classifiers. When averaged across all three classifiers, the Bark auditory spectral coefficients (BASC) produce the lowest phonetic classification error rates. When evaluated in our stochastic segment framework using the MLP, BASC also produce the lowest word error rate.