Graphic Representation Method and Neural Network Recognition of Time-Frequency Vectors of Speech Information

Authors:
A. O. Zhirkov; D. N. Kortchagine; A. S. Lukin; A. S. Krylov; Yu. M. Bayakovskii
Affiliations:
Department of Computational Mathematics and Cybernetics, Moscow State University, Vorob'evy gory, Moscow, 119992 Russia kryl@cs.msu.su
Venue:
Programming and Computing Software
Year:
2003

Abstract

Currently, various time-frequency representations are often used for sound analysis. On the one hand, these representations make sound convenient for humans to perceive visually; on the other hand, they can be used for the automatic analysis of sound images. In this paper, various methods for representing sound as two-dimensional time-frequency vectors of a fixed dimension are discussed, together with their use in speech and speaker recognition. Probabilistic, distance-based, and neural-network methods for recognizing these vectors are considered, using isolated words as examples. Numerical experiments showed that the best of these is the method based on a three-layer neural network, the short-time Fourier transform, and the two-dimensional wavelet transform. For the speaker recognition problem, a distance-based recognition method employing the adaptive Hermite transform performed best.
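To make the idea of a fixed-dimension time-frequency vector concrete, here is a minimal sketch, not the authors' implementation. It computes a magnitude short-time Fourier transform and resamples it to a fixed 16 x 16 grid. It then keeps one level of 2-D Haar wavelet approximation coefficients and classifies by Euclidean distance to stored templates. Every specific is an assumption chosen for illustration: the frame length, hop, Hann window, grid size, log compression, the choice of the Haar wavelet, and the helper names (`stft_magnitude`, `to_fixed_grid`, `haar2d_approx`, `feature_vector`, `classify`, `tone`). The sketch does not reproduce the three-layer neural network or the adaptive Hermite transform.

```python
import cmath
import math


def stft_magnitude(signal, frame_len=64, hop=32):
    """Magnitude STFT with a periodic Hann window, via a naive DFT (O(N^2) per frame)."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len) for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        seg = [signal[start + n] * window[n] for n in range(frame_len)]
        # Keep only the non-negative frequency bins 0..frame_len/2.
        spectrum = [
            abs(sum(seg[n] * cmath.exp(-2j * math.pi * k * n / frame_len) for n in range(frame_len)))
            for k in range(frame_len // 2 + 1)
        ]
        frames.append(spectrum)
    return frames  # rows = time frames, columns = frequency bins


def to_fixed_grid(frames, n_time=16, n_freq=16):
    """Nearest-neighbour resampling of a time-frequency image to n_time x n_freq (fixed dimension)."""
    t_len, f_len = len(frames), len(frames[0])
    return [[frames[i * t_len // n_time][j * f_len // n_freq] for j in range(n_freq)]
            for i in range(n_time)]


def haar2d_approx(grid):
    """One level of the orthonormal 2-D Haar transform; returns only the low-low band."""
    return [[(grid[i][j] + grid[i][j + 1] + grid[i + 1][j] + grid[i + 1][j + 1]) / 2.0
             for j in range(0, len(grid[0]), 2)]
            for i in range(0, len(grid), 2)]


def feature_vector(signal):
    """Fixed-length, unit-norm feature vector: log-compressed STFT grid -> Haar approximation."""
    grid = to_fixed_grid(stft_magnitude(signal))
    grid = [[math.log1p(v) for v in row] for row in grid]
    vec = [v for row in haar2d_approx(grid) for v in row]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def classify(signal, templates):
    """Distance-based recognition: return the label of the nearest template (Euclidean)."""
    x = feature_vector(signal)
    return min(templates, key=lambda label: math.dist(x, templates[label]))


def tone(freq_hz, fs=8000, n=1024):
    """Synthetic test signal (hypothetical stand-in for a recorded word)."""
    return [math.sin(2 * math.pi * freq_hz * t / fs) for t in range(n)]


templates = {"low": feature_vector(tone(200)), "high": feature_vector(tone(2000))}
print(classify(tone(250), templates), classify(tone(2100), templates))
```

Pure tones stand in for recorded words here only to keep the example self-contained. With real speech, each template would be built from example utterances of a word or speaker. Resampling to a fixed grid is what lets utterances of different durations be compared component by component.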