Graphic Representation Method and Neural Network Recognition of Time-Frequency Vectors of Speech Information

  • Authors:
  • A. O. Zhirkov; D. N. Kortchagine; A. S. Lukin; A. S. Krylov; Yu. M. Bayakovskii

  • Affiliations:
  • Department of Computational Mathematics and Cybernetics, Moscow State University, Vorob'evy gory, Moscow, 119992 Russia; e-mail: kryl@cs.msu.su (all authors)

  • Venue:
  • Programming and Computer Software
  • Year:
  • 2003

Abstract

Time-frequency representations are now widely used for sound analysis. On the one hand, they make sound convenient for a human to perceive visually; on the other hand, they can be used for automatic analysis of sound images. This paper discusses various methods for representing sound as two-dimensional time-frequency vectors of a fixed dimension and the use of these vectors in speech and speaker recognition problems. Probabilistic, distance-based, and neural-network methods for recognizing these vectors are considered, using isolated words as examples. In numerical experiments, the best of these methods was the one based on a three-layer neural network, the short-time Fourier transform, and the two-dimensional wavelet transform. For the speaker recognition problem, a distance-based recognition method employing the adaptive Hermite transform proved the best of all.
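
The general idea of a fixed-dimension time-frequency vector with distance-based recognition can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the algorithms, so the sketch assumes a plain Hann-windowed short-time Fourier transform, replaces the two-dimensional wavelet transform with simple block averaging over a time-frequency grid, and uses nearest-template Euclidean distance as the classifier. All function names, parameters, and the synthetic tone signals are hypothetical and chosen only for illustration.

```python
import cmath
import math


def tone(freq_hz, n_samples=1024, sample_rate=8000, phase=0.0):
    """Synthetic test signal (a stand-in for a recorded word)."""
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate + phase)
            for n in range(n_samples)]


def stft_magnitude(signal, frame_len=64, hop=32):
    """Magnitude spectrogram: naive DFT of Hann-windowed, overlapping frames."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    n_bins = frame_len // 2 + 1  # non-negative frequencies only
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        seg = [signal[start + n] * window[n] for n in range(frame_len)]
        frames.append([
            abs(sum(seg[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                    for n in range(frame_len)))
            for k in range(n_bins)
        ])
    return frames  # indexed as frames[time][frequency]


def fixed_size_vector(spec, n_time=8, n_freq=8):
    """Average the spectrogram over an n_time x n_freq grid.

    Assumption: block averaging stands in for the 2D wavelet transform,
    so signals of any length map to a vector of n_time * n_freq values.
    """
    n_t, n_f = len(spec), len(spec[0])
    vec = []
    for i in range(n_time):
        t0 = i * n_t // n_time
        t1 = max((i + 1) * n_t // n_time, t0 + 1)
        for j in range(n_freq):
            f0 = j * n_f // n_freq
            f1 = max((j + 1) * n_f // n_freq, f0 + 1)
            cells = [spec[t][f] for t in range(t0, t1) for f in range(f0, f1)]
            vec.append(sum(cells) / len(cells))
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]  # unit length, so loudness does not matter


def describe(signal):
    """Signal -> fixed-dimension time-frequency vector."""
    return fixed_size_vector(stft_magnitude(signal))


def classify(vec, templates):
    """Distance-based recognition: label of the nearest template vector."""
    return min(templates, key=lambda label: math.dist(vec, templates[label]))


if __name__ == "__main__":
    templates = {"low": describe(tone(300)), "high": describe(tone(1500))}
    print(classify(describe(tone(320, phase=1.0)), templates))
```

Because every signal is reduced to the same number of values, vectors from utterances of different lengths can be compared directly, which is what makes distance-based and neural-network classifiers applicable to them.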