Some coding properties of speech

  • Authors:
  • V. N. Sorokin

  • Affiliations:
  • Institute for Information Transmission Problems, Russian Academy of Sciences, Bolshoy Karetny 19, 101447 Moscow, Russia

  • Venue:
  • Speech Communication
  • Year:
  • 2003

Abstract

Some important properties of speech are considered from the point of view of the theory of error-correcting codes. It has been found experimentally that the properties of Russian words encoded in terms of phonemes are largely similar to the properties of the so-called prefix codes. In a prefix code, no code word is a prefix of another code word. According to coding theory, for any prefix code there exists an unambiguous decoding algorithm that needs no pause or special delimiter symbol to separate code words. Apparently, word segmentation in the continuous speech signal is provided mainly by the use of the prefix property. Phoneme probability in Russian follows the Mandelbrot law. This finding is evidence in favour of the assumption that the probability is determined by the "complexity" or "expense" of phoneme generation. Speech recognition for a large vocabulary requires much time for access to word templates. Thus, a preliminary sorting of the templates is necessary to restrict the number of candidates for final recognition. The preliminary sorting can be executed by coding words with a few phonemic cues. Auditory experiments with speech masked by white noise have revealed the most reliable cues. These cues are "vowel, voiced, nasal, fricative". Of approximately 100,000 templates for a vocabulary of the 10,000 most frequent English words, about 150 were left after the fast sorting procedure. The speech recognition rate obtained by an automatic recognition system must be compared with the potentially achievable rate. The potential rate of word recognition for various S/N ratios can be computed using methods developed in coding theory. It can be argued that an optimal machine for automatic speech recognition should find robust the same cues that humans find robust. The potential rate for words encoded in terms of independent distinctive features is closer to the subjective reliability of word perception than the rate for words encoded by phonemes.
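
The prefix property described in the abstract can be illustrated with a minimal sketch. It is not the author's procedure: the toy lexicon, the function names `is_prefix_free` and `segment`, and the convention that each character stands for one phoneme are all assumptions made for illustration. The sketch checks that no word is a prefix of another and then segments an unbroken symbol stream without pauses or delimiters.

```python
# Hypothetical toy lexicon; each character stands in for one phoneme.
LEXICON = {"kot", "dom", "son"}


def is_prefix_free(words):
    """Return True if no word is a prefix of a different word."""
    ordered = sorted(set(words))
    # In sorted order, a word that is a prefix of any other word
    # is also a prefix of its immediate successor.
    return all(not nxt.startswith(cur) for cur, nxt in zip(ordered, ordered[1:]))


def segment(stream, lexicon):
    """Split an unbroken stream into words of a prefix-free lexicon."""
    words, current = [], ""
    for symbol in stream:
        current += symbol
        # With a prefix-free lexicon the first match is the only
        # possible one, so the word can be emitted immediately.
        if current in lexicon:
            words.append(current)
            current = ""
    if current:
        raise ValueError("stream ends inside a word")
    return words
```

For the toy lexicon, `segment("domkotson", LEXICON)` yields the three words in order. Adding a word such as `"do"` would break the prefix property, and the same greedy decoding could then no longer be trusted.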