Some coding properties of speech

Authors:
V. N. Sorokin
Affiliations:
Institute for Information Transmission Problems, Russian Academy of Sciences, Bolshoy Karetny 19, 101447 Moscow, Russia
Venue:
Speech Communication
Year:
2003

Abstract

Some important properties of speech are considered from the point of view of the theory of error-correcting codes. It has been found experimentally that the properties of Russian words encoded in terms of phonemes are largely similar to those of the so-called prefix codes, in which no code word is a prefix of another code word. According to coding theory, any prefix code admits an unambiguous decoding algorithm that needs no pause or special delimiter symbol between code words. Apparently, word segmentation in the continuous speech signal is provided mainly by the use of the prefix property. Phoneme probability in Russian follows the Mandelbrot law. This finding is evidence in favour of the assumption that the probability is determined by the "complexity" or "expense" of phoneme generation. Speech recognition for a large vocabulary requires much time for access to word templates. Thus, a preliminary sorting of the templates is necessary to restrict the number of candidates for final recognition. The preliminary sorting can be performed by coding words with a few phonemic cues. Auditory experiments with speech masked by white noise have revealed the most reliable cues: "vowel, voiced, nasal, fricative". About 150 templates were left after the fast sorting procedure for approximately 100,000 templates in the vocabulary of 10,000 of the most frequent English words. The recognition rate obtained by an automatic recognition system must be compared with the potentially achievable rate. The potential rate of word recognition for various S/N ratios can be computed using methods developed in coding theory. It can be argued that an optimal machine for automatic speech recognition should find robust the same cues that humans find robust. The potential rate for words encoded in terms of independent distinctive features is closer to the subjective reliability of word perception than the rate for words encoded by phonemes.
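
The prefix property described in the abstract can be illustrated with a minimal sketch. It is not the author's procedure: the function names `is_prefix_code` and `segment` are hypothetical, and the toy code words below are made-up strings in which each character stands in for one phoneme symbol, not data from the paper. The sketch checks that no code word is a prefix of another and then splits an undelimited symbol stream into words.

```python
def is_prefix_code(words):
    """Return True if no code word is a prefix of a different code word."""
    ordered = sorted(set(words))
    # In lexicographic order, a word that is a prefix of some other word
    # is always immediately followed by a word that starts with it.
    return not any(b.startswith(a) for a, b in zip(ordered, ordered[1:]))


def segment(stream, words):
    """Split an undelimited symbol stream into code words.

    Assumes `words` is a prefix code, so the first match found while
    reading left to right is the only possible one.
    """
    codebook = set(words)
    result, current = [], ""
    for symbol in stream:
        current += symbol
        if current in codebook:
            result.append(current)
            current = ""
    if current:
        raise ValueError("stream ends inside a code word")
    return result


# Toy vocabulary (illustrative only): one character per phoneme symbol.
toy_words = ["kot", "dom", "kat"]
print(is_prefix_code(toy_words))            # True
print(is_prefix_code(toy_words + ["ko"]))   # False: "ko" is a prefix of "kot"
print(segment("kotdomkat", toy_words))      # ['kot', 'dom', 'kat']
```

Because the codebook has the prefix property, the decoder never needs to backtrack or look ahead, which is the sense in which no pause or delimiter is required between words.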