Bimodal automatic speech segmentation based on audio and visual information fusion
Speech Communication
The use of articulator motion information in automatic speech segmentation is investigated. Automatic speech segmentation is an essential task in speech processing applications such as speech synthesis, where the accuracy and consistency of segmentation are closely tied to the quality of the synthetic speech. The motions of the upper and lower lips are incorporated into a hidden Markov model based segmentation process. The MOCHA-TIMIT database, which contains simultaneous electromagnetic articulograph and microphone recordings, was used to develop and test the models. Different feature vector compositions are proposed for incorporating articulator motion parameters into the automatic segmentation system. The average absolute boundary error of the system with respect to manual segmentation is reduced by 10.1%. The results are examined in a boundary class dependent manner using both acoustic and visual phone classes, and the performance of the system on different boundary types is discussed. After analyzing the boundary class dependent performance, the error reduction is increased to 18.0% by using the appropriate feature vectors at selected boundaries.
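The abstract describes combining lip-motion parameters with acoustic features in a single feature vector for HMM-based segmentation. A minimal sketch of such early fusion is shown below; since articulograph channels are typically sampled at a different rate than acoustic analysis frames, the visual stream must be resampled to the acoustic frame rate before per-frame concatenation. The function name, dimensions, and the use of linear interpolation here are illustrative assumptions, not the paper's exact method.

```python
import numpy as np

def fuse_features(acoustic, lip):
    """Early fusion of acoustic and lip-motion features (illustrative sketch).

    acoustic: array of shape (T_a, D_a), e.g. MFCC frames.
    lip:      array of shape (T_v, D_v), e.g. upper/lower lip positions,
              usually at a lower frame rate than the acoustic stream.
    Returns an array of shape (T_a, D_a + D_v).
    """
    t_a = np.linspace(0.0, 1.0, acoustic.shape[0])   # normalized acoustic time axis
    t_v = np.linspace(0.0, 1.0, lip.shape[0])        # normalized visual time axis
    # Linearly interpolate each lip channel onto the acoustic frame times.
    lip_up = np.stack(
        [np.interp(t_a, t_v, lip[:, d]) for d in range(lip.shape[1])],
        axis=1,
    )
    # Concatenate per frame: each fused frame carries both modalities.
    return np.hstack([acoustic, lip_up])
```

The fused frames would then be used in place of audio-only feature vectors when training and aligning the HMMs; boundary-class-dependent selection (as in the abstract) would amount to choosing between the fused and audio-only vectors per boundary type.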