Bimodal automatic speech segmentation based on audio and visual information fusion
Speech Communication
The use of articulator motion information in automatic speech segmentation is investigated. Automatic speech segmentation is an essential task in speech processing applications such as speech synthesis, where the accuracy and consistency of segmentation are closely tied to the quality of the synthetic speech. The motions of the upper and lower lips are incorporated into a hidden Markov model based segmentation process. The MOCHA-TIMIT database, which contains simultaneous electromagnetic articulograph and microphone recordings, was used to develop and test the models. Different feature vector compositions are proposed for incorporating articulator motion parameters into the automatic segmentation system. The average absolute boundary error of the system with respect to manual segmentation is reduced by 10.1%. The results are examined in a boundary class dependent manner using both acoustic and visual phone classes, and the performance of the system on different boundary types is discussed. After analyzing the boundary class dependent performance, the error reduction is increased to 18.0% by using the appropriate feature vectors at selected boundaries.
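The abstract describes combining lip-motion parameters with acoustic features in a single feature vector for HMM-based segmentation. A minimal sketch of such early fusion is shown below; since articulograph channels are typically sampled at a different rate than acoustic analysis frames, the visual stream must be resampled to the acoustic frame rate before per-frame concatenation. The function name, dimensions, and the use of linear interpolation here are illustrative assumptions, not the paper's exact method.

```python
import numpy as np

def fuse_features(acoustic, lip):
    """Early fusion of acoustic and lip-motion features (illustrative sketch).

    acoustic: array of shape (T_a, D_a), e.g. MFCC frames.
    lip:      array of shape (T_v, D_v), e.g. upper/lower lip positions,
              usually at a lower frame rate than the acoustic stream.
    Returns an array of shape (T_a, D_a + D_v).
    """
    t_a = np.linspace(0.0, 1.0, acoustic.shape[0])   # normalized acoustic time axis
    t_v = np.linspace(0.0, 1.0, lip.shape[0])        # normalized visual time axis
    # Linearly interpolate each lip channel onto the acoustic frame times.
    lip_up = np.stack(
        [np.interp(t_a, t_v, lip[:, d]) for d in range(lip.shape[1])],
        axis=1,
    )
    # Concatenate per frame: each fused frame carries both modalities.
    return np.hstack([acoustic, lip_up])
```

The fused frames would then be used in place of audio-only feature vectors when training and aligning the HMMs; boundary-class-dependent selection (as in the abstract) would amount to choosing between the fused and audio-only vectors per boundary type.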