Clustering Persian viseme using phoneme subspace for developing visual speech application

Authors:
Mohammad Aghaahmadi;Mohammad Mahdi Dehshibi;Azam Bastanfard;Mahmood Fazlali
Affiliations:
Department of Electrical, Computer and Biomedical Engineering, Qazvin Branch, Islamic Azad University, Qazvin, Iran;Department of IT, Faculty of Computer and IT, Islamic Azad University-Parand Branch, Parand, Iran;Computer Engineering Faculty, Islamic Azad University of Karaj, Karaj, Iran;Department of Computer Science, Shahid Beheshti University, G.C, Tehran, Iran
Venue:
Multimedia Tools and Applications
Year:
2013

Abstract

Numerous multimedia applications, such as talking heads, lip reading, lip synchronization, and computer-assisted pronunciation training, have drawn researchers' attention to clustering and analyzing visemes. Because clustering and analyzing visemes are language-dependent processes, we concentrated our research on the Persian language, for which such studies have been lacking. To this end, we propose a novel image-based approach consisting of four main steps: (a) extracting the lip region; (b) obtaining the Eigenviseme of each phoneme, taking the coarticulation effect into account; (c) mapping each viseme into its own subspace and into the other phonemes' subspaces to create a distance matrix, from which the distances between viseme clusters are calculated; and (d) comparing the similarity of visemes based on the weight values of their reconstructions. To demonstrate the robustness of the proposed algorithm, three sets of experiments were conducted on Persian and English databases, in which Consonant/Vowel and Consonant/Vowel/Consonant syllables were examined. The results indicate that the proposed method outperforms the state-of-the-art algorithms examined in feature extraction and is comparably effective at generating adequate clusters. Moreover, with respect to the perceptual test given by volunteers, the obtained results mark a milestone in grouping Persian visemes.
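As an illustration of steps (b) and (c), the following is a minimal sketch and not the authors' implementation. It assumes that lip-region frames are already extracted and flattened into vectors, that each phoneme's subspace is a plain PCA basis, and that the distance between two phonemes is the mean reconstruction error of one phoneme's frames in the other's subspace. The function names, the synthetic data, and the choice of `k` are all hypothetical; the abstract does not specify the paper's actual distance measure, its weight-based comparison in step (d), or its clustering procedure.

```python
import numpy as np


def eigenviseme_basis(frames, k):
    """Return (mean, basis) for one phoneme.

    frames: (n_frames, n_pixels) array of flattened lip-region images.
    basis:  (k, n_pixels) array whose rows are orthonormal principal directions.
    """
    mean = frames.mean(axis=0)
    # Rows of vt are the principal directions of the centered frames.
    _, _, vt = np.linalg.svd(frames - mean, full_matrices=False)
    return mean, vt[:k]


def reconstruction_error(x, mean, basis):
    """Project x into a phoneme subspace and measure what is lost."""
    weights = basis @ (x - mean)          # coordinates in the subspace
    reconstructed = mean + basis.T @ weights
    return float(np.linalg.norm(x - reconstructed))


def distance_matrix(frames_by_phoneme, k=2):
    """Assumed distance: mean error of phoneme i's frames in phoneme j's subspace."""
    names = sorted(frames_by_phoneme)
    models = {p: eigenviseme_basis(frames_by_phoneme[p], k) for p in names}
    d = np.zeros((len(names), len(names)))
    for i, p in enumerate(names):
        for j, q in enumerate(names):
            mean, basis = models[q]
            errors = [reconstruction_error(x, mean, basis)
                      for x in frames_by_phoneme[p]]
            d[i, j] = np.mean(errors)
    return names, (d + d.T) / 2           # symmetrize for clustering


# Synthetic stand-in data (an assumption, not a real corpus): "b" and "p"
# share a lip shape, while "a" has a different one.
rng = np.random.default_rng(0)
n_pixels = 64


def synth(template, directions, n=20):
    coefficients = rng.normal(size=(n, directions.shape[0]))
    noise = 0.01 * rng.normal(size=(n, n_pixels))
    return template + coefficients @ directions + noise


closed_lips = rng.normal(size=n_pixels)
open_lips = rng.normal(size=n_pixels)
closed_dirs = rng.normal(size=(2, n_pixels))
open_dirs = rng.normal(size=(2, n_pixels))

frames_by_phoneme = {
    "b": synth(closed_lips, closed_dirs),
    "p": synth(closed_lips, closed_dirs),
    "a": synth(open_lips, open_dirs),
}

names, D = distance_matrix(frames_by_phoneme, k=2)
print(names)
print(np.round(D, 3))
```

In this toy setting the entry for the pair "b"/"p" is much smaller than either entry involving "a", so any standard clustering applied to `D` would place the two visually similar phonemes in one viseme group. Real lip images would additionally require the lip-region extraction of step (a) and some handling of coarticulation, neither of which is modeled here.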