Numerous multimedia applications, such as talking heads, lip reading, lip synchronization, and computer-assisted pronunciation training, have drawn researchers' attention to clustering and analyzing visemes. Because viseme clustering and analysis are language-dependent processes, we concentrate on the Persian language, for which such studies have been lacking. To this end, we propose a novel image-based approach consisting of four main steps: (a) extracting the lip region, (b) obtaining the Eigenviseme of each phoneme while accounting for the coarticulation effect, (c) mapping each viseme into its own subspace and into the subspaces of the other phonemes to build a distance matrix that gives the distances between viseme clusters, and (d) comparing the similarity of the visemes based on the weight values of their reconstructions. To assess the robustness of the proposed algorithm, three sets of experiments were conducted on Persian and English databases, in which Consonant/Vowel and Consonant/Vowel/Consonant syllables were examined. The results indicate that the proposed method outperformed the state-of-the-art algorithms considered in feature extraction and was comparably effective at generating adequate clusters. Moreover, the results mark a milestone in grouping Persian visemes, as judged against a perceptual test taken by volunteers.
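Steps (b) and (c) can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that lip regions have already been extracted and flattened into vectors, that an Eigenviseme subspace is a plain PCA basis per phoneme, and that the distance between two phonemes is the mean reconstruction error of one phoneme's images in the other's subspace. The function names (`eigen_subspace`, `reconstruction_error`, `distance_matrix`) and the parameter `k` are hypothetical.

```python
import numpy as np


def eigen_subspace(samples, k):
    """Return (mean, basis) spanning the top-k principal directions.

    samples: array of shape (n_samples, n_pixels), one flattened lip image per row.
    Assumption: a PCA basis stands in for the paper's Eigenviseme.
    """
    mean = samples.mean(axis=0)
    # Rows of vt are orthonormal principal directions of the centered data.
    _, _, vt = np.linalg.svd(samples - mean, full_matrices=False)
    return mean, vt[:k]


def reconstruction_error(samples, mean, basis):
    """Mean Euclidean error after projecting samples onto a subspace."""
    centered = samples - mean
    weights = centered @ basis.T      # projection coefficients
    reconstructed = weights @ basis   # back-projection into pixel space
    return np.linalg.norm(centered - reconstructed, axis=1).mean()


def distance_matrix(classes, k=2):
    """Build a symmetric phoneme-to-phoneme distance matrix.

    classes: dict mapping a phoneme label to its (n_samples, n_pixels) array.
    Assumption: distance is the reconstruction error of one phoneme's images
    in another phoneme's subspace, averaged over both directions.
    """
    labels = sorted(classes)
    subspaces = {p: eigen_subspace(classes[p], k) for p in labels}
    n = len(labels)
    d = np.zeros((n, n))
    for i, p in enumerate(labels):
        for j, q in enumerate(labels):
            d[i, j] = reconstruction_error(classes[p], *subspaces[q])
    return labels, (d + d.T) / 2
```

Under these assumptions, small off-diagonal entries mark phonemes whose lip images are well explained by each other's subspace, which makes them candidates for the same viseme cluster. The diagonal holds each phoneme's error in its own subspace, so it is small but generally not zero. Any standard clustering routine that accepts a precomputed distance matrix could then group the phonemes.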