Cued Speech automatic recognition in normal-hearing and deaf subjects

  • Authors:
  • Panikos Heracleous, Denis Beautemps, Noureddine Aboutabit

  • Affiliations:
  • Panikos Heracleous: ATR, Intelligent Robotics and Communication Laboratories, 2-2-2 Hikaridai, Seika-cho, Soraku-gun, Kyoto-fu 619-0288, Japan, and GIPSA-lab, Speech and Cognition Department, CNRS UMR 5216/Stendhal University/UJF/INPG, 961 rue de la Houille Blanche, Domaine universitaire BP 46, F-38402 Saint Martin d'Hères cedex, France
  • Denis Beautemps and Noureddine Aboutabit: GIPSA-lab, Speech and Cognition Department, CNRS UMR 5216/Stendhal University/UJF/INPG, 961 rue de la Houille Blanche, Domaine universitaire BP 46, F-38402 Saint Martin d'Hères cedex, France

  • Venue:
  • Speech Communication
  • Year:
  • 2010

Abstract

This article discusses the automatic recognition of Cued Speech in French based on hidden Markov models (HMMs). Cued Speech is a visual communication mode that uses hand shapes placed at different positions, in combination with the lip patterns of speech, to make all the sounds of a spoken language clearly understandable to deaf people. Its aim is to overcome the ambiguities of lipreading and thus enable deaf children and adults to understand spoken language completely. In the current study, the authors demonstrate that these visible gestures are as discriminant as audible orofacial gestures. Phoneme recognition and isolated word recognition experiments were first conducted using data from a normal-hearing cuer. The results were very promising, and the study was then extended by applying the proposed methods to a deaf cuer; the results showed no significant difference compared to automatic Cued Speech recognition in the normal-hearing subject. Automatic Cued Speech recognition requires both lip shape and hand gesture recognition, and the integration of the two modalities is of great importance. In this study, the lip shape component is fused with the hand component using concatenative feature fusion and multi-stream HMM decision fusion, and vowel, consonant, and isolated word recognition experiments were conducted. For vowel recognition, an accuracy of 87.6% was obtained, a 61.3% relative improvement over the sole use of lip shape parameters; for consonant recognition, an accuracy of 78.9% was obtained, a 56% relative improvement over lip shape alone. In addition, a complete phoneme recognition experiment using concatenated feature vectors and Gaussian mixture model (GMM) discrimination yielded a 74.4% phoneme accuracy. Isolated word recognition experiments with both the normal-hearing and the deaf subject gave word accuracies of 94.9% and 89%, respectively. These results were compared with those obtained from the audio signal, and comparable accuracies were observed.
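The abstract names two standard audio-visual integration schemes: concatenative feature fusion, where the lip and hand feature vectors are stacked into a single observation per frame, and multi-stream HMM decision fusion, where each stream is scored separately and the log-likelihoods are combined with stream weights. The following minimal Python sketch illustrates the difference for a single HMM state with diagonal-covariance Gaussian emissions; it is not the authors' implementation, and the feature dimensions, stream weights, and model parameters are illustrative assumptions.

```python
# Minimal sketch of the two fusion schemes named in the abstract.
# All dimensions, weights, and parameter values are illustrative assumptions.
import numpy as np

def gaussian_log_likelihood(x, mean, var):
    """Log-density of a diagonal-covariance Gaussian emission."""
    return -0.5 * np.sum(np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)

def concat_fusion(lip_feat, hand_feat):
    """Concatenative feature fusion: o_t = [o_lip ; o_hand],
    scored by a single HMM trained on the stacked vectors."""
    return np.concatenate([lip_feat, hand_feat])

def multistream_log_score(lip_feat, hand_feat, lip_model, hand_model,
                          lip_weight=0.6, hand_weight=0.4):
    """Multi-stream decision fusion:
    log b(o_t) = w_lip * log b_lip(o_lip) + w_hand * log b_hand(o_hand),
    with the stream weights typically constrained to sum to 1."""
    return (lip_weight * gaussian_log_likelihood(lip_feat, *lip_model)
            + hand_weight * gaussian_log_likelihood(hand_feat, *hand_model))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    lip = rng.normal(size=8)    # e.g. lip-contour parameters (assumed dim.)
    hand = rng.normal(size=4)   # e.g. hand position/shape parameters

    fused = concat_fusion(lip, hand)        # 12-dim vector for a single HMM
    lip_model = (np.zeros(8), np.ones(8))   # (mean, diagonal variance)
    hand_model = (np.zeros(4), np.ones(4))
    score = multistream_log_score(lip, hand, lip_model, hand_model)
    print(fused.shape, score)
```

In the multi-stream scheme, the stream weights let the recognizer trust one modality more than the other (for example, weighting the hand stream more heavily for vowels, whose hand positions are highly discriminative), which is the usual motivation for decision fusion over simple feature concatenation.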