Real-time lip reading system for isolated Korean word recognition

Authors:
Jongju Shin;Jin Lee;Daijin Kim
Affiliations:
Department of Computer Science and Engineering, Pohang University of Science and Technology, San 31, Hyoja-Dong, Nam-Gu, Pohang 790-784, Republic of Korea;Department of Computer Science and Engineering, Pohang University of Science and Technology, San 31, Hyoja-Dong, Nam-Gu, Pohang 790-784, Republic of Korea;Department of Computer Science and Engineering, Pohang University of Science and Technology, San 31, Hyoja-Dong, Nam-Gu, Pohang 790-784, Republic of Korea
Venue:
Pattern Recognition
Year:
2011

Abstract

This paper proposes a real-time lip reading system that recognizes isolated Korean words. The system consists of a lip detector, a lip tracker, a lip activation detector, and a word classifier. Lip detection is performed in several stages: face detection, eye detection, mouth detection, mouth end-point detection, and active appearance model (AAM) fitting. Lip tracking then uses a novel two-stage method in which a model-based Lucas-Kanade feature tracker tracks the outer lip and a fast block matching algorithm tracks the inner lip. Lip activation detection is performed by a neural network classifier whose input is a combination of the lip motion energy function and the first dominant shape feature. In the last step, input words are defined and recognized by three different classifiers: HMM, ANN, and K-NN. We combine the proposed lip reading system with an audio-only automatic speech recognition (ASR) system to improve word recognition performance in noisy environments, and we demonstrate the potential applicability of the combined system to hands-free in-vehicle navigation devices. In experiments on 30 isolated Korean words, using the K-NN classifier at a speed of 15 fps, the proposed lip reading system achieves a 92.67% word correct rate (WCR) in person-dependent tests and a 46.50% WCR in person-independent tests. In addition, the combined audio-visual ASR system increases the WCR from 0% to 60% in a noisy environment.
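As a rough illustration of the final word-classification step and of the WCR metric, the following minimal sketch classifies a per-frame lip feature sequence with a plain K-NN vote. It is not the authors' implementation: the abstract does not specify the features, the distance measure, or how sequences of different lengths are compared, so the linear resampling to a fixed number of frames, the Euclidean distance, the function names, and the toy data are all assumptions made for this example. WCR is taken here to be the percentage of test words recognized correctly.

```python
import math
from collections import Counter


def resample(seq, n_frames):
    """Linearly resample a list of feature vectors to n_frames frames (n_frames >= 2)."""
    if len(seq) == 1:
        return [list(seq[0]) for _ in range(n_frames)]
    out = []
    for i in range(n_frames):
        pos = i * (len(seq) - 1) / (n_frames - 1)
        lo = int(math.floor(pos))
        hi = min(lo + 1, len(seq) - 1)
        t = pos - lo
        out.append([(1 - t) * a + t * b for a, b in zip(seq[lo], seq[hi])])
    return out


def distance(a, b):
    """Euclidean distance between two equal-length feature sequences."""
    return math.sqrt(sum((x - y) ** 2
                         for fa, fb in zip(a, b)
                         for x, y in zip(fa, fb)))


def knn_classify(query, training, k=3, n_frames=8):
    """Return the majority label among the k training sequences nearest to query."""
    q = resample(query, n_frames)
    scored = sorted((distance(q, resample(seq, n_frames)), label)
                    for seq, label in training)
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


def word_correct_rate(predictions, truths):
    """Percentage of predictions that match the true word labels."""
    correct = sum(p == t for p, t in zip(predictions, truths))
    return 100.0 * correct / len(truths)


# Toy data (hypothetical): one scalar feature per frame, e.g. a mouth-opening measure.
training = [
    ([[0.0], [1.0], [0.0]], "word_a"),
    ([[0.1], [0.9], [0.1]], "word_a"),
    ([[1.0], [0.0], [1.0]], "word_b"),
    ([[0.9], [0.1], [0.9]], "word_b"),
]
query = [[0.0], [0.5], [1.0], [0.5], [0.0]]
predicted = knn_classify(query, training, k=3)
```

In this toy setup the query rises and falls like the `word_a` examples, so the two nearest neighbours carry that label and win the vote. Resampling keeps the example short; a real system might instead align sequences with a method such as dynamic time warping.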