Visual speech information plays an important role in lipreading under noisy conditions or for listeners with a hearing impairment. In this paper, we present local spatiotemporal descriptors to represent and recognize spoken isolated phrases based solely on visual input. Spatiotemporal local binary patterns extracted from mouth regions are used to describe isolated phrase sequences. In experiments with 817 sequences from ten phrases and 20 speakers, promising accuracies of 62% and 70% were obtained in speaker-independent and speaker-dependent recognition, respectively. On the AVLetters database, our method achieves an accuracy of 62.8%, clearly outperforming the other methods compared. Analysis of the confusion matrix for the 26 English letters shows that the proposed descriptors cluster visemes well. The advantages of our approach include local processing and robustness to monotonic gray-scale changes. Moreover, no error-prone segmentation of moving lips is needed.
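The abstract does not spell out how the spatiotemporal local binary patterns are computed. The following is a minimal sketch, assuming the LBP-TOP variant (LBP on three orthogonal planes): each pixel of a mouth-region volume is compared with its neighbours in the XY plane (appearance), the XT plane (horizontal motion) and the YT plane (vertical motion), and the three code histograms are concatenated. It simplifies further by using a square 8-neighbourhood of radius 1 with full 256-bin codes, whereas LBP-TOP formulations typically sample circular neighbourhoods with interpolation and may use uniform patterns or different radii along time. The division of each frame into a grid of blocks is also an assumption meant to illustrate local processing. The names `lbp_code`, `lbp_top_histogram`, `block_features` and the synthetic volume are hypothetical, not the authors' code.

```python
from typing import List

# A grayscale video volume indexed as volume[t][y][x].
Volume = List[List[List[int]]]

# Eight neighbours at radius 1 on a square grid, visited in a fixed circular
# order. Each pair is (offset along the plane's first axis, offset along its second).
OFFSETS = [(0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1), (1, 0), (1, 1)]


def lbp_code(center: int, neighbours: List[int]) -> int:
    """Basic 8-bit LBP: set bit i when neighbour i is >= the centre value."""
    code = 0
    for bit, value in enumerate(neighbours):
        if value >= center:
            code |= 1 << bit
    return code


def lbp_top_histogram(volume: Volume) -> List[int]:
    """Concatenate the 256-bin LBP histograms of the XY, XT and YT planes."""
    T, H, W = len(volume), len(volume[0]), len(volume[0][0])
    if min(T, H, W) < 3:
        raise ValueError("each volume or block must be at least 3x3x3")
    hists = [[0] * 256 for _ in range(3)]
    # Skip the border so every neighbour lies inside the volume.
    for t in range(1, T - 1):
        for y in range(1, H - 1):
            for x in range(1, W - 1):
                c = volume[t][y][x]
                # XY: spatial appearance within the current frame.
                xy = [volume[t][y + a][x + b] for a, b in OFFSETS]
                # XT: horizontal motion (neighbours along t and x, fixed y).
                xt = [volume[t + a][y][x + b] for a, b in OFFSETS]
                # YT: vertical motion (neighbours along t and y, fixed x).
                yt = [volume[t + a][y + b][x] for a, b in OFFSETS]
                for plane, nb in enumerate((xy, xt, yt)):
                    hists[plane][lbp_code(c, nb)] += 1
    return hists[0] + hists[1] + hists[2]


def block_features(volume: Volume, rows: int, cols: int) -> List[int]:
    """Split each frame into rows x cols blocks (hypothetical grid) and
    concatenate the LBP-TOP histograms of the resulting block volumes."""
    H, W = len(volume[0]), len(volume[0][0])
    feature: List[int] = []
    for r in range(rows):
        for c in range(cols):
            y0, y1 = r * H // rows, (r + 1) * H // rows
            x0, x1 = c * W // cols, (c + 1) * W // cols
            block = [[row[x0:x1] for row in frame[y0:y1]] for frame in volume]
            feature.extend(lbp_top_histogram(block))
    return feature


if __name__ == "__main__":
    # Synthetic 5-frame, 8x12 placeholder volume (not data from the paper).
    vol = [[[(3 * t + x * y) % 11 for x in range(12)] for y in range(8)] for t in range(5)]
    feat = block_features(vol, rows=2, cols=2)
    print(len(feat))  # 4 blocks * 3 planes * 256 bins = 3072
```

Because each bit only records whether a neighbour is at least as bright as the centre pixel, any strictly increasing gray-level mapping leaves every code unchanged. This is the robustness to monotonic gray-scale changes noted above: replacing each value v by 2v + 10, for example, yields an identical feature vector.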