We have designed and implemented a lipreading system that recognizes isolated words using only color video of human lips, without acoustic data. The system performs visual recognition using “snakes” (active contour models) to extract geometric features, the Karhunen-Loeve transform (KLT) to extract principal components in the color eigenspace, and hidden Markov models (HMMs) to recognize the combined visual feature sequences. With visual information alone, the system achieved 94% accuracy on a vocabulary of ten isolated words.
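The KLT step described above projects each lip-region frame onto the leading eigenvectors of the frame covariance, yielding a compact color-eigenspace feature vector per frame. The following is a minimal sketch of that step, not the authors' implementation; the frame dimensions, the number of components `k`, and the SVD-based computation are illustrative assumptions.

```python
import numpy as np

def klt_features(frames, k):
    """Project flattened color frames onto the top-k principal
    components (Karhunen-Loeve transform)."""
    # Flatten each H x W x 3 frame into one row of the data matrix.
    X = np.asarray(frames, dtype=float).reshape(len(frames), -1)
    Xc = X - X.mean(axis=0)  # center the data
    # SVD of the centered data gives the covariance eigenvectors
    # as the rows of Vt, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T  # one k-dimensional feature vector per frame

# Toy example: 20 synthetic 8x8 RGB "lip region" frames (assumed sizes).
rng = np.random.default_rng(0)
frames = rng.random((20, 8, 8, 3))
feats = klt_features(frames, k=5)
print(feats.shape)  # (20, 5)
```

In the full pipeline, these per-frame feature vectors would be concatenated with the snake-derived geometric features and fed as observation sequences to per-word HMMs for classification.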