Visual voice activity detection (V-VAD) plays an important role in both human-computer interaction (HCI) and human-robot interaction (HRI), affecting both the conversation strategy and the synchronization between humans and robots or computers. The typical V-VAD speakingness decision consists of post-processing for signal smoothing and threshold-based classification. Several parameters that ensure a good trade-off between hit rate and false-alarm rate are usually defined heuristically. This makes such V-VAD approaches vulnerable to noisy observations and changing environmental conditions, resulting in poor performance and frequent undesired changes of the detected speaking state. To overcome these difficulties, this paper proposes a new probabilistic approach, termed the bi-level HMM, which analyzes lip-activity energy for V-VAD in HRI. The design is based on assumptions about lip movement and speaking, embracing two essential procedures in a single model. A bi-level HMM is an HMM with two state variables on different levels, where the occurrence of a state at the lower level conditionally depends on the state at the upper level. The approach works online with low-resolution images and under various lighting conditions, and has been successfully tested on 21 image sequences (22,927 frames). It achieved a probability of detection above 90%, an improvement of almost 20% over four other V-VAD approaches.
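To make the bi-level structure concrete, the sketch below filters a sequence of lip-activity energies with a two-level HMM: an upper-level state (silent/speaking) and a lower-level lip-activity state whose transitions are conditioned on the upper state. All transition and emission parameters here are illustrative assumptions, not values from the paper; the paper's actual model and training are not reproduced.

```python
import math

# Hedged sketch of a bi-level HMM for V-VAD (assumed parameters throughout).
# Upper-level state u: 0 = silent, 1 = speaking.
# Lower-level state l: 0 = low lip activity, 1 = high lip activity.
# The lower-level transition depends on the current upper-level state,
# mirroring the idea that lip movement is conditioned on speakingness.

TRANS_U = [[0.95, 0.05],              # P(u'|u): speaking state is persistent
           [0.05, 0.95]]

TRANS_L = {                           # P(l'|l, u'), conditioned on upper state
    0: [[0.9, 0.1], [0.6, 0.4]],      # u' = silent: lips mostly inactive
    1: [[0.3, 0.7], [0.1, 0.9]],      # u' = speaking: lips mostly active
}

def gauss(x, mu, sigma):
    """Gaussian likelihood of a scalar lip-energy observation."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# Emission model: lip-activity energy given the lower-level state (assumed means).
EMIT = [lambda e: gauss(e, 0.1, 0.15),   # l = 0: low energy
        lambda e: gauss(e, 1.0, 0.3)]    # l = 1: high energy

def forward_vvad(energies):
    """Forward filtering over the joint state (u, l); returns P(speaking) per frame."""
    # Uniform prior over the four joint states, weighted by the first emission.
    alpha = {(u, l): 0.25 * EMIT[l](energies[0]) for u in (0, 1) for l in (0, 1)}
    out = []
    for t, e in enumerate(energies):
        if t > 0:
            alpha = {
                (u2, l2): EMIT[l2](e) * sum(
                    alpha[(u, l)] * TRANS_U[u][u2] * TRANS_L[u2][l][l2]
                    for u in (0, 1) for l in (0, 1))
                for u2 in (0, 1) for l2 in (0, 1)
            }
        z = sum(alpha.values())
        alpha = {k: v / z for k, v in alpha.items()}  # normalize to avoid underflow
        out.append(alpha[(1, 0)] + alpha[(1, 1)])     # marginal P(speaking)
    return out
```

Because the speaking decision comes from a filtered posterior rather than a raw threshold on the energy signal, brief noisy spikes are absorbed by the persistent upper-level dynamics, which is the kind of smoothing the single-model design aims to provide.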