We present a robust method for detecting and locating a speaker through joint analysis of speech audio and video. First, a short segment of speech is analyzed to estimate the rate of spoken syllables. A difference image is then formed using the optimal frame distance derived from that rate, and mouth candidates are detected in it. The candidates are tracked in turn to confirm that one of them is the mouth: the rate of mouth movement is estimated from the brightness change profile of the first candidate and, if it agrees with the syllable rate, the three brightest parts of the resulting difference image are taken as the mouth and eyes. If the rates do not agree, the second candidate is tracked, and so on. The first-order moment of the power spectrum of the brightness change profile and the lateral shifts observed during tracking are also used to check whether the candidates are facial parts.
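The quantities named above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not say how the optimal frame distance is derived, so the half-syllable-period rule in `frame_distance` is an assumption, as are the relative tolerance in `rates_agree` and all function names. Only the first-order spectral moment and the difference image follow directly from the text.

```python
import numpy as np

def frame_distance(syllable_rate_hz, fps):
    """Frames between the two images to difference.

    Assumption (not from the source): use half a syllable period,
    roughly the time from an open mouth to a closed one.
    """
    return max(1, int(round(fps / (2.0 * syllable_rate_hz))))

def difference_image(frames, t, d):
    """Absolute brightness difference between frames t and t + d.

    `frames` is an array of shape (num_frames, height, width).
    """
    return np.abs(frames[t + d].astype(float) - frames[t].astype(float))

def spectral_centroid(profile, fps):
    """First-order moment of the power spectrum of a brightness profile (Hz)."""
    x = np.asarray(profile, dtype=float)
    x = x - x.mean()  # remove the DC component
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fps)
    total = power.sum()
    return float((freqs * power).sum() / total) if total > 0 else 0.0

def rates_agree(speech_rate_hz, mouth_rate_hz, tol=0.25):
    """Hypothetical agreement test: relative difference within `tol`."""
    return abs(speech_rate_hz - mouth_rate_hz) <= tol * speech_rate_hz
```

In use, a candidate region's mean brightness over time would be passed to `spectral_centroid`, and the result compared with the syllable rate via `rates_agree`. A candidate that fails the comparison would be dropped in favor of the next one, mirroring the fallback described in the abstract.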