This paper proposes a learned, data-driven approach for accurate, real-time tracking of facial features using only intensity information. Automatic facial feature tracking is nontrivial since the face is a highly deformable object with large textural variation and motion in certain regions. Existing works address these problems either by limiting themselves to tracking feature points with strong, unique visual cues (e.g., mouth and eye corners) or by incorporating a priori information that must be manually designed (e.g., hand-selecting points for a shape model). The framework proposed here largely avoids such restrictions by automatically identifying the optimal visual support required for tracking a single facial feature point. This automatic identification of the visual context required for tracking allows the proposed method to potentially track any point on the face. Tracking is achieved via linear predictors (LPs), which provide a fast and effective mapping from pixel intensities to tracked feature position displacements. Building on the simplicity and strengths of linear predictors, a more robust biased linear predictor is introduced. Multiple linear predictors are then grouped into a rigid flock to further increase robustness. To improve tracking accuracy, a novel probabilistic selection method identifies the visual areas relevant to tracking a feature point. These selected flocks are then combined into a hierarchical multiresolution LP model. Finally, a simple shape constraint is exploited to correct the occasional tracking failure of a minority of feature points. Experimental results show that this method performs more robustly and accurately than active appearance models (AAMs), with minimal training examples, on sequences ranging from SD quality to YouTube quality. An analysis of the consistency of the visual support across different subjects is also provided.
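The core mechanism the abstract describes, a linear predictor mapping intensity differences to displacement corrections, can be sketched as follows. This is a minimal toy illustration, not the paper's implementation: the synthetic image, support-pixel layout, training perturbation range, and least-squares fit (with an extra bias column standing in for the "biased" predictor) are all assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_intensities(image, center, offsets):
    """Sample pixel intensities at integer support-pixel offsets around a point."""
    pts = np.round(center + offsets).astype(int)
    return image[pts[:, 1], pts[:, 0]].astype(float)

# Toy image with smooth texture, so intensity differences carry displacement cues.
h = w = 64
ys, xs = np.mgrid[0:h, 0:w]
image = np.sin(xs * 0.3) * 40.0 + np.cos(ys * 0.25) * 40.0 + 128.0

# Support pixels: random offsets within a window around the tracked point.
n_support = 64
offsets = rng.integers(-8, 9, size=(n_support, 2))  # (x, y) offsets in [-8, 8]

true_pos = np.array([32.0, 32.0])
template = sample_intensities(image, true_pos, offsets)

# Training: perturb the point, record (intensity difference, correction) pairs.
n_train = 400
D = rng.uniform(-5.0, 5.0, size=(n_train, 2))       # synthetic displacements
X = np.empty((n_train, n_support + 1))              # +1 column for the bias term
for i, d in enumerate(D):
    obs = sample_intensities(image, true_pos + d, offsets)
    X[i, :n_support] = obs - template
    X[i, n_support] = 1.0

# Least-squares fit: correction back to true_pos ~= [intensity diff; 1] @ H.
H, *_ = np.linalg.lstsq(X, -D, rcond=None)

# Tracking: start from a perturbed estimate and iteratively predict corrections.
est = true_pos + np.array([3.0, -2.0])
for _ in range(10):
    diff = np.append(sample_intensities(image, est, offsets) - template, 1.0)
    est = est + diff @ H
```

In the paper's framework, many such predictors are grouped into rigid flocks and stacked into a multiresolution hierarchy; the sketch above shows only a single predictor at one scale.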