Using geometric properties of topographic manifold to detect and track eyes for human-computer interaction

  • Authors:
  • Jun Wang;Lijun Yin;Jason Moore

  • Affiliations:
  • Columbia University, New York, NY;State University of New York at Binghamton, Binghamton, NY;Air Force Research Laboratory at Rome, Rome, NY

  • Venue:
  • ACM Transactions on Multimedia Computing, Communications, and Applications (TOMCCAP)
  • Year:
  • 2007

Quantified Score

Hi-index 0.00

Visualization

Abstract

Automatic eye detection and tracking is an important component for advanced human-computer interface design. Accurate eye localization can help develop a successful system for face recognition and emotion identification. In this article, we propose a novel approach to detect and track eyes using geometric surface features on topographic manifold of eye images. First, in the joint spatial-intensity domain, a facial image is treated as a 3D terrain surface or image topographic manifold. In particular, eye regions exhibit certain intrinsic geometric traits on this topographic manifold, namely, the pit-labeled center and hillside-like surround regions. Applying a terrain classification procedure on the topographic manifold of facial images, each location of the manifold can be labeled to generate a terrain map. We use the distribution of terrain labels to represent the eye terrain pattern. The Bhattacharyya affinity is employed to measure the distribution similarity between two topographic manifolds. Based on the Bhattacharyya kernel, a support vector machine is applied for selecting proper eye pairs from the pit-labeled candidates. Second, given detected eyes on the first frame of a video sequence, a mutual-information-based fitting function is defined to describe the similarity between two terrain surfaces of neighboring frames. By optimizing the fitting function, eye locations are updated for subsequent frames. The distinction of the proposed approach lies in that both eye detection and eye tracking are performed on the derived topographic manifold, rather than on an original-intensity image domain. The robustness of the approach is demonstrated under various imaging conditions and with different facial appearances, using both static images and video sequences without background constraints.