A joint system for single-person 2D-face and 3D-head tracking in CHIL seminars

  • Authors:
  • Gerasimos Potamianos; Zhenqiu Zhang

  • Affiliations:
  • IBM T.J. Watson Research Center, Yorktown Heights, NY; Beckman Institute, University of Illinois, Urbana, IL

  • Venue:
  • CLEAR'06: Proceedings of the 1st International Evaluation Conference on Classification of Events, Activities and Relationships
  • Year:
  • 2006

Abstract

We present the IBM systems submitted and evaluated within the CLEAR'06 evaluation campaign for the tasks of single-person visual 3D tracking (localization) and 2D face tracking on CHIL seminar data. The two systems are sufficiently inter-connected to justify their presentation within a single paper as a joint vision system for single-person 2D-face and 3D-head tracking, suitable for smart-room environments with multiple synchronized, calibrated, stationary cameras. Indeed, in the developed system, face detection plays a pivotal role in 3D person tracking, being employed both in system initialization and in detecting possible tracking drift. Similarly, 3D person tracking determines the 2D frame regions where a face detector is subsequently applied. The joint system consists of a number of components that employ detection and tracking algorithms, some of which operate on input from all four corner cameras of the CHIL smart rooms, while others select and utilize two of the four available cameras. The main system highlights are the use of AdaBoost-like multi-pose face detectors, a spatio-temporal dynamic-programming algorithm to initialize 3D location hypotheses, and an adaptive subspace-learning-based tracking scheme with a forgetting mechanism as a means to reduce tracking drift. The system is benchmarked on the CLEAR'06 CHIL seminar database, consisting of 26 lecture segments recorded inside the smart rooms of the UKA and ITC CHIL partners. Its resulting 3D single-person tracking performance is 86% accuracy with a precision of 88 mm, whereas the achieved face-tracking score is 54% correct, with 37% wrong detections and 19% misses. In terms of speed, an unoptimized system implementation runs at about 2 fps on a P4 2.8 GHz desktop.
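
As an illustration of the coupling described in the abstract, the following minimal Python sketch shows how face detections from two calibrated cameras could seed or correct a 3D head estimate, and how that estimate in turn restricts the 2D search region for the face detector. It is not the authors' implementation: the detector stub (detect_faces), the camera dictionaries, the window half-size, and the drift threshold are hypothetical placeholders; only the standard pinhole projection and linear (DLT) triangulation are assumed.

    # Minimal sketch (assumptions noted above), not the CLEAR'06 system itself.
    import numpy as np

    def triangulate(P1, P2, x1, x2):
        """Linear (DLT) triangulation of one point from two calibrated views.
        P1, P2: 3x4 projection matrices; x1, x2: (u, v) pixel coordinates."""
        A = np.vstack([
            x1[0] * P1[2] - P1[0],
            x1[1] * P1[2] - P1[1],
            x2[0] * P2[2] - P2[0],
            x2[1] * P2[2] - P2[1],
        ])
        _, _, Vt = np.linalg.svd(A)
        X = Vt[-1]
        return X[:3] / X[3]          # 3D point in world coordinates

    def project(P, X):
        """Project a 3D world point into a camera with projection matrix P."""
        x = P @ np.append(X, 1.0)
        return x[:2] / x[2]

    def search_region(P, X, half_size=80):
        """2D window around the projected head position; the face detector
        is applied only inside this region (half_size is a placeholder)."""
        u, v = project(P, X)
        return (u - half_size, v - half_size, u + half_size, v + half_size)

    def detect_faces(frame, region):
        """Placeholder for an AdaBoost-like multi-pose face detector;
        should return a list of (u, v) face centers inside `region`."""
        raise NotImplementedError

    def track_step(frames, cams, X_prev, drift_thresh=300.0):
        """One joint tracking step over two selected cameras.
        frames: dict cam_id -> image; cams: dict cam_id -> 3x4 matrix."""
        (c1, P1), (c2, P2) = list(cams.items())[:2]
        f1 = detect_faces(frames[c1], search_region(P1, X_prev))
        f2 = detect_faces(frames[c2], search_region(P2, X_prev))
        if f1 and f2:
            X_new = triangulate(P1, P2, np.asarray(f1[0]), np.asarray(f2[0]))
            # A large jump from the previous estimate is treated as drift
            # and triggers re-initialization from the face detections.
            if np.linalg.norm(X_new - X_prev) > drift_thresh:
                return X_new, True
            return X_new, False
        return X_prev, False         # no faces found: keep previous estimate

The point of the sketch is only the mutual dependence: the 3D estimate defines where the 2D detector looks, and the 2D detections re-anchor the 3D estimate when it drifts, mirroring the initialization and drift-detection roles described above.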