Two-granularity tracking: mediating trajectory and detection graphs for tracking under occlusions

  • Authors:
  • Katerina Fragkiadaki;Weiyu Zhang;Geng Zhang;Jianbo Shi

  • Affiliations:
  • Department of Computer and Information Science, University of Pennsylvania;Department of Computer and Information Science, University of Pennsylvania;Institue of Artificial Intelligence and Robotics, Xi'an Jiaotong University, China;Department of Computer and Information Science, University of Pennsylvania

  • Venue:
  • ECCV'12 Proceedings of the 12th European conference on Computer Vision - Volume Part V
  • Year:
  • 2012

Quantified Score

Hi-index 0.00

Visualization

Abstract

We propose a tracking framework that mediates grouping cues from two levels of tracking granularities, detection tracklets and point trajectories, for segmenting objects in crowded scenes. Detection tracklets capture objects when they are mostly visible. They may be sparse in time, may miss partially occluded or deformed objects, or contain false positives. Point trajectories are dense in space and time. Their affinities integrate long range motion and 3D disparity information, useful for segmentation. Affinities may leak though across similarly moving objects, since they lack model knowledge. We establish one trajectory and one detection tracklet graph, encoding grouping affinities in each space and associations across. Two-granularity tracking is cast as simultaneous detection tracklet classification and clustering (cl2) in the joint space of tracklets and trajectories. We solve cl2 by explicitly mediating contradictory affinities in the two graphs: Detection tracklet classification modifies trajectory affinities to reflect object specific dis-associations. Non-accidental grouping alignment between detection tracklets and trajectory clusters boosts or rejects corresponding detection tracklets, changing accordingly their classification.We show our model can track objects through sparse, inaccurate detections and persistent partial occlusions. It adapts to the changing visibility masks of the targets, in contrast to detection based bounding box trackers, by effectively switching between the two granularities according to object occlusions, deformations and background clutter.