Monocular Visual Scene Understanding: Understanding Multi-Object Traffic Scenes

Authors:
Christian Wojek;Stefan Walk;Stefan Roth;Konrad Schindler;Bernt Schiele
Affiliations:
Max Planck Institute for Informatics, Saarbrücken;ETH Zurich, Zurich;Technische Universität Darmstadt, Darmstadt;ETH Zurich, Zurich;Max Planck Institute for Informatics, Saarbrücken
Venue:
IEEE Transactions on Pattern Analysis and Machine Intelligence
Year:
2013

Abstract

Following recent advances in detection, context modeling, and tracking, scene understanding has been the focus of renewed interest in computer vision research. This paper presents a novel probabilistic 3D scene model that integrates state-of-the-art multiclass object detection, object tracking, and scene labeling together with geometric 3D reasoning. Our model is able to represent complex object interactions such as inter-object occlusion, physical exclusion between objects, and geometric context. Inference in this model allows us to jointly recover the 3D scene context and perform 3D multi-object tracking from a mobile observer, for objects of multiple categories, using only monocular video as input. Contrary to many other approaches, our system performs explicit occlusion reasoning and is therefore capable of tracking objects that are partially occluded for extended periods of time, or objects that have never been observed to their full extent. In addition, we show that a joint scene tracklet model for the evidence collected over multiple frames substantially improves performance. The approach is evaluated on different types of challenging onboard sequences. We first show a substantial improvement over the state of the art in 3D multi-people tracking. Moreover, a similar performance gain is achieved for multiclass 3D tracking of cars and trucks on a challenging dataset.
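
The inter-object occlusion mentioned in the abstract can be illustrated with a small sketch. This is not the paper's model: it assumes each object hypothesis is reduced to an axis-aligned image box with a known camera distance, and the names `Box` and `visible_fraction` are hypothetical. It only shows the basic idea that an object nearer to the camera hides part of one farther away, so a partially hidden object can still be scored on its visible part.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Box:
    """Axis-aligned image box in integer pixels; x1 and y1 are exclusive.

    `depth` is an assumed distance from the camera (e.g. in metres).
    """
    x0: int
    y0: int
    x1: int
    y1: int
    depth: float


def visible_fraction(target: Box, others: Iterable[Box]) -> float:
    """Return the fraction of `target` pixels not covered by any nearer box."""
    area = (target.x1 - target.x0) * (target.y1 - target.y0)
    if area <= 0:
        return 0.0
    # Only objects strictly closer to the camera can occlude the target.
    occluders = [o for o in others if o.depth < target.depth]
    visible = 0
    for y in range(target.y0, target.y1):
        for x in range(target.x0, target.x1):
            covered = any(
                o.x0 <= x < o.x1 and o.y0 <= y < o.y1 for o in occluders
            )
            if not covered:
                visible += 1
    return visible / area


# A pedestrian standing behind a car that hides its right half.
pedestrian = Box(0, 0, 10, 20, depth=12.0)
car = Box(5, 0, 25, 20, depth=8.0)
print(visible_fraction(pedestrian, [car]))  # 0.5
print(visible_fraction(car, [pedestrian]))  # 1.0
```

A tracker built on this idea could down-weight detector evidence for hypotheses with a low visible fraction instead of discarding them, which is one plausible reading of how explicit occlusion reasoning helps with long partial occlusions.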