Semantic parsing of street scenes from video

  • Authors:
  • Branislav Micusik; Jana Košecká; Gautam Singh

  • Affiliations:
  • AIT Austrian Institute of Technology, Vienna, Austria; AIT Austrian Institute of Technology, Vienna, Austria; AIT Austrian Institute of Technology, Vienna, Austria

  • Venue:
  • International Journal of Robotics Research
  • Year:
  • 2012

Abstract

Semantic models of the environment can significantly improve the navigation and decision-making capabilities of autonomous robots or enhance the level of human-robot interaction. We present a novel approach for semantic segmentation of street scene images into coherent regions, while simultaneously categorizing each region as one of a set of predefined categories representing commonly encountered object and background classes. We formulate the segmentation on small blob-based superpixels and exploit a visual vocabulary tree as an intermediate image representation. The main novelty of our approach is the introduction of an explicit model of the spatial co-occurrence of visual words associated with superpixels and the use of appearance, geometry and contextual cues in a probabilistic framework. We demonstrate how individual cues contribute to global segmentation accuracy and how their combination yields superior performance compared with the best known method on a challenging benchmark dataset, which exhibits a diversity of street scenes captured in daylight and at dusk, with varying viewpoints and a large number of categories.
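
To make the idea of spatial co-occurrence of visual words more concrete, the following is a minimal illustrative sketch, not the authors' model. It assumes each superpixel has already been assigned a single visual word id and that a list of adjacent superpixel pairs is available; the names `word_of`, `edges`, `cooccurrence_counts` and `cooccurrence_probs` are hypothetical, and the toy values are invented for illustration only. It simply counts how often pairs of words label neighbouring superpixels and normalizes the counts into empirical frequencies.

```python
from collections import Counter


def cooccurrence_counts(word_of, edges):
    """Count unordered visual-word pairs over adjacent superpixels.

    word_of: dict mapping superpixel id -> visual word id (assumed given).
    edges:   iterable of (superpixel_a, superpixel_b) adjacency pairs.
    """
    counts = Counter()
    for a, b in edges:
        # Sort so that (3, 7) and (7, 3) are treated as the same pair.
        pair = tuple(sorted((word_of[a], word_of[b])))
        counts[pair] += 1
    return counts


def cooccurrence_probs(counts):
    """Normalize pair counts into empirical co-occurrence frequencies."""
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}


# Toy data (hypothetical): four superpixels and their adjacency.
word_of = {0: 7, 1: 7, 2: 3, 3: 5}
edges = [(0, 1), (1, 2), (2, 3), (0, 2)]

counts = cooccurrence_counts(word_of, edges)
probs = cooccurrence_probs(counts)
```

In a probabilistic framework of the kind the abstract describes, statistics like these could serve as one contextual cue alongside appearance and geometry; how the paper actually defines and combines them is not specified in this abstract.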