Semantic parsing of street scenes from video

Authors:
Branislav Micusik; Jana Košecká; Gautam Singh
Affiliations:
AIT Austrian Institute of Technology, Vienna, Austria; AIT Austrian Institute of Technology, Vienna, Austria; AIT Austrian Institute of Technology, Vienna, Austria
Venue:
International Journal of Robotics Research
Year:
2012

Abstract

Semantic models of the environment can significantly improve the navigation and decision-making capabilities of autonomous robots and enhance human-robot interaction. We present a novel approach for semantic segmentation of street scene images into coherent regions, while simultaneously assigning each region to one of a set of predefined categories representing commonly encountered object and background classes. We formulate the segmentation on small blob-based superpixels and exploit a visual vocabulary tree as an intermediate image representation. The main novelty of our approach is the introduction of an explicit model of the spatial co-occurrence of visual words associated with superpixels, and the use of appearance, geometry, and contextual cues in a probabilistic framework. We demonstrate how individual cues contribute to global segmentation accuracy and how their combination yields superior performance compared with the best known method on a challenging benchmark dataset that exhibits diverse street scenes with varying viewpoints and a large number of categories, captured in daylight and at dusk.
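
To make the idea of spatial co-occurrence of visual words more concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes each superpixel has already been assigned a single visual word ID (for example, a leaf index of a vocabulary tree) and that superpixel adjacency is given as a list of index pairs. The function names `cooccurrence_counts` and `cooccurrence_prob` and the toy data are hypothetical, and the plain relative-frequency estimate stands in for whatever model the paper actually uses.

```python
from collections import Counter


def cooccurrence_counts(words, edges):
    """Count how often two visual words fall on adjacent superpixels.

    words: list where words[i] is the (assumed) visual word ID of superpixel i.
    edges: iterable of (i, j) index pairs for neighboring superpixels.
    Pairs are stored unordered, so (3, 7) and (7, 3) share one counter.
    """
    counts = Counter()
    for i, j in edges:
        a, b = sorted((words[i], words[j]))
        counts[(a, b)] += 1
    return counts


def cooccurrence_prob(counts):
    """Normalize the counts into an empirical joint distribution over word pairs."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {pair: n / total for pair, n in counts.items()}


# Toy example (hypothetical data): four superpixels and four adjacency edges.
words = [3, 3, 7, 1]
edges = [(0, 1), (1, 2), (2, 3), (0, 2)]
counts = cooccurrence_counts(words, edges)
probs = cooccurrence_prob(counts)
```

In a full system, statistics of this kind would typically be gathered per semantic class from labeled training images and combined with appearance and geometry terms; how the paper combines them is not specified in the abstract.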