Semantic segmentation of urban scenes using dense depth maps

  • Authors:
  • Chenxi Zhang; Liang Wang; Ruigang Yang

  • Affiliations:
  • Center for Visualization and Virtual Environments, University of Kentucky (all authors)

  • Venue:
  • ECCV'10: Proceedings of the 11th European Conference on Computer Vision, Part IV
  • Year:
  • 2010

Abstract


In this paper we present a framework for semantic scene parsing and object recognition based on dense depth maps. Five view-independent 3D features that vary with object class are extracted from dense depth maps at the superpixel level and used to train a classifier with the randomized decision forest technique. Our formulation integrates multiple features in a Markov Random Field (MRF) framework to segment and recognize different object classes in query street-scene images. We evaluate our method both quantitatively and qualitatively on the challenging Cambridge-driving Labeled Video Database (CamVid). The results show that, using dense depth information alone, we achieve more accurate segmentation and recognition overall than with sparse 3D features or appearance, or even their combination, advancing the state of the art. Furthermore, by aligning the dense-depth-based 3D features into a unified coordinate frame, our algorithm can handle the special case of view changes between training and testing scenarios. Preliminary evaluation with cross training and testing shows promising results.
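
To make the classification stage of the described pipeline concrete, here is a minimal sketch (not the authors' code) of training a randomized decision forest on per-superpixel depth features and producing per-class probabilities, which in the full method would feed the MRF as unary terms. The feature names, dimensions, data, and the use of scikit-learn's RandomForestClassifier are illustrative assumptions; the paper's five 3D features are computed from dense depth maps, which are not reproduced here.

```python
# Hypothetical sketch: randomized decision forest over superpixel-level
# depth features, loosely following the abstract's description.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Placeholder data: each row is one superpixel described by five
# view-independent 3D features (stand-ins for the paper's features);
# each label is an object class id.
n_superpixels, n_features, n_classes = 2000, 5, 11
X = rng.normal(size=(n_superpixels, n_features))
y = rng.integers(0, n_classes, size=n_superpixels)

# Train the randomized decision forest on superpixel features.
forest = RandomForestClassifier(n_estimators=100, max_depth=12, random_state=0)
forest.fit(X, y)

# Per-class probabilities for new superpixels; in the full pipeline these
# would serve as unary potentials in an MRF over the superpixel graph.
probs = forest.predict_proba(rng.normal(size=(10, n_features)))
print(probs.shape)  # (10, number_of_classes)
```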