RGB-(D) scene labeling: Features and algorithms

Authors:
Dieter Fox
Affiliations:
Computer Science and Engineering, University of Washington
Venue:
CVPR '12 Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
Year:
2012

Citing 0
Cited 5

3D Wikipedia: using online text to automatically label and navigate reconstructed geometry
ACM Transactions on Graphics (TOG)

Layered moving-object segmentation for stereoscopic video using motion and depth information
Journal of Visual Communication and Image Representation

Guided depth enhancement via a fast marching method
Image and Vision Computing

Structure-based object representation and classification in mobile robotics through a Microsoft Kinect
Robotics and Autonomous Systems

Contextual object category recognition for RGB-D scene labeling
Robotics and Autonomous Systems

Abstract

Scene labeling research has mostly focused on outdoor scenes, leaving the harder case of indoor scenes poorly understood. The Microsoft Kinect dramatically changed the landscape, showing great potential for RGB-D (color+depth) perception. Our main objective is to empirically understand the promises and challenges of scene labeling with RGB-D. We use the NYU Depth Dataset as collected and analyzed by Silberman and Fergus [30]. For RGB-D features, we adapt the framework of kernel descriptors, which converts local similarities (kernels) to patch descriptors. For contextual modeling, we combine two lines of approach, one using a superpixel MRF and the other using a segmentation tree. We find that (1) kernel descriptors are very effective in capturing appearance (RGB) and shape (D) similarities; (2) both the superpixel MRF and the segmentation tree are useful in modeling context; and (3) the key to labeling accuracy is the ability to efficiently train and test with large-scale data. We improve labeling accuracy on the NYU Dataset from 56.6% to 76.1%. We also apply our approach to image-only scene labeling and improve the accuracy on the Stanford Background Dataset from 79.4% to 82.9%.
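
To illustrate what contextual modeling with a superpixel MRF can look like, the following is a minimal sketch and not the paper's method. The abstract does not specify the energy function or the inference procedure, so the sketch assumes a generic Potts-style energy (per-superpixel classifier cost plus a constant penalty for each pair of neighboring superpixels with different labels) and minimizes it with iterated conditional modes (ICM), a simple swapped-in technique. The function name `icm_label`, the `smoothness` weight, the toy costs, and the label meanings are all hypothetical.

```python
def icm_label(unary, edges, smoothness=1.0, max_sweeps=10):
    """Greedy MRF labeling of superpixels with ICM (illustrative only).

    unary[i][k] is the assumed cost of giving superpixel i the label k
    (e.g. a negative log classifier score); edges lists adjacent pairs.
    """
    n = len(unary)
    neighbors = [[] for _ in range(n)]
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)

    # Start from the label each superpixel prefers on its own.
    labels = [min(range(len(u)), key=u.__getitem__) for u in unary]

    for _ in range(max_sweeps):
        changed = False
        for i in range(n):
            def local_cost(k):
                # Unary cost plus a Potts penalty per disagreeing neighbor.
                disagree = sum(1 for j in neighbors[i] if labels[j] != k)
                return unary[i][k] + smoothness * disagree

            best = min(range(len(unary[i])), key=local_cost)
            # Switch only on a strict improvement, so the energy never rises.
            if local_cost(best) < local_cost(labels[i]):
                labels[i] = best
                changed = True
        if not changed:
            break
    return labels


# Toy chain of four superpixels with two hypothetical labels (0 and 1).
unary = [[0.1, 2.0], [0.9, 0.8], [0.2, 1.5], [2.0, 0.1]]
edges = [(0, 1), (1, 2), (2, 3)]
print(icm_label(unary, edges, smoothness=0.5))  # [0, 0, 0, 1]
```

In this toy case the second superpixel weakly prefers label 1 on its own, but both of its neighbors are confidently label 0, so the smoothness term flips it. With `smoothness=0.0` the result is simply the per-superpixel best label.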