We address the problem of understanding scenes from multiple sources of sensor data (e.g., a camera and a laser scanner) in the case where there is no one-to-one correspondence across modalities (e.g., pixels and 3-D points). This is an important scenario that frequently arises in practice not only when two different types of sensors are used, but also when the sensors are not co-located and have different sampling rates. Previous work has addressed this problem by restricting interpretation to a single representation in one of the domains, with augmented features that attempt to encode the information from the other modalities. Instead, we propose to analyze all modalities simultaneously while propagating information across domains during the inference procedure. In addition to the immediate benefit of generating a complete interpretation in all of the modalities, we demonstrate that this co-inference approach also improves performance over the canonical approach.
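The co-inference idea — interpreting each modality while propagating information across domains during inference — can be illustrated with a toy sketch. This is not the paper's actual model: the soft cross-modal links, the coupling weight, and the random unary scores are all illustrative assumptions standing in for real classifier outputs and sensor geometry. Two sets of nodes (image regions and 3-D points) alternate belief updates, each folding in the other's current label distributions through soft, many-to-many association weights, so no one-to-one pixel-to-point correspondence is required.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: 6 image regions, 4 laser points, 3 semantic labels.
# The unary scores stand in for per-modality classifier outputs.
n_img, n_pts, n_labels = 6, 4, 3
unary_img = rng.normal(size=(n_img, n_labels))
unary_pts = rng.normal(size=(n_pts, n_labels))

# Soft cross-modal association weights (no one-to-one correspondence),
# e.g. as might be obtained by projecting 3-D points into the image plane.
links = rng.random((n_img, n_pts))
links /= links.sum(axis=1, keepdims=True)                 # image region -> points
links_pi = links.T / links.T.sum(axis=1, keepdims=True)   # point -> image regions

def softmax(x):
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# Co-inference loop: each modality re-estimates its label beliefs while
# folding in the other modality's current beliefs through the soft links.
belief_img = softmax(unary_img)
belief_pts = softmax(unary_pts)
coupling = 1.0  # illustrative cross-modal weight, not from the paper
for _ in range(10):
    belief_img = softmax(unary_img + coupling * links @ np.log(belief_pts + 1e-9))
    belief_pts = softmax(unary_pts + coupling * links_pi @ np.log(belief_img + 1e-9))

labels_img = belief_img.argmax(axis=1)  # labels for image regions
labels_pts = belief_pts.argmax(axis=1)  # labels for 3-D points
```

The point of the sketch is the contrast with the canonical approach: rather than collapsing everything into one domain with augmented features, both modalities keep their own representation and a complete labeling is produced in each.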