Tutor-based learning of visual categories using different levels of supervision

Authors:
Mario Fritz;Geert-Jan M. Kruijff;Bernt Schiele
Affiliations:
EECS Department, UC Berkeley & ICSI, Berkeley, USA;Language Technology Lab, DFKI GmbH, Saarbrücken, Germany;CS Department, TU-Darmstadt & MPI Informatik, Saarbrücken, Germany
Venue:
Computer Vision and Image Understanding
Year:
2010

Abstract

Recent years have seen substantial progress in visual recognition, dialogue interpretation, and multi-modal learning aimed at providing the building blocks that enable intelligent robots to interact with humans in a meaningful way and even to evolve continuously during this process. Building systems that unify these components under a common architecture has turned out to be challenging, as each approach comes with its own set of assumptions, restrictions, and implications. For example, recent progress in visual category recognition has had limited impact on interactive systems, for diverse reasons. We identify and address two major challenges in integrating modern techniques for visual categorization into an interactive learning system: reducing the number of required labeled training examples and dealing with potentially erroneous input. Today's object categorization methods use either supervised or unsupervised training. While supervised methods tend to produce more accurate results, unsupervised methods are highly attractive because they can exploit far larger amounts of unlabeled training data. We propose a novel method that uses unsupervised training to obtain visual groupings of objects and a cross-modal learning scheme to overcome inherent limitations of purely unsupervised training. The method uses a unified, scale-invariant object representation that makes it possible to handle labeled as well as unlabeled information in a coherent way. First experiments demonstrate the ability of the system to learn object category models from many unlabeled observations and a few dialogue interactions that can be ambiguous or even erroneous.