This paper presents a multimodal learning system that can ground spoken names of objects in their physical referents and simultaneously learn to recognize those objects from naturally co-occurring multisensory input. Two technical problems are involved: (1) the correspondence problem in symbol grounding, i.e., how to associate words (symbols) with their perceptually grounded meanings from multiple co-occurrences between words and objects in the physical environment; and (2) object learning, i.e., how to recognize and categorize visual objects. We argue that these two problems can be substantially simplified by treating them jointly within a single system and by exploiting the spatio-temporal and cross-modal constraints of multimodal data. The system collects egocentric data, including image sequences and speech, while users perform natural tasks. It automatically infers the meanings of object names from vision and categorizes objects based on teaching signals potentially encoded in speech. The experimental results reported in this paper demonstrate the effectiveness of using multimodal data and of integrating heterogeneous techniques from machine learning, natural language processing, and computer vision.
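The correspondence problem described above can be illustrated with a minimal sketch: given episodes that pair uttered words with visually present objects, a word can be tentatively grounded in the object it co-occurs with most reliably. This is not the paper's actual method; the episodes, counts, and normalization below are illustrative assumptions only.

```python
from collections import defaultdict

# Toy episodes: each pairs the words of an utterance with the object
# categories visible at the same time. All data here is made up.
episodes = [
    (["grab", "the", "cup"], ["cup", "table"]),
    (["put", "the", "cup", "down"], ["cup"]),
    (["look", "at", "the", "ball"], ["ball", "table"]),
    (["roll", "the", "ball"], ["ball"]),
]

# Count how often each word co-occurs with each object category.
cooc = defaultdict(lambda: defaultdict(int))
word_count = defaultdict(int)
for words, objects in episodes:
    for w in words:
        word_count[w] += 1
        for o in objects:
            cooc[w][o] += 1

def best_referent(word):
    """Ground a word in the object it co-occurs with most reliably,
    normalizing by how often the word was uttered overall."""
    scores = {o: c / word_count[word] for o, c in cooc[word].items()}
    return max(scores, key=scores.get)

print(best_referent("cup"))   # -> cup
print(best_referent("ball"))  # -> ball
```

Frequent function words like "the" co-occur with everything, which is why real systems add stronger spatio-temporal and cross-modal constraints rather than relying on raw co-occurrence counts alone.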