This paper presents a multimodal learning system that can ground spoken names of objects in their physical referents and simultaneously learn to recognize those objects from naturally co-occurring multisensory input. Two technical problems are involved: (1) the correspondence problem in symbol grounding, i.e., how to associate words (symbols) with their perceptually grounded meanings from multiple co-occurrences between words and objects in the physical environment; and (2) object learning, i.e., how to recognize and categorize visual objects. We argue that these two problems can be substantially simplified by treating them jointly within a single system and by exploiting the spatio-temporal and cross-modal constraints of multimodal data. The system collects egocentric data, including image sequences and speech, while users perform natural tasks. It automatically infers the meanings of object names from vision and categorizes objects based on teaching signals potentially encoded in speech. The experimental results reported in this paper demonstrate the effectiveness of using multimodal data and of integrating heterogeneous techniques from machine learning, natural language processing, and computer vision.
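The correspondence problem described above can be illustrated with a minimal sketch: given episodes that pair uttered words with visually present objects, a word can be tentatively grounded in the object it co-occurs with most reliably. This is not the paper's actual method; the episodes, counts, and normalization below are illustrative assumptions only.

```python
from collections import defaultdict

# Toy episodes: each pairs the words of an utterance with the object
# categories visible at the same time. All data here is made up.
episodes = [
    (["grab", "the", "cup"], ["cup", "table"]),
    (["put", "the", "cup", "down"], ["cup"]),
    (["look", "at", "the", "ball"], ["ball", "table"]),
    (["roll", "the", "ball"], ["ball"]),
]

# Count how often each word co-occurs with each object category.
cooc = defaultdict(lambda: defaultdict(int))
word_count = defaultdict(int)
for words, objects in episodes:
    for w in words:
        word_count[w] += 1
        for o in objects:
            cooc[w][o] += 1

def best_referent(word):
    """Ground a word in the object it co-occurs with most reliably,
    normalizing by how often the word was uttered overall."""
    scores = {o: c / word_count[word] for o, c in cooc[word].items()}
    return max(scores, key=scores.get)

print(best_referent("cup"))   # -> cup
print(best_referent("ball"))  # -> ball
```

Frequent function words like "the" co-occur with everything, which is why real systems add stronger spatio-temporal and cross-modal constraints rather than relying on raw co-occurrence counts alone.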