We present a multimodal interface that learns words from natural interactions with users. Motivated by studies of human language development, the learning system is trained in an unsupervised mode in which users perform everyday tasks while providing natural language descriptions of their behaviors. The system collects acoustic signals in concert with user-centric multisensory information from nonspeech modalities, such as first-person video from the user's perspective, gaze positions, head directions, and hand movements. A multimodal learning algorithm uses this data to first spot words in continuous speech and then associate action verbs and object names with their perceptually grounded meanings. The central ideas are to use nonspeech contextual information to facilitate word spotting, and to utilize body movements as deictic references that associate temporally co-occurring data from different modalities and build hypothesized lexical items. From those items, an EM-based method is developed to select correct word-meaning pairs. Successful learning is demonstrated in experiments on three natural tasks: "unscrewing a jar," "stapling a letter," and "pouring water."
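The EM-based selection of word-meaning pairs can be illustrated with a minimal sketch in the style of IBM Model 1 translation estimation: each trial pairs the words spotted in an utterance with the perceptual symbols co-occurring in the nonspeech channels, and EM iteratively reweights fractional alignments until each meaning concentrates its probability mass on one word. The trial data, symbol names, and uniform initialization below are illustrative assumptions, not the paper's actual corpus or parameterization.

```python
from collections import defaultdict

# Toy parallel data: spotted words paired with co-occurring perceptual
# symbols (hypothetical examples, not the paper's actual data).
trials = [
    (["unscrew", "jar"], ["UNSCREW", "JAR"]),
    (["pour", "water"], ["POUR", "WATER"]),
    (["unscrew", "lid"], ["UNSCREW", "LID"]),
    (["pour", "jar"], ["POUR", "JAR"]),
]

words = {w for ws, _ in trials for w in ws}
meanings = {m for _, ms in trials for m in ms}

# Uniform initialization of p(word | meaning).
p = {(w, m): 1.0 / len(words) for w in words for m in meanings}

for _ in range(30):                        # EM iterations
    count = defaultdict(float)             # expected co-occurrence counts
    total = defaultdict(float)
    for ws, ms in trials:
        for w in ws:
            norm = sum(p[(w, m)] for m in ms)
            for m in ms:                   # E-step: fractional alignments
                c = p[(w, m)] / norm
                count[(w, m)] += c
                total[m] += c
    for w in words:                        # M-step: renormalize counts
        for m in meanings:
            p[(w, m)] = count[(w, m)] / total[m] if total[m] else 0.0

# Select the most probable word for each grounded meaning.
best = {m: max(words, key=lambda w: p[(w, m)]) for m in meanings}
```

Because "jar" appears in two different contexts while "unscrew" and "pour" each recur with their own actions, the fractional counts disambiguate the pairings even though no single trial is unambiguous on its own.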