Using redundant speech and handwriting for learning new vocabulary and understanding abbreviations
Proceedings of the 8th International Conference on Multimodal Interfaces
Our goal is to automatically recognize and enroll new vocabulary in a multimodal interface. To accomplish this, our technique leverages the mutually disambiguating properties of co-referenced, co-temporal handwriting and speech. The co-referenced semantics are determined spatially and temporally by our multimodal interface for schedule chart creation. This paper motivates and describes our technique for recognizing out-of-vocabulary (OOV) terms and enrolling them dynamically in the system. We report results for the detection and segmentation of OOV words on a small multimodal test set. On the same test set we also report utterance-, word- and pronunciation-level error rates, both for the individual input modes and for the modes combined. We show that combining information from handwriting and speech yields significantly better results than either mode achieves alone.
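To make the fusion idea concrete, the Python sketch below illustrates the kind of cross-modal rescoring the abstract describes. It is a toy illustration under simplifying assumptions, not the paper's implementation: the letter-to-phone table stands in for a trained grapheme-to-phoneme model, and the n-best lists, scores, and the enroll_oov helper are hypothetical. Each candidate spelling from the handwriting recognizer is paired with each candidate phone sequence from the speech recognizer, and the pair whose predicted and recognized pronunciations agree best, weighted by the per-mode recognition scores, is enrolled.

    from difflib import SequenceMatcher

    # Toy letter-to-phone table; a stand-in for a trained grapheme-to-phoneme model.
    LETTER_TO_PHONE = {"d": "D", "e": "EH", "m": "M", "o": "OW", "r": "R", "n": "N"}

    def phones_for(spelling):
        """Naively predict a phone sequence from a spelling (unknown letters skipped)."""
        return [LETTER_TO_PHONE[c] for c in spelling.lower() if c in LETTER_TO_PHONE]

    def agreement(spelling, phones):
        """How well the spelling's predicted phones match the recognized phones."""
        return SequenceMatcher(None, phones_for(spelling), phones).ratio()

    def enroll_oov(hw_nbest, sp_nbest):
        """Choose the (spelling, pronunciation) pair on which the two modes agree most.

        hw_nbest: [(spelling, handwriting score)];
        sp_nbest: [(phone sequence, speech score)].
        """
        return max(((s, p, hw * sp * agreement(s, p))
                    for s, hw in hw_nbest
                    for p, sp in sp_nbest),
                   key=lambda triple: triple[2])

    # Handwriting n-best: the recognizer slightly prefers a misreading.
    hw_nbest = [("derno", 0.5), ("demo", 0.4)]
    # Speech phone n-best for the same co-temporal event.
    sp_nbest = [(["D", "EH", "M", "OW"], 0.7), (["T", "EH", "M", "OW"], 0.2)]

    spelling, phones, score = enroll_oov(hw_nbest, sp_nbest)
    print(spelling, phones, round(score, 3))  # -> demo ['D', 'EH', 'M', 'OW'] 0.28

In this toy run the correct spelling "demo" wins even though the handwriting recognizer ranked "derno" first, because its predicted pronunciation matches the top speech hypothesis exactly; this is the mutual-disambiguation effect the abstract refers to.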