Multimodal new vocabulary recognition through speech and handwriting in a whiteboard scheduling application

Authors:
Edward C. Kaiser
Affiliations:
Center for Human Computer Communication, Beaverton, OR
Venue:
Proceedings of the 10th international conference on Intelligent user interfaces
Year:
2005

Citing 18
Cited 8

The vocabulary problem in human-system communication

Communications of the ACM
Automatic detection and modeling of new words in a large-vocabulary continuous speech recognition system

Automatic detection and modeling of new words in a large-vocabulary continuous speech recognition system
Integration and synchronization of input modes during multimodal human-computer interaction

Proceedings of the ACM SIGCHI Conference on Human factors in computing systems
Mutual disambiguation of recognition errors in a multimodel architecture

Proceedings of the SIGCHI conference on Human Factors in Computing Systems
The intelligent classroom: providing competent assistance

Proceedings of the fifth international conference on Autonomous agents
Robonaut: A Robot Designed to Work with Humans in Space

Autonomous Robots
Sketching Interfaces: Toward More Human Interface Design

Computer
Using Humanoid Robots to Study Human Behavior

IEEE Intelligent Systems
A Computational Model of Embodied Language Learning

A Computational Model of Embodied Language Learning
Constraining Human Body Tracking

ICCV '03 Proceedings of the Ninth IEEE International Conference on Computer Vision - Volume 2
Mutual disambiguation of 3D multimodal interaction in augmented and virtual reality

Proceedings of the 5th international conference on Multimodal interfaces
A multimodal learning interface for grounding spoken language in sensory perceptions

Proceedings of the 5th international conference on Multimodal interfaces
Augmenting user interfaces with adaptive speech commands

Proceedings of the 5th international conference on Multimodal interfaces
A study of digital ink in lecture presentation

Proceedings of the SIGCHI Conference on Human Factors in Computing Systems
Speech, ink, and slides: the interaction of content channels

Proceedings of the 12th annual ACM international conference on Multimedia
A multimodal learning interface for sketch, speak and point creation of a schedule chart

Proceedings of the 6th international conference on Multimodal interfaces
Automatic acquisition of names using speak and spell mode in spoken dialogue systems

NAACL '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1
Grounded spoken language acquisition: experiments in word learning

IEEE Transactions on Multimedia

Distributed pointing for multimodal collaboration over sketched diagrams

ICMI '05 Proceedings of the 7th international conference on Multimodal interfaces
Speech pen: predictive handwriting based on ambient multimodal recognition

Proceedings of the SIGCHI Conference on Human Factors in Computing Systems
Human-centered collaborative interaction

Proceedings of the 1st ACM international workshop on Human-centered multimedia
Using redundant speech and handwriting for learning new vocabulary and understanding abbreviations

Proceedings of the 8th international conference on Multimodal interfaces
Multimodal redundancy across handwriting and speech during computer mediated human-human interactions

Proceedings of the SIGCHI Conference on Human Factors in Computing Systems
Cross-domain matching for automatic tag extraction across redundant handwriting and speech events

Proceedings of the 2007 workshop on Tagging, mining and retrieval of human related activity information
Speech and sketching: an empirical study of multimodal interaction

SBIM '07 Proceedings of the 4th Eurographics workshop on Sketch-based interfaces and modeling
HCI Beyond the GUI: Design for Haptic, Speech, Olfactory, and Other Nontraditional Interfaces

HCI Beyond the GUI: Design for Haptic, Speech, Olfactory, and Other Nontraditional Interfaces

Quantified Score

Hi-index	0.00

Visualization

Abstract

Our goal is to automatically recognize and enroll new vocabulary in a multimodal interface. To accomplish this our technique aims to leverage the mutually disambiguating aspects of co-referenced, co-temporal handwriting and speech. The co-referenced semantics are spatially and temporally determined by our multimodal interface for schedule chart creation. This paper motivates and describes our technique for recognizing out-of-vocabulary (OOV) terms and enrolling them dynamically in the system. We report results for the detection and segmentation of OOV words within a small multimodal test set. On the same test set we also report utterance, word and pronunciation level error rates both over individual input modes and multimodally. We show that combining information from handwriting and speech yields significantly better results than achievable by either mode alone.