Using redundant speech and handwriting for learning new vocabulary and understanding abbreviations

Authors:
Edward C. Kaiser
Affiliations:
Natural Interaction Systems, LLC., Seattle, Washington
Venue:
Proceedings of the 8th international conference on Multimodal interfaces
Year:
2006

Abstract

New language constantly emerges from complex, collaborative human-human interactions such as meetings, for instance when a presenter handwrites a new term on a whiteboard while saying it. Fixed-vocabulary recognizers fail on such new terms, which are often critical to dialogue understanding. We present a proof-of-concept multimodal system that combines information from handwriting and speech recognition to learn the spelling, pronunciation, and semantics of out-of-vocabulary terms from single instances of redundant multimodal presentation (e.g., saying a term while handwriting it). For the task of recognizing the spelling and semantics of abbreviated Gantt chart labels across a held-out test series of five scheduling meetings, we show a significant relative error rate reduction of 37% when our learning methods are used and allowed to persist across the meeting series, compared with when they are not used.
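
The 37% figure is a relative reduction, not a difference in percentage points. As a minimal sketch, assuming the conventional definition (baseline error minus learned error, divided by baseline error), the arithmetic can be written as follows. The function name and the two error rates are hypothetical and chosen only for illustration; the abstract does not report the underlying error rates.

```python
def relative_error_reduction(baseline_error: float, learned_error: float) -> float:
    """Fraction of the baseline error rate removed by the learning condition."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - learned_error) / baseline_error


# Hypothetical error rates, picked only so the arithmetic lands near 37%.
# They are NOT figures reported in the paper.
hypothetical_baseline = 0.400  # error rate without learning (assumed)
hypothetical_learned = 0.252   # error rate with persistent learning (assumed)

reduction = relative_error_reduction(hypothetical_baseline, hypothetical_learned)
print(f"relative error rate reduction: {reduction:.0%}")
```

With these assumed values the absolute drop is 14.8 percentage points, while the relative reduction is about 37%, which is the kind of quantity the abstract reports.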