Multimodal systems have developed rapidly during the past decade, with progress toward building more general and robust systems, as well as more transparent and usable human interfaces. These next-generation multimodal systems aim to improve the expressive power and efficiency of human interfaces, to expand the accessibility of computing for diverse and disabled users, to enhance the performance stability and robustness of recognition-based systems, and to support new forms of computing. In this chapter, we describe the QuickSet multimodal pen/voice system, including its functionality, interface design, natural language processing and fusion techniques, overall architecture, applications, and performance. We also summarize results from two recent empirical studies with QuickSet in which its multimodal architecture is shown to decrease failures in spoken language processing by 19-41%. This performance improvement is due mainly to the mutual disambiguation of input signals that is possible within a multimodal architecture, which occurs at higher rates for challenging user groups (accented versus native speakers) and usage environments (mobile versus stationary use). This research demonstrates that new multimodal architectures can stabilize error-prone recognition technologies and yield major improvements in system robustness.
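The core idea behind mutual disambiguation can be illustrated with a minimal sketch: each recognizer returns a ranked n-best list, and fusion selects the highest-scoring *jointly compatible* pair of hypotheses, so a lower-ranked hypothesis from one mode can be pulled up when it is the only one consistent with the other mode. This is an illustrative simplification, not QuickSet's actual unification-based implementation; all labels, scores, and the `TYPES` compatibility table below are hypothetical.

```python
# Illustrative sketch of mutual disambiguation via n-best list fusion
# (hypothetical data; not QuickSet's actual unification-based parser).

def fuse(speech_nbest, gesture_nbest, compatible):
    """Return the best-scoring (speech, gesture) pair whose meanings are
    semantically compatible, searching both n-best lists jointly."""
    best, best_score = None, float("-inf")
    for s_label, s_score in speech_nbest:
        for g_label, g_score in gesture_nbest:
            if compatible(s_label, g_label):
                score = s_score + g_score  # treat scores as log-probabilities
                if score > best_score:
                    best, best_score = (s_label, g_label), score
    return best

# Hypothetical example: the speech recognizer's top choice "create line"
# is incompatible with the top-ranked point gesture, so the second-ranked
# "create sign" hypothesis wins after fusion -- the gesture has
# disambiguated the speech signal.
speech = [("create line", -0.2), ("create sign", -0.5)]
gesture = [("point", -0.1), ("stroke", -0.9)]
TYPES = {("create sign", "point"), ("create line", "stroke")}

result = fuse(speech, gesture, lambda s, g: (s, g) in TYPES)
print(result)  # ('create sign', 'point')
```

In this toy run the jointly best interpretation is not the one either recognizer ranked first, which is exactly the effect the chapter measures: recovery from individual recognition errors at the fusion stage.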