Robust understanding in multimodal interfaces

Authors:
Srinivas Bangalore; Michael Johnston
Venue:
Computational Linguistics
Year:
2009

Abstract

Multimodal grammars provide an effective mechanism for quickly creating integration and understanding capabilities for interactive systems supporting simultaneous use of multiple input modalities. However, like other approaches based on hand-crafted grammars, multimodal grammars can be brittle with respect to unexpected, erroneous, or disfluent input. In this article, we show how the finite-state approach to multimodal language processing can be extended to support multimodal applications combining speech with complex freehand pen input, and evaluate the approach in the context of a multimodal conversational system (MATCH). We explore a range of techniques for improving the robustness of multimodal integration and understanding. These include techniques for building effective language models for speech recognition when little or no multimodal training data is available, and techniques for robust multimodal understanding that draw on classification, machine translation, and sequence edit methods. We also explore the use of edit-based methods to overcome mismatches between the gesture stream and the speech stream.
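The abstract mentions sequence edit methods for robust understanding. The following is a minimal sketch of the underlying idea only, under a simplifying assumption: the grammar's language is enumerated as a short explicit list of strings, whereas the article works with finite-state representations. When the recognizer output cannot be parsed by the grammar, it is mapped to the in-grammar string with the smallest word-level edit distance, and that string is what the grammar then interprets. The function names `word_edit_distance` and `snap_to_grammar` and the example strings are hypothetical, not taken from the article.

```python
from typing import List, Sequence, Tuple


def word_edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance over word tokens with unit costs for
    insertion, deletion, and substitution."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, wa in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, wb in enumerate(b, start=1):
            curr[j] = min(
                prev[j] + 1,               # delete wa
                curr[j - 1] + 1,           # insert wb
                prev[j - 1] + (wa != wb),  # substitute, or match at zero cost
            )
        prev = curr
    return prev[-1]


def snap_to_grammar(hypothesis: str, in_grammar: List[str]) -> Tuple[str, int]:
    """Return the in-grammar string closest to the recognizer output,
    together with its edit cost. Assumes in_grammar is non-empty."""
    hyp = hypothesis.split()
    best = min(in_grammar, key=lambda s: word_edit_distance(hyp, s.split()))
    return best, word_edit_distance(hyp, best.split())


if __name__ == "__main__":
    # Hypothetical in-grammar strings and a hypothetical recognizer output.
    grammar = [
        "show cheap italian restaurants in chelsea",
        "show cheap french restaurants in chelsea",
        "phone numbers for these restaurants",
    ]
    print(snap_to_grammar("show me cheap italian places in chelsea", grammar))
```

Unit costs are the simplest choice for illustration; a practical system would plausibly weight edits differently (for instance, making substitutions of content words costlier than deletions of filler words), which this sketch does not attempt.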