Finite-state multimodal integration and understanding

Authors:
Michael Johnston; Srinivas Bangalore
Affiliations:
AT&T Labs-Research, 180 Park Ave, Florham Park, NJ 07932, USA, e-mail: johnston@research.att.com; AT&T Labs-Research, 180 Park Ave, Florham Park, NJ 07932, USA, e-mail: srini@research.att.com
Venue:
Natural Language Engineering
Year:
2005


Abstract

Multimodal interfaces are systems that allow input and/or output to be conveyed over multiple channels such as speech, graphics, and gesture. Beyond parsing and understanding separate utterances in a single mode such as speech or gesture, they must also parse and understand composite multimodal utterances that are distributed over multiple input modes. We present an approach in which multimodal parsing and understanding are achieved using a weighted finite-state device that takes speech and gesture streams as inputs and outputs their joint interpretation. Compared with previous approaches, this approach is significantly more efficient and provides a more general probabilistic framework for multimodal ambiguity resolution. It also enables tight coupling of multimodal understanding with speech recognition. Because the finite-state approach is computationally lightweight, it can be deployed more readily on a broader range of mobile platforms. We report speech recognition results for a directory assistance and messaging task with a multimodal interface; they demonstrate the compensation effect of exploiting gesture information.
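
The idea of a weighted finite-state device that reads a speech stream and a gesture stream and emits a joint interpretation can be illustrated with a small sketch. The following is not the authors' implementation: the arc table, the gesture symbols (`Gperson`, `Gorg`), the meaning tokens, and the penalty weight are all hypothetical, chosen only to show how a three-tape machine (speech, gesture, meaning) can be searched for its cheapest accepting path.

```python
import heapq
from typing import NamedTuple, Optional

EPS = None  # epsilon: the arc consumes or emits nothing on that tape


class Arc(NamedTuple):
    src: int
    speech: Optional[str]   # symbol read from the speech tape
    gesture: Optional[str]  # symbol read from the gesture tape
    meaning: Optional[str]  # symbol written to the meaning tape
    weight: float           # non-negative cost (e.g. a negative log probability)
    dst: int


# Hypothetical toy grammar for commands such as "email this person" + a gesture.
ARCS = [
    Arc(0, "email", EPS, "email(", 0.0, 1),
    Arc(0, "call", EPS, "call(", 0.0, 1),
    Arc(1, "this", EPS, EPS, 0.0, 2),
    Arc(2, "person", "Gperson", "person:e1", 0.0, 3),
    Arc(2, "organization", "Gorg", "org:e1", 0.0, 3),
    # Assumed penalty arc: speech says "person" but the gesture selects an organization.
    Arc(2, "person", "Gorg", "org:e1", 2.0, 3),
    Arc(3, EPS, EPS, ")", 0.0, 4),
]
FINALS = {4}


def integrate(speech, gesture, arcs=ARCS, start=0, finals=FINALS):
    """Return (cost, meaning tokens) for the cheapest path that consumes
    both input streams and ends in a final state, or None if none exists."""
    heap = [(0.0, start, 0, 0, ())]  # (cost, state, speech pos, gesture pos, output)
    seen = set()
    while heap:
        cost, state, i, j, out = heapq.heappop(heap)
        if state in finals and i == len(speech) and j == len(gesture):
            return cost, list(out)
        if (state, i, j) in seen:
            continue
        seen.add((state, i, j))
        for arc in arcs:
            if arc.src != state:
                continue
            ni, nj = i, j
            if arc.speech is not EPS:
                if i >= len(speech) or speech[i] != arc.speech:
                    continue
                ni = i + 1
            if arc.gesture is not EPS:
                if j >= len(gesture) or gesture[j] != arc.gesture:
                    continue
                nj = j + 1
            new_out = out + ((arc.meaning,) if arc.meaning is not EPS else ())
            heapq.heappush(heap, (cost + arc.weight, arc.dst, ni, nj, new_out))
    return None


print(integrate(["email", "this", "person"], ["Gperson"]))
print(integrate(["call", "this", "person"], ["Gorg"]))
```

Because all weights are non-negative, the first accepting configuration popped from the priority queue is a cheapest one. A production system would more plausibly compile the grammar into transducers and compose them with recognizer lattices; the explicit search here is only meant to make the speech/gesture/meaning relation concrete.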