Graph-based partial hypothesis fusion for pen-aided speech input

Authors:
Peng Liu; Frank K. Soong
Affiliations:
Microsoft Research Asia, Beijing, China; Microsoft Research Asia, Beijing, China
Venue:
IEEE Transactions on Audio, Speech, and Language Processing - Special issue on multimodal processing in speech-based interactions
Year:
2009

Abstract

We study a specific partial hypothesis fusion problem in sequential data labeling. The problem arises in multimodal applications where a decision is made by merging a complete hypothesis from one input with a partial hypothesis from the other. For example, in a pen-aided speech interface, appropriate pen input can provide partial but crucial information. We address the problem in a Bayesian framework and reformulate the solution as a revised search in a graph representation. A dynamic programming algorithm is proposed to solve the partial hypothesis fusion efficiently on the graph. The computational cost of the graph-based partial hypothesis fusion is shown to be proportional to the size of the graph, which makes it highly feasible for a compact graph. We apply the proposed algorithm to two real applications: an intelligent pen-based dictation error correction system and automatic handwritten character completion with a speech "shortcut". Experimental results show that the algorithm is effective in using the partial information from one modality to enhance the performance of the bimodal interface.
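
The idea of fusing a partial hypothesis by dynamic programming over a graph can be illustrated with a small sketch. This is not the authors' algorithm, only a minimal example under stated assumptions: the speech recognizer's output is taken to be a word lattice (a directed acyclic graph whose node ids increase along every edge), each edge carries a speech log-score, and the pen input is reduced to a hypothetical table `pen_logp` of log-scores for the single word the user wrote. The function name `fuse_partial` and the data layout are illustrative choices. The search keeps two scores per node (pen evidence not yet used, pen evidence used), so after the edges are ordered each edge is visited once and the work grows linearly with the number of edges.

```python
import math

def fuse_partial(edges, start, end, pen_logp):
    """Best lattice path that aligns the pen input to exactly one word.

    edges: list of (src, dst, word, speech_logp) with src < dst (assumed).
    pen_logp: dict mapping a word to the log-score of the pen input for it.
    Returns (words, total_log_score), or None if no path can use the pen input.
    """
    NEG = -math.inf
    # best[node] = [score with pen unused, score with pen already used]
    best = {start: [0.0, NEG]}
    back = {}
    # Sorting by source node yields a valid processing order because src < dst.
    for src, dst, word, lp in sorted(edges, key=lambda e: e[0]):
        if src not in best:
            continue
        for flag in (0, 1):
            score = best[src][flag]
            if score == NEG:
                continue
            # Option A: follow the edge on speech evidence alone.
            candidates = [(flag, score + lp)]
            # Option B: align the pen input to this word (allowed once).
            if flag == 0 and word in pen_logp:
                candidates.append((1, score + lp + pen_logp[word]))
            for new_flag, new_score in candidates:
                slot = best.setdefault(dst, [NEG, NEG])
                if new_score > slot[new_flag]:
                    slot[new_flag] = new_score
                    back[(dst, new_flag)] = (src, flag, word)
    if end not in best or best[end][1] == NEG:
        return None
    # Trace back from the end node in the "pen used" state.
    words, state = [], (end, 1)
    while state in back:
        src, flag, word = back[state]
        words.append(word)
        state = (src, flag)
    return words[::-1], best[end][1]

# Toy lattice: speech alone prefers "wreck beach"; the pen input favors "speech".
lattice = [
    (0, 1, "recognize", -1.0),
    (0, 1, "wreck", -0.5),
    (1, 2, "speech", -0.7),
    (1, 2, "beach", -0.4),
]
pen = {"speech": -0.2, "beach": -2.5}
result = fuse_partial(lattice, 0, 2, pen)
```

In this toy lattice the speech scores alone would select "wreck beach", while the fused search returns the word sequence "wreck speech", because the pen evidence for the second word outweighs the small speech-score advantage of "beach". The first word is left to the speech scores, since the pen input says nothing about it.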