Temporal as well as semantic constraints on fusion are at the heart of multimodal system processing. The goal of the present work is to develop user-adaptive temporal thresholds that improve on state-of-the-art fixed ones, by leveraging both empirical user modeling and machine learning techniques to handle the large individual differences in users' multimodal integration patterns. Using simple Naive Bayes learning methods and a leave-one-out training strategy, our model correctly predicted 88% of users' mixed speech and pen signal input as either unimodal or multimodal, and 91% of their multimodal input as either sequentially or simultaneously integrated. In addition to predicting a user's multimodal pattern in advance of receiving input, predictive accuracies were also evaluated after the first signal's end-point detection, the earliest point at which a speech/pen multimodal system makes a decision regarding fusion. This system-centered metric yielded accuracies of 90% and 92%, respectively, for classification of unimodal/multimodal and sequential/simultaneous input patterns. Empirical modeling also revealed a .92 correlation between users' multimodal integration pattern and their likelihood of interacting multimodally, which may have accounted for the superior learning obtained with training over heterogeneous user data rather than data partitioned by user subtype. Finally, in large part due to guidance from user modeling, the techniques reported here required as few as 15 samples to predict a "surprise" user's input patterns.
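To make the classification setup concrete, the sketch below shows a Naive Bayes classifier evaluated with a leave-one-out strategy, in the spirit of the approach summarized above. The feature set (signal durations and intermodal lag), the toy data, and the use of scikit-learn's GaussianNB are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch: Naive Bayes with leave-one-out evaluation for classifying
# multimodal integration patterns. Features, data, and labels are hypothetical;
# the original study's exact features and preprocessing are not reproduced here.
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import LeaveOneOut, cross_val_score

# Hypothetical per-command features: [speech duration (s), pen duration (s),
# lag between pen end-point and speech onset (s, negative = overlap)]
X = np.array([
    [1.2, 0.8, -0.30],   # overlapping signals -> simultaneous integration
    [0.9, 0.6, -0.10],
    [1.5, 1.1,  0.45],   # gap between signals -> sequential integration
    [1.1, 0.7,  0.60],
    [1.3, 0.9, -0.25],
    [1.0, 0.5,  0.50],
])
# Labels: 1 = simultaneous, 0 = sequential (toy values for illustration only)
y = np.array([1, 1, 0, 0, 1, 0])

# Leave-one-out: train on all samples but one, test on the held-out sample,
# and average accuracy across all folds.
scores = cross_val_score(GaussianNB(), X, y, cv=LeaveOneOut())
print(f"Leave-one-out accuracy: {scores.mean():.2f}")
```

In the reported work the same kind of classifier is applied twice: once to label mixed input as unimodal versus multimodal, and once to label multimodal input as sequentially versus simultaneously integrated. The "system-centered" variant would restrict the features to information available at the first signal's end-point detection.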