This paper describes our work on usage-pattern analysis and the development of a latent semantic analysis framework for interpreting multimodal user input consisting of speech and pen gestures. We designed and collected a multimodal corpus of navigational inquiries. Each modality carries semantics related to the domain-specific task goal, and each inquiry is manually annotated with a task goal based on these semantics. Multimodal input usually has a simpler syntactic structure than unimodal input, and the order of semantic constituents differs between the two. We therefore propose to derive the latent semantics of multimodal inputs using latent semantic modeling (LSM). To achieve this, we parse the recognized Chinese spoken input for spoken locative references (SLRs), which are then aligned with their corresponding pen gesture(s). We characterize each cross-modal integration pattern as a 3-tuple multimodal term consisting of the SLR, the pen gesture type, and their temporal relation. The resulting inquiry-by-multimodal-term matrix is decomposed using singular value decomposition (SVD) to derive the latent semantics automatically. Task goal inference based on these latent semantics achieves 99% accuracy on a disjoint test set.
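The pipeline the abstract describes — counting 3-tuple multimodal terms per inquiry, decomposing the inquiry-by-term matrix with SVD, and inferring a task goal in the latent space — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the matrix values, the task-goal labels, and the nearest-neighbour inference rule are all illustrative assumptions.

```python
import numpy as np

# Hypothetical inquiry-by-multimodal-term count matrix. Rows are inquiries;
# columns are 3-tuple multimodal terms (SLR, pen gesture type, temporal
# relation). Counts and labels below are made up for illustration.
X = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 1],
    [0, 0, 2, 1],
    [0, 1, 1, 2],
], dtype=float)
goals = ["route", "route", "poi", "poi"]  # task-goal label per training inquiry

# Truncated SVD: X ≈ U_k @ diag(s_k) @ Vt_k, keeping k latent dimensions.
k = 2
U, s, Vt = np.linalg.svd(X, full_matrices=False)
U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]

def to_latent(q):
    """Fold a new inquiry's term-count vector into the latent space."""
    return np.asarray(q, dtype=float) @ Vt_k.T / s_k

# Training inquiries already live in the latent space as the rows of U_k.
train_latent = U_k

def infer_goal(q):
    """Infer a task goal by cosine similarity to training inquiries."""
    q_hat = to_latent(q)
    sims = train_latent @ q_hat / (
        np.linalg.norm(train_latent, axis=1) * np.linalg.norm(q_hat) + 1e-12)
    return goals[int(np.argmax(sims))]

print(infer_goal([2, 1, 0, 0]))  # matches the first "route" inquiry -> "route"
```

Folding an inquiry into the latent space before comparison is the standard LSA trick: inquiries that share no surface terms can still land close together if their terms co-occur across the corpus.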