This paper describes our work on semantic interpretation of a “multimodal language” combining speech and pen gestures using latent semantic analysis (LSA). Our aim is to infer the domain-specific informational goal of multimodal inputs. The informational goal is characterized by lexical terms in the spoken modality, partial semantics of gestures in the pen modality, and term co-occurrence patterns across modalities, which give rise to “multimodal terms.” We designed and collected a multimodal corpus of navigational inquiries, for which we obtained both perfect (i.e., manual) and imperfect (i.e., automatic, via recognition) transcriptions. We automatically align each parsed spoken locative reference (SLR) with its corresponding pen gesture(s) using Viterbi alignment over their numeric and location-type features. We then characterize each cross-modal integration pattern as a 3-tuple multimodal term consisting of the SLR, the pen gesture type, and their temporal relationship. LSA derives the latent semantics from the perfect and imperfect transcriptions of the collected multimodal inputs: multimodal and lexical terms together compose an inquiry-term matrix, which is factorized by singular value decomposition (SVD) to obtain the latent semantics automatically. Informational goal inference based on these latent semantics achieves 99% accuracy on a disjoint test set with a perfect projection model and 84% with an imperfect one, outperforming the vector space model (VSM) baseline by at least 9.9% absolute.
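The SLR-to-gesture alignment step could be sketched as a monotonic dynamic-programming (Viterbi-style) alignment scored by agreement of numeric and location-type features. Everything below (feature names, scores, skip penalty) is a hypothetical illustration, not the paper's actual model:

```python
# Hypothetical sketch: align spoken locative references (SLRs) to pen
# gestures with a monotonic Viterbi-style DP, rewarding feature agreement
# and lightly penalizing unaligned items on either side.
def score(slr, gesture):
    s = 0.0
    if slr["number"] == gesture["number"]:
        s += 1.0                          # numeric feature agrees
    if slr["loc_type"] == gesture["loc_type"]:
        s += 1.0                          # location-type feature agrees
    return s

def align(slrs, gestures):
    n, m = len(slrs), len(gestures)
    NEG = float("-inf")
    # best[i][j]: best score aligning the first i SLRs to the first j gestures
    best = [[NEG] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if best[i][j] == NEG:
                continue
            if i < n and j < m:           # pair SLR i with gesture j
                s = best[i][j] + score(slrs[i], gestures[j])
                if s > best[i + 1][j + 1]:
                    best[i + 1][j + 1], back[i + 1][j + 1] = s, (i, j)
            if j < m:                     # leave gesture j unaligned
                s = best[i][j] - 0.5
                if s > best[i][j + 1]:
                    best[i][j + 1], back[i][j + 1] = s, (i, j)
            if i < n:                     # leave SLR i unaligned
                s = best[i][j] - 0.5
                if s > best[i + 1][j]:
                    best[i + 1][j], back[i + 1][j] = s, (i, j)
    # Trace back, keeping only the (slr_index, gesture_index) pairings.
    pairs, i, j = [], n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        if pi == i - 1 and pj == j - 1:
            pairs.append((pi, pj))
        i, j = pi, pj
    return list(reversed(pairs))

# Toy example: two SLRs, two matching gestures.
slrs = [{"number": 1, "loc_type": "point"}, {"number": 2, "loc_type": "area"}]
gestures = [{"number": 1, "loc_type": "point"}, {"number": 2, "loc_type": "area"}]
pairs = align(slrs, gestures)  # → [(0, 0), (1, 1)]
```

Each aligned pair, together with its temporal relationship (e.g., gesture before, during, or after the SLR), would then yield one 3-tuple multimodal term.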
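The LSA step can be sketched with a toy inquiry-term matrix, where columns stand for lexical and multimodal terms alike. The matrix values, the latent dimensionality, and the cosine-based goal inference are illustrative assumptions, not the paper's configuration:

```python
import numpy as np

# Toy inquiry-term matrix: rows = inquiries, columns = terms
# (lexical words plus 3-tuple multimodal terms); counts are made up.
A = np.array([
    [2, 1, 0, 0],   # inquiries with one informational goal...
    [1, 2, 0, 1],
    [0, 0, 3, 1],   # ...and inquiries with another
    [0, 1, 2, 2],
], dtype=float)

# Factorize with SVD and keep the top-k latent dimensions.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
Uk, sk, Vtk = U[:, :k], s[:k], Vt[:k, :]

def project(q):
    """Fold a new inquiry's term vector into the latent space:
    q_hat = q @ V_k @ S_k^{-1}."""
    return q @ Vtk.T @ np.diag(1.0 / sk)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Rows of U_k are the training inquiries in latent space; infer the goal
# of an unseen inquiry by its nearest training inquiry under cosine.
train_latent = Uk
q = np.array([0, 0, 2, 1], dtype=float)
q_hat = project(q)
best = max(range(len(train_latent)), key=lambda i: cosine(q_hat, train_latent[i]))
```

Truncating to k singular values is what yields the "latent" semantics: inquiries sharing co-occurrence patterns land near each other even when their surface terms differ.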