Exploiting speech-gesture correlation in multimodal interaction

Authors:
Fang Chen;Eric H. C. Choi;Ning Wang
Affiliations:
ATP Research Laboratory, National ICT Australia, Sydney, Australia and School of Electrical Engineering and Telecommunications, The University of New South Wales, Sydney, Australia;ATP Research Laboratory, National ICT Australia, Sydney, Australia;School of Electrical Engineering and Telecommunications, The University of New South Wales, Sydney, Australia
Venue:
HCI'07 Proceedings of the 12th international conference on Human-computer interaction: intelligent multimodal interaction environments
Year:
2007

Citing 2
Cited 0

Mutual disambiguation of recognition errors in a multimodal architecture

Proceedings of the SIGCHI conference on Human Factors in Computing Systems
Exploiting prosodic structuring of coverbal gesticulation

Proceedings of the 6th international conference on Multimodal interfaces

Abstract

This paper presents a study that aims to derive a set of quantitative relationships between speech and co-verbal gestures in order to improve multimodal input fusion. The initial phase of the study explores the prosodic features of two human communication modalities, speech and gesture, and investigates the nature of their temporal relationships. We studied a corpus of natural monologues, focusing on frequent deictic hand gesture strokes and their concurrent speech prosody. Prosodic features extracted from the speech signal were co-analyzed with the visual signal to learn how prominent spoken semantic units correlate with the corresponding deictic gesture strokes. The extracted relationships can subsequently be used to disambiguate hand movements, correct speech recognition errors, and improve input fusion for multimodal user interaction with computers.