Gesture Cues for Conversational Interaction in Monocular Video

  • Authors:
  • Francis Quek; David McNeill; Rashid Ansari; Xin-Feng Ma; Robert Bryll; Susan Duncan; Karl E. McCullough

  • Venue:
  • RATFG-RTS '99 Proceedings of the International Workshop on Recognition, Analysis, and Tracking of Faces and Gestures in Real-Time Systems
  • Year:
  • 1999

Abstract

We present our work on the determination of cues for discourse segmentation in free-form gesticulation accompanying speech in natural conversation. The basis for this integration of gesticulation and speech discourse is the psycholinguistic concept of the co-equal generation of gesture and speech from the same semantic intent. We use the psycholinguistic device known as the 'catchment' as the locus around which this integration proceeds. We videotape gesture and speech elicitation experiments in which a subject describes her living space to an interlocutor. We extract the gestural motion of both hands using the Vector Coherence Mapping algorithm, which combines spatial, momentum, and skin-color constraints in parallel using a fuzzy image processing approach. We extract the voiced units in the discourse as F0 units and correlate these with the transcribed speech. Psycholinguistics researchers perceptually micro-analyze the same videotape to produce a transcript annotated with video timestamps and perceived gesture-speech entities. These serve to direct our high-level analysis of the gesture trace and F0 data. We report the results of our analysis, which show that the feature of 'handedness' and the kind of symmetry in two-handed gestures provide effective cues for discourse segmentation. We also present observations on how the gesture traces provide cues for segmenting hand use, for high-level discourse repair, and supra-segmental cues for discourse grouping.
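The full Vector Coherence Mapping algorithm is described in the authors' related work; the abstract only states that spatial, momentum, and skin-color constraints are combined in parallel with fuzzy image processing. As an illustration only, the sketch below shows one way such a fuzzy combination and displacement selection could look for a single image patch. The function names (fuzzy_combine, best_displacement), the product t-norm used as the fuzzy AND, and the random stand-in constraint maps are all assumptions for this example, not the published algorithm.

```python
import numpy as np


def fuzzy_combine(*constraint_maps):
    """Fuzzy AND (product t-norm) of several constraint maps with values in [0, 1]."""
    combined = np.ones_like(constraint_maps[0])
    for m in constraint_maps:
        combined = combined * m
    return combined


def best_displacement(correlation, spatial, momentum, skin):
    """Pick the candidate displacement with the highest combined support.

    Each argument is a (2R+1, 2R+1) map of support values over candidate
    displacements (-R..R in y and x) for one image patch.
    """
    support = fuzzy_combine(correlation, spatial, momentum, skin)
    iy, ix = np.unravel_index(np.argmax(support), support.shape)
    r = support.shape[0] // 2
    return (iy - r, ix - r), support[iy, ix]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    R = 4
    shape = (2 * R + 1, 2 * R + 1)
    # Hypothetical per-patch support maps; in a real system these might come
    # from normalized cross-correlation, a smoothness prior from neighboring
    # vectors, a momentum prior from the previous frame, and a skin-color
    # likelihood from a color model.
    correlation = rng.random(shape)
    spatial = rng.random(shape)
    momentum = rng.random(shape)
    skin = rng.random(shape)
    (dy, dx), s = best_displacement(correlation, spatial, momentum, skin)
    print(f"chosen displacement: ({dy}, {dx}), support {s:.3f}")
```

In this sketch the constraints act as parallel soft gates: a candidate displacement must score well on all of them to win, which mirrors the abstract's description of combining the constraints in parallel rather than sequentially.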