Multimodal human discourse: gesture and speech

  • Authors and affiliations:
  • Francis Quek (Wright State University, Dayton, OH)
  • David McNeill (University of Chicago)
  • Robert Bryll (Wright State University, Dayton, OH)
  • Susan Duncan (Wright State University; University of Chicago)
  • Xin-Feng Ma (University of Illinois at Chicago)
  • Cemil Kirbas (Wright State University)
  • Karl E. McCullough (University of Chicago)
  • Rashid Ansari (University of Illinois at Chicago)

  • Venue:
  • ACM Transactions on Computer-Human Interaction (TOCHI)
  • Year:
  • 2002

Abstract

Gesture and speech combine to form a rich basis for human conversational interaction. To exploit these modalities in HCI, we need to understand the interplay between them and the way in which they support communication. We propose a framework for the gesture research done to date, and present our work on cross-modal cues for discourse segmentation in free-form gesticulation accompanying speech in natural conversation as a new paradigm for such multimodal interaction. The basis for this integration is the psycholinguistic concept of the coequal generation of gesture and speech from the same semantic intent. We present a detailed case study of a gesture and speech elicitation experiment in which a subject describes her living space to an interlocutor. We perform two independent sets of analyses on the video and audio data: extraction of segmentation cues from the video and audio signals, and expert transcription of the speech and gesture data, microanalyzing the videotape with a frame-accurate video player to correlate the speech with the gestural entities. We compare the results of both analyses to identify the cues accessible in the gestural and audio data that correlate well with the expert psycholinguistic analysis. We show that "handedness" and the kind of symmetry in two-handed gestures provide effective supersegmental discourse cues.
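
To make the final claim concrete, here is a minimal, hypothetical Python sketch (not the authors' implementation) of how per-frame "handedness" labels and a two-handed symmetry score might be derived from tracked hand positions. The motion threshold, the correlation-based symmetry measure, and all function names are illustrative assumptions, not details taken from the paper.

  # Hypothetical sketch: label "handedness" and two-handed symmetry from
  # per-frame (x, y) hand positions produced by some vision-based tracker.
  import numpy as np

  MOVE_THRESH = 2.0  # assumed speed (pixels/frame) below which a hand is "at rest"

  def handedness_labels(left_xy, right_xy):
      """Label each frame as '2H', '1H-L', '1H-R', or 'rest' from hand speeds."""
      lv = np.linalg.norm(np.diff(left_xy, axis=0), axis=1)
      rv = np.linalg.norm(np.diff(right_xy, axis=0), axis=1)
      labels = []
      for ls, rs in zip(lv, rv):
          if ls > MOVE_THRESH and rs > MOVE_THRESH:
              labels.append("2H")
          elif ls > MOVE_THRESH:
              labels.append("1H-L")
          elif rs > MOVE_THRESH:
              labels.append("1H-R")
          else:
              labels.append("rest")
      return labels

  def two_handed_symmetry(left_xy, right_xy):
      """Crude symmetry score for a two-handed stretch: correlation of the two
      hands' horizontal velocities. Near -1 suggests mirror-symmetric motion
      about the body midline; near +1 suggests the hands moving in parallel."""
      lvx = np.diff(left_xy[:, 0])
      rvx = np.diff(right_xy[:, 0])
      if np.std(lvx) < 1e-6 or np.std(rvx) < 1e-6:
          return 0.0
      return float(np.corrcoef(lvx, rvx)[0, 1])

  # Toy usage: synthetic mirror-symmetric two-handed motion.
  t = np.linspace(0, 2 * np.pi, 60)
  left = np.stack([200 - 40 * np.sin(t), 300 + 10 * np.cos(t)], axis=1)
  right = np.stack([200 + 40 * np.sin(t), 300 + 10 * np.cos(t)], axis=1)
  print(handedness_labels(left, right)[:5])
  print(two_handed_symmetry(left, right))  # close to -1 for mirrored motion

In the paper itself, such cues are compared against expert psycholinguistic transcription rather than evaluated in isolation; this sketch only illustrates the kind of low-level signal features the abstract refers to.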