Form: an experiment in the annotation of the kinetics of gesture

  • Authors:
  • Craig Martell, Mitchell P. Marcus

  • Affiliations:
  • University of Pennsylvania, University of Pennsylvania

  • Year:
  • 2005


Abstract

The most obvious way that humans communicate is through speech, and, as such, a great deal of work in Linguistics, Logic, and Computer Science has aimed at understanding, formalizing, and automatically generating and analyzing all aspects of human speech. However, speech is not the only means of communication available to us; we send complex and subtle messages to each other through a variety of other channels as well. Gesture is another important channel for conveying intent and meaning.

Unlike the current state of research on speech, however, gesture research offers only very coarse-level categorizations of gesture types and very little in the way of fine-grained techniques for analysis. The current state of the science divides gestures into essentially four broad categories---beat, iconic, metaphoric, and deictic---and decomposes each gesture into only four types of constituent phases: preparation, stroke, hold, and retraction. We accept this state of the field as our starting point: that is, we accept that at least some gestures are classifiable as described above and that these gestures can be broken down into their constituent phases. However, a coding scheme that labels only the gesture or phase as a whole risks missing important variations in meaning created by subtle changes in the components of the gesture in question. Slight differences in the make-up of a beat gesture, for example, may well express very different things about the mood or intention of the speaker.

Accordingly, we have developed a fine-grained gesture coding scheme---FORM---that allows annotators to exhaustively capture the constituent parts of the gestures of video-recorded speakers. In this thesis we present the FORM annotation scheme, inter-annotator-agreement studies, and the results of some hidden-Markov-model experiments using FORM.
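As a minimal illustration, the taxonomy the abstract describes---four gesture categories, each decomposable into timed phase spans---could be represented roughly as below. This is a hypothetical sketch, not FORM's actual annotation format; the class names, field names, and frame numbers are invented for illustration.

```python
# Hypothetical sketch of a FORM-style annotation record: a gesture has one
# of the four broad categories named in the abstract, and a sequence of
# phase spans drawn from the four constituent phase types. Names and frame
# values are invented; they are not FORM's actual labels or data model.
from dataclasses import dataclass
from enum import Enum

class Category(Enum):
    BEAT = "beat"
    ICONIC = "iconic"
    METAPHORIC = "metaphoric"
    DEICTIC = "deictic"

class Phase(Enum):
    PREPARATION = "preparation"
    STROKE = "stroke"
    HOLD = "hold"
    RETRACTION = "retraction"

@dataclass
class PhaseSpan:
    phase: Phase
    start_frame: int  # inclusive video frame
    end_frame: int    # exclusive video frame

@dataclass
class GestureAnnotation:
    category: Category
    phases: list[PhaseSpan]

    def duration(self) -> int:
        """Total annotated frames across all phase spans."""
        return sum(p.end_frame - p.start_frame for p in self.phases)

# Example: a beat gesture annotated as preparation -> stroke -> retraction.
g = GestureAnnotation(
    category=Category.BEAT,
    phases=[
        PhaseSpan(Phase.PREPARATION, 0, 12),
        PhaseSpan(Phase.STROKE, 12, 20),
        PhaseSpan(Phase.RETRACTION, 20, 35),
    ],
)
print(g.duration())  # 35
```

A fine-grained scheme like FORM would attach far more detail to each span (e.g. per-articulator kinematics) than this coarse category-plus-phases skeleton.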
Additionally, we compare FORM to more widely accepted methods of automatic data gathering. The ultimate goal of this project is to develop something like a "phonetics" of gesture that will be useful both for building better HCI systems and for doing fundamental scientific research into the communicative process.
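To make the hidden-Markov-model idea concrete, the sketch below decodes a sequence of coarse motion observations into the four phase labels named in the abstract using Viterbi decoding. The transition and emission probabilities, and the "moving"/"still" observation alphabet, are invented for illustration; they are not the models or features used in the thesis experiments.

```python
# Toy HMM over the four gesture phases from the abstract (preparation,
# stroke, hold, retraction). All probabilities below are invented for
# illustration only.
PHASES = ["preparation", "stroke", "hold", "retraction"]

START = {"preparation": 1.0, "stroke": 0.0, "hold": 0.0, "retraction": 0.0}
TRANS = {
    "preparation": {"preparation": 0.6, "stroke": 0.4, "hold": 0.0, "retraction": 0.0},
    "stroke":      {"preparation": 0.0, "stroke": 0.5, "hold": 0.3, "retraction": 0.2},
    "hold":        {"preparation": 0.0, "stroke": 0.0, "hold": 0.6, "retraction": 0.4},
    "retraction":  {"preparation": 0.0, "stroke": 0.0, "hold": 0.0, "retraction": 1.0},
}
EMIT = {
    "preparation": {"moving": 0.8, "still": 0.2},
    "stroke":      {"moving": 0.9, "still": 0.1},
    "hold":        {"moving": 0.1, "still": 0.9},
    "retraction":  {"moving": 0.7, "still": 0.3},
}

def viterbi(observations):
    """Return the most likely phase sequence for the observations."""
    # best[t][s] = (prob, backpointer): best path ending in state s at time t.
    best = [{s: (START[s] * EMIT[s][observations[0]], None) for s in PHASES}]
    for obs in observations[1:]:
        row = {}
        for s in PHASES:
            prob, prev = max(
                (best[-1][p][0] * TRANS[p][s] * EMIT[s][obs], p) for p in PHASES
            )
            row[s] = (prob, prev)
        best.append(row)
    # Trace back from the most probable final state.
    state = max(PHASES, key=lambda s: best[-1][s][0])
    path = [state]
    for row in reversed(best[1:]):
        state = row[state][1]
        path.append(state)
    return list(reversed(path))

print(viterbi(["moving", "moving", "still", "still", "moving"]))
# ['preparation', 'stroke', 'hold', 'hold', 'retraction']
```

The thesis experiments presumably operated over FORM's much richer kinematic annotations rather than a two-symbol alphabet, but the decoding machinery is the same in spirit.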