Gesture spotting for low-resolution sports video annotation

  • Authors:
  • Myung-Cheol Roh, Bill Christmas, Joseph Kittler, Seong-Whan Lee

  • Affiliations:
  • Myung-Cheol Roh and Seong-Whan Lee: Department of Computer Science and Engineering, Korea University, Anam-dong, Seongbuk-ku, Seoul 136-713, Korea
  • Bill Christmas and Joseph Kittler: Center for Vision, Speech, and Signal Processing, University of Surrey, Guildford GU2 7XH, UK

  • Venue:
  • Pattern Recognition
  • Year:
  • 2008

Abstract

Human gesture recognition plays an important role in the automated high-level analysis of video material. In sports video in particular, determining the player's gestures is a key task. In many sports views, the camera covers a large part of the arena, so the player's region is of low resolution. Moreover, the camera is not static but moves dynamically around its optical center (a pan/tilt/zoom camera). These factors make determining the player's gestures a challenging task. To overcome these problems, we propose a posture descriptor that is robust to shape corruption of the player's silhouette, and a gesture spotting method that is robust to noisy data sequences and needs only a small amount of training data. The proposed posture descriptor extracts the feature points of a shape based on the curvature scale space (CSS) method. The use of CSS makes the method robust to local noise, and it is also robust to significant shape corruption of the player's silhouette. The proposed spotting method provides a probabilistic similarity measure and is robust to noisy data sequences. It needs only a small number of training data sets, which is a very useful characteristic when sufficient data for model training is difficult to obtain. In this paper, we conducted experiments on spotting serve gestures in broadcast tennis video. On 63 shots of tennis play, some of which include a serve gesture and some of which do not, the method achieved a precision rate of 97.5% and a recall rate of 86.7%.
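To give a concrete sense of the curvature scale space idea the abstract builds on, the sketch below computes the signed curvature of a closed silhouette contour at several Gaussian smoothing scales and counts the curvature zero crossings at each scale; tracking these crossings across scales is what produces a CSS representation. This is a minimal, hypothetical illustration of the generic CSS technique, not the authors' descriptor: the function names, the noisy-circle test contour, and the choice of scales are all assumptions introduced here.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def curvature_zero_crossings(x, y, sigma):
    """Smooth a closed contour at scale sigma and count the
    zero crossings of its signed curvature."""
    # mode="wrap" treats the contour as periodic (a closed curve).
    xs = gaussian_filter1d(x, sigma, mode="wrap")
    ys = gaussian_filter1d(y, sigma, mode="wrap")
    # First and second derivatives along the contour parameter.
    dx, dy = np.gradient(xs), np.gradient(ys)
    ddx, ddy = np.gradient(dx), np.gradient(dy)
    # Signed curvature of a planar parametric curve.
    kappa = (dx * ddy - dy * ddx) / (dx**2 + dy**2) ** 1.5
    # A sign change between neighbouring samples is a zero crossing.
    return int(np.sum(np.diff(np.sign(kappa)) != 0))

def css_profile(x, y, sigmas):
    """Zero-crossing counts across scales: the raw material of a CSS image."""
    return [curvature_zero_crossings(x, y, s) for s in sigmas]

if __name__ == "__main__":
    # A noisy circle as a stand-in for a player's silhouette contour.
    rng = np.random.default_rng(0)
    t = np.linspace(0, 2 * np.pi, 400, endpoint=False)
    r = 1.0 + 0.05 * rng.standard_normal(t.size)
    x, y = r * np.cos(t), r * np.sin(t)
    print(css_profile(x, y, [1, 4, 16]))
```

As smoothing increases, local noise on the contour is suppressed and the zero crossings disappear, which is why features read off the CSS are tolerant of local silhouette corruption.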