Audiovisual integration for tennis broadcast structuring

  • Authors:
  • Ewa Kijak;Guillaume Gravier;Lionel Oisel;Patrick Gros

  • Affiliations:
  • Université de Rennes I, Rennes Cedex, France 35042;CNRS, Rennes Cedex, France 35042;Thomson multimedia R&D, Cesson-Sévigné, France 35510;IRISA, Rennes Cedex, France 35042

  • Venue:
  • Multimedia Tools and Applications
  • Year:
  • 2006

Abstract

This paper focuses on the integration of multimodal features for sports video structure analysis. The method relies on a statistical model that takes into account both the content of each shot and the interleaving of shots. This stochastic modelling is performed within the framework of Hidden Markov Models (HMMs), which can be efficiently applied to merge audio and visual cues. Our approach is validated in the particular domain of tennis videos. The model integrates prior information about tennis content and editing rules. The basic temporal unit is the video shot. Visual features are used to characterize the type of shot view. Audio features describe the audio events within a video shot. Two sets of audio features are used in this study: the first is extracted from a manual segmentation of the soundtrack and is more reliable; the second is provided by an automatic segmentation and classification process. As a result of the overall HMM process, typical tennis scenes are simultaneously segmented and identified. The experiments illustrate the improvement of HMM-based fusion over indexing using only the best single medium, when both media are of similar quality.
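
To make the idea of shot-level HMM fusion concrete, the following is a minimal sketch and not the authors' model. It assumes one hidden scene label per shot, one visual symbol and one audio symbol observed per shot, and conditional independence of the two modalities given the scene, so that their log-likelihoods simply add. The scene labels, the observation symbols, and every probability below are hypothetical values chosen for illustration; none of them come from the paper. Viterbi decoding then segments and labels the shot sequence in a single pass.

```python
import math

# Hypothetical scene labels (hidden states); not taken from the paper.
STATES = ["rally", "replay", "break"]

# Hypothetical probabilities, invented for illustration only.
START = {"rally": 0.5, "replay": 0.1, "break": 0.4}
TRANS = {
    "rally": {"rally": 0.5, "replay": 0.3, "break": 0.2},
    "replay": {"rally": 0.6, "replay": 0.2, "break": 0.2},
    "break": {"rally": 0.4, "replay": 0.1, "break": 0.5},
}
# Visual cue: type of shot view ("global" court view or "other").
VISUAL = {
    "rally": {"global": 0.8, "other": 0.2},
    "replay": {"global": 0.3, "other": 0.7},
    "break": {"global": 0.1, "other": 0.9},
}
# Audio cue: dominant audio event detected within the shot.
AUDIO = {
    "rally": {"ball_hit": 0.7, "applause": 0.2, "speech": 0.1},
    "replay": {"ball_hit": 0.1, "applause": 0.3, "speech": 0.6},
    "break": {"ball_hit": 0.05, "applause": 0.25, "speech": 0.7},
}


def emission_logp(state, shot):
    """Fuse both modalities: log P(visual | state) + log P(audio | state)."""
    visual, audio = shot
    return math.log(VISUAL[state][visual]) + math.log(AUDIO[state][audio])


def viterbi(shots):
    """Return the most likely scene label for each (visual, audio) shot."""
    if not shots:
        return []
    scores = {s: math.log(START[s]) + emission_logp(s, shots[0]) for s in STATES}
    backpointers = []
    for shot in shots[1:]:
        new_scores, pointers = {}, {}
        for s in STATES:
            # Best predecessor of state s under the transition model.
            best_prev = max(STATES, key=lambda p: scores[p] + math.log(TRANS[p][s]))
            new_scores[s] = (
                scores[best_prev]
                + math.log(TRANS[best_prev][s])
                + emission_logp(s, shot)
            )
            pointers[s] = best_prev
        scores = new_scores
        backpointers.append(pointers)
    # Backtrack from the best final state to recover the full path.
    path = [max(STATES, key=lambda s: scores[s])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


if __name__ == "__main__":
    shots = [("global", "ball_hit"), ("other", "speech"), ("global", "ball_hit")]
    print(viterbi(shots))
```

Dropping one of the two terms in `emission_logp` gives a single-modality baseline under the same transition model, which is the kind of comparison the abstract refers to. In a real system the tables would be estimated from annotated broadcasts rather than set by hand.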