Scene-level analysis for tennis sports video using weighted linear combination of visual cues

Authors:
Jungong Han; Weilun Lao; Peter H. N. de With
Affiliations:
University of Technology Eindhoven, Eindhoven, The Netherlands; University of Technology Eindhoven, Eindhoven, The Netherlands; University of Technology Eindhoven, Eindhoven, The Netherlands and LogicaCMG, RTSE, Eindhoven, The Netherlands
Venue:
IMSA'06 Proceedings of the 24th IASTED international conference on Internet and multimedia systems and applications
Year:
2006

Abstract

We present a scene-level analysis system that takes broadcast TV tennis footage as input and produces a behavior analysis of the moving players in the scene. The algorithm consists of two modular blocks. The first block detects and tracks key objects in the image domain, such as the moving players and the playing field. A camera-calibration algorithm then uses the court lines as a reference to compute the camera parameters, which transform image coordinates into physical positions. The second block first models several important events of a tennis game, such as service and net approach, based on four real-world visual features provided by the first block. The proposed model improves on previous work by weighting the importance of each visual cue for each event, rather than simply combining the four cues. Based on this model, we can accurately determine which event the current input frame belongs to. Furthermore, we detect the start and end time of each event using a simple but efficient temporal filter. The proposed system classifies each tennis play into three semantic categories that are familiar to most viewers. This paper presents the details of the system, together with results on a number of tennis video sequences.
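
The weighted combination and the temporal filter described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the abstract does not name the four visual features, the weight values, or the filter type, so the cue ordering, the numbers in `EVENT_WEIGHTS`, the `baseline_rally` label, and the sliding-window majority vote are all hypothetical stand-ins and not the authors' method.

```python
from collections import Counter
from itertools import groupby

# Hypothetical per-event weights for four visual cues, each cue assumed to be
# normalized to [0, 1]. The paper's actual cues and weights are not given in
# the abstract; "baseline_rally" is an assumed third event label.
EVENT_WEIGHTS = {
    "service":        (0.6, 0.2, 0.1, 0.1),
    "net_approach":   (0.1, 0.6, 0.2, 0.1),
    "baseline_rally": (0.1, 0.1, 0.4, 0.4),
}

def classify_frame(cues):
    """Return the event whose weighted linear combination of cues is largest."""
    scores = {
        event: sum(w * c for w, c in zip(weights, cues))
        for event, weights in EVENT_WEIGHTS.items()
    }
    return max(scores, key=scores.get)

def majority_filter(labels, radius=2):
    """Smooth per-frame labels with a sliding-window majority vote.

    This is an assumed stand-in for the paper's temporal filter.
    """
    smoothed = []
    for i in range(len(labels)):
        window = labels[max(0, i - radius): i + radius + 1]
        smoothed.append(Counter(window).most_common(1)[0][0])
    return smoothed

def segments(labels):
    """Collapse a label sequence into (event, start_frame, end_frame) runs."""
    out, start = [], 0
    for label, run in groupby(labels):
        length = len(list(run))
        out.append((label, start, start + length - 1))
        start += length
    return out
```

Applied to a video, one would call `classify_frame` on the cue vector of every frame, pass the resulting label list through `majority_filter` to suppress isolated misclassifications, and read the start and end frame of each event from `segments`.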