Audio keyword generation for sports video analysis

Authors:
Min Xu; Ling-Yu Duan; Liang-Tien Chia; Chang-sheng Xu
Affiliations:
Nanyang Technological University, Singapore; Institute for Infocomm Research; Nanyang Technological University, Singapore; Institute for Infocomm Research
Venue:
Proceedings of the 12th annual ACM international conference on Multimedia
Year:
2004

References (4)

Fundamentals of speech recognition.
Automatically extracting highlights for TV Baseball programs. MULTIMEDIA '00: Proceedings of the eighth ACM international conference on Multimedia.
Automatic detection of 'Goal' segments in basketball videos. MULTIMEDIA '01: Proceedings of the ninth ACM international conference on Multimedia.
Creating audio keywords for event detection in soccer video. ICME '03: Proceedings of the 2003 International Conference on Multimedia and Expo, Volume 1.

Cited by (3)

Efficient sampling of training set in large and noisy multimedia data. ACM Transactions on Multimedia Computing, Communications, and Applications (TOMCCAP).
An Effective Audio-Visual Information Based Framework for Extracting Highlights in Basketball Games. PCM '09: Proceedings of the 10th Pacific Rim Conference on Multimedia: Advances in Multimedia Information Processing.
Ball tracking and 3D trajectory approximation with applications to tactics analysis from single-camera volleyball sequences. Multimedia Tools and Applications.

Abstract

Semantic sports video analysis has attracted considerable research interest, and audio cues have been shown to play an important role in semantic inference. To facilitate event detection using audio information, we have introduced the concept of audio keywords (e.g., excited/plain commentator speech and excited/plain audience sound) to describe the game-specific sounds associated with an event. In our previous work, we designed a hierarchical Support Vector Machine (SVM) classifier for audio keyword identification. However, that approach has two inherent weaknesses: 1) a frame-based SVM classifier does not incorporate any contextual information; 2) a robust recognizer requires large amounts of training data for each type of sports video. In this demo, we present a flexible Hidden Markov Model (HMM)-based audio keyword generation system, motivated by the success of HMMs in speech recognition. Unlike frame-based SVM classification followed by majority voting, our HMM-based system treats an audio keyword as continuous time-series data and uses transitions between hidden states to capture context. Moreover, the system introduces an adaptation mechanism that tunes the initial HMMs (obtained from the available training data) with a small amount of data from a new sports video to improve performance. Promising results have been demonstrated on tennis, soccer, and basketball videos with a total length of 2 hours.
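The abstract contrasts per-frame decisions combined by majority voting with HMMs that score a whole audio segment, and then adapts the models to a new game. The following is a minimal sketch of those ideas under stated assumptions, not the authors' implementation. Each keyword (e.g., excited vs. plain commentator speech) gets its own small HMM with one-dimensional Gaussian emissions over a hypothetical per-frame feature such as short-time energy (a real system would typically use multi-dimensional features and Gaussian mixtures). A segment is labeled with the keyword whose model gives the highest forward-algorithm likelihood. For adaptation, the sketch uses MAP re-estimation of the state means, a standard speech-recognition technique chosen here only for illustration, since the abstract does not say which adaptation method the system uses. The names `KeywordHMM`, `classify_segment`, `majority_vote`, and `map_adapt_means`, the two-state left-to-right topology, the prior weight `tau`, and all numbers are hypothetical.

```python
import math
from collections import Counter


def safe_log(p):
    """log(p), mapping zero probabilities (forbidden transitions) to -inf."""
    return math.log(p) if p > 0 else -math.inf


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_gauss(x, mean, var):
    """Log density of a 1-D Gaussian N(mean, var) at x."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


class KeywordHMM:
    """Hypothetical HMM for one audio keyword, one 1-D Gaussian per hidden state."""

    def __init__(self, start, trans, means, variances):
        self.start, self.trans = start, trans
        self.means, self.vars = list(means), list(variances)

    def _emit(self, x, s):
        return log_gauss(x, self.means[s], self.vars[s])

    def _forward(self, obs):
        # alpha_t(s) = log P(o_1..o_t, q_t = s); context enters through self.trans.
        n = len(self.start)
        alpha = [safe_log(self.start[s]) + self._emit(obs[0], s) for s in range(n)]
        alphas = [alpha]
        for x in obs[1:]:
            alpha = [logsumexp([alpha[r] + safe_log(self.trans[r][s]) for r in range(n)])
                     + self._emit(x, s) for s in range(n)]
            alphas.append(alpha)
        return alphas

    def _backward(self, obs):
        # beta_t(r) = log P(o_{t+1}..o_T | q_t = r), filled from the last frame backwards.
        n = len(self.start)
        beta = [0.0] * n
        betas = [beta]
        for x in reversed(obs[1:]):
            beta = [logsumexp([safe_log(self.trans[r][s]) + self._emit(x, s) + beta[s]
                               for s in range(n)]) for r in range(n)]
            betas.append(beta)
        return betas[::-1]

    def log_likelihood(self, obs):
        """log P(obs | model) for a whole audio segment (sum over all state paths)."""
        return logsumexp(self._forward(obs)[-1])

    def map_adapt_means(self, sequences, tau=10.0):
        """MAP update of state means; larger tau keeps means closer to the initial model."""
        n = len(self.start)
        occ, weighted = [0.0] * n, [0.0] * n
        for obs in sequences:
            alphas, betas = self._forward(obs), self._backward(obs)
            log_l = logsumexp(alphas[-1])
            for t, x in enumerate(obs):
                for s in range(n):
                    gamma = math.exp(alphas[t][s] + betas[t][s] - log_l)  # P(q_t = s | obs)
                    occ[s] += gamma
                    weighted[s] += gamma * x
        self.means = [(tau * self.means[s] + weighted[s]) / (tau + occ[s]) for s in range(n)]


def classify_segment(models, obs):
    """Label a whole segment with the keyword whose HMM gives the highest likelihood."""
    return max(models, key=lambda name: models[name].log_likelihood(obs))


def majority_vote(frame_labels):
    """Frame-based baseline: the most frequent per-frame label wins; frame order is ignored."""
    return Counter(frame_labels).most_common(1)[0][0]


if __name__ == "__main__":
    left_to_right = [[0.8, 0.2], [0.0, 1.0]]
    models = {
        "plain": KeywordHMM([1.0, 0.0], left_to_right, [0.2, 0.3], [0.01, 0.01]),
        "excited": KeywordHMM([1.0, 0.0], left_to_right, [0.7, 0.9], [0.01, 0.01]),
    }
    print(classify_segment(models, [0.72, 0.75, 0.88, 0.91]))  # excited
    models["excited"].map_adapt_means([[0.5, 0.52, 0.6, 0.62]])  # quieter new game
    print(models["excited"].means)
```

Because the forward algorithm sums over all state paths, the segment score depends on the order of frames through the transition matrix, whereas majority voting over independent frame labels discards that order entirely. In the adaptation step, each new mean is a weighted average of the initial mean and the data assigned to that state, so a few samples from a new game shift the model without discarding what was learned from the original training data.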