Semantic sports video analysis has attracted considerable research interest, and audio cues have been shown to play an important role in semantic inference. To facilitate event detection using audio information, we have introduced the concept of the audio keyword (e.g. excited/plain commentator speech, excited/plain audience sound, etc.) to describe the game-specific sound associated with an event. In our previous work, we designed a hierarchical Support Vector Machine (SVM) classifier for audio keyword identification. However, that approach has two inherent weaknesses: 1) a frame-based SVM classifier does not incorporate any contextual information; 2) a robust recognizer relies on large amounts of training data when it must handle videos of different sports. In this demo, we present a flexible Hidden Markov Model (HMM)-based audio keyword generation system, motivated by the success of HMMs in speech recognition. Unlike frame-based SVM classification followed by majority voting, our HMM-based system treats an audio keyword as continuous time-series data and uses hidden-state transitions to capture context. Moreover, our system introduces an adaptation mechanism that tunes the initial HMMs (obtained from the available training data) using a small amount of data from a new sports video, in order to improve performance on that video. Promising results have been demonstrated on tennis, soccer, and basketball videos with a total length of 2 hours.
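The contrast between per-frame majority voting and sequence-level HMM scoring can be illustrated with a toy sketch. This is not the system described above: the two-state discrete HMMs, their probabilities, the binary "low/high energy" observation symbols, and the names `hmm_log_likelihood`, `classify_by_hmm`, and `classify_by_majority_vote` are all assumptions made for illustration, and the per-frame labels stand in for the output of a frame-based classifier.

```python
import math
from collections import Counter


def safe_log(p):
    """Natural log that maps zero probability to -inf instead of raising."""
    return math.log(p) if p > 0 else float("-inf")


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) over a list of log values."""
    m = max(values)
    if m == float("-inf"):
        return m
    return m + math.log(sum(math.exp(v - m) for v in values))


def hmm_log_likelihood(obs, start, trans, emit):
    """Forward algorithm (log domain) for a discrete-observation HMM."""
    n = len(start)
    alpha = [safe_log(start[s]) + safe_log(emit[s][obs[0]]) for s in range(n)]
    for o in obs[1:]:
        alpha = [
            log_sum_exp([alpha[p] + safe_log(trans[p][s]) for p in range(n)])
            + safe_log(emit[s][o])
            for s in range(n)
        ]
    return log_sum_exp(alpha)


def classify_by_hmm(obs, models):
    """Pick the keyword whose HMM assigns the whole sequence the highest likelihood."""
    return max(models, key=lambda name: hmm_log_likelihood(obs, *models[name]))


def classify_by_majority_vote(frame_labels):
    """Pick the most frequent per-frame label, ignoring frame order."""
    return Counter(frame_labels).most_common(1)[0][0]


# Hypothetical models: (start probabilities, transition matrix, emission matrix).
# Observation symbols: 0 = low-energy frame, 1 = high-energy frame.
models = {
    # Left-to-right: a quiet lead-in state followed by a sustained loud state.
    "excited": ([1.0, 0.0], [[0.7, 0.3], [0.0, 1.0]], [[0.9, 0.1], [0.1, 0.9]]),
    # Both states emit mostly low-energy frames, with no temporal structure.
    "plain": ([0.5, 0.5], [[0.5, 0.5], [0.5, 0.5]], [[0.9, 0.1], [0.9, 0.1]]),
}

# A segment that starts quietly and ends with a burst of high-energy frames.
obs = [0, 0, 0, 1, 1]

# Stand-in for a frame-based classifier: each frame is labelled in isolation.
frame_labels = ["plain" if o == 0 else "excited" for o in obs]

vote_result = classify_by_majority_vote(frame_labels)
hmm_result = classify_by_hmm(obs, models)
print("majority vote:", vote_result)
print("HMM scoring:  ", hmm_result)
```

In this toy setting the vote depends only on how many frames carry each label, whereas the HMM score also depends on the order in which the frames occur, which is the kind of contextual information the frame-based approach lacks.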