HMM-Based audio keyword generation

Authors:
Min Xu;Ling-Yu Duan;Jianfei Cai;Liang-Tien Chia;Changsheng Xu;Qi Tian
Affiliations:
School of Computer Engineering, Nanyang Technological University, Singapore;Institute for Infocomm Research, 21 Heng Mui Keng Terrace, Singapore;School of Computer Engineering, Nanyang Technological University, Singapore;School of Computer Engineering, Nanyang Technological University, Singapore;Institute for Infocomm Research, 21 Heng Mui Keng Terrace, Singapore;Institute for Infocomm Research, 21 Heng Mui Keng Terrace, Singapore
Venue:
PCM'04 Proceedings of the 5th Pacific Rim conference on Advances in Multimedia Information Processing - Volume Part III
Year:
2004

Abstract

With the exponential growth in the production of multimedia data, there is an increasing need for video semantic analysis. Audio, as a significant part of video, provides important cues to human perception when people browse and interpret video content. To detect semantic content from useful audio information, we introduce audio keywords, which are sets of specific sounds associated with semantic events. In our previous work, we designed a hierarchical Support Vector Machine (SVM) classifier for audio keyword identification. However, a weakness of that work is that audio signals are artificially segmented into 20 ms frames for frame-based SVM identification, without any contextual information. In this paper, we propose a classification method based on the Hidden Markov Model (HMM) for audio keyword identification as an improvement over the hierarchical SVM classifier. The choice of HMM is motivated by its success in speech recognition. Unlike frame-based SVM classification followed by majority voting, our proposed HMM-based classifiers treat a specific sound as continuous time series data and use hidden state transitions to capture context information. In particular, we study how to find an effective HMM, i.e., how to determine the topology, observation vectors, and statistical parameters of the HMM. We also compare HMM structures with different numbers of hidden states, and handle time series data of variable length. The experimental data consist of 40 minutes of basketball audio taken from real-time sports games. Experimental results show that, for audio keyword generation, the proposed HMM-based method outperforms the previous hierarchical SVM.
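To illustrate the general idea of HMM-based sequence classification described above, the following is a minimal sketch and not the authors' implementation. It assumes one discrete-observation, left-to-right HMM per audio keyword, scores a whole observation sequence with the forward algorithm, and picks the keyword whose model gives the highest likelihood. The model names (`rising`, `falling`), the three-symbol alphabet, and all probabilities are hypothetical values chosen for illustration; the abstract does not specify the feature vectors or the trained parameters.

```python
import math


def safe_log(p):
    """Logarithm that maps a zero probability to -inf instead of raising."""
    return math.log(p) if p > 0 else -math.inf


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_likelihood(obs, start, trans, emit):
    """Forward algorithm in the log domain: returns log P(obs | model).

    obs   : list of observation symbol indices (any length >= 1)
    start : start[i] is the initial probability of state i
    trans : trans[i][j] is the probability of moving from state i to j
    emit  : emit[i][k] is the probability that state i emits symbol k
    """
    n = len(start)
    alpha = [safe_log(start[i]) + safe_log(emit[i][obs[0]]) for i in range(n)]
    for symbol in obs[1:]:
        alpha = [
            log_sum_exp([alpha[i] + safe_log(trans[i][j]) for i in range(n)])
            + safe_log(emit[j][symbol])
            for j in range(n)
        ]
    return log_sum_exp(alpha)


def classify(obs, models):
    """Return the name of the model that assigns obs the highest likelihood."""
    return max(models, key=lambda name: log_likelihood(obs, *models[name]))


# Hypothetical two-state left-to-right models over symbols {0, 1, 2}.
# Each entry is (start, trans, emit); the values are illustrative only.
LEFT_TO_RIGHT = [[0.7, 0.3], [0.0, 1.0]]
MODELS = {
    "rising": ([1.0, 0.0], LEFT_TO_RIGHT,
               [[0.8, 0.15, 0.05], [0.05, 0.15, 0.8]]),
    "falling": ([1.0, 0.0], LEFT_TO_RIGHT,
                [[0.05, 0.15, 0.8], [0.8, 0.15, 0.05]]),
}

if __name__ == "__main__":
    # Sequences of different lengths are scored without resampling.
    print(classify([0, 0, 0, 2, 2], MODELS))
    print(classify([2, 2, 0], MODELS))
```

Because the forward recursion consumes one observation at a time, sequences of different lengths are scored by the same model, and the state transitions let earlier observations influence how later ones are scored. A per-frame classifier followed by majority voting does not use that ordering information.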