Unsupervised discovery of multilevel statistical video structures using hierarchical hidden Markov models

Authors:
Lexing Xie;Shih-Fu Chang;A. Divakaran;Huifang Sun
Affiliations:
Dept. of Electr. Eng., Columbia Univ., New York, NY, USA;Dept. of Electr. Eng., Columbia Univ., New York, NY, USA;Ricoh California Res. Center, Menlo Park, CA, USA;IBM Thomas J. Watson Res. Center, Yorktown Heights, NY, USA
Venue:
ICME '03: Proceedings of the 2003 International Conference on Multimedia and Expo, Volume 3
Year:
2003

Cited by 14:

The fusion of audio-visual features and external knowledge for event detection in team sports video
Proceedings of the 6th ACM SIGMM international workshop on Multimedia information retrieval

Multimodal group action clustering in meetings
Proceedings of the ACM 2nd international workshop on Video surveillance & sensor networks

Fusion of AV features and external information sources for event detection in team sports video
ACM Transactions on Multimedia Computing, Communications, and Applications (TOMCCAP)

Multimodal estimation of user interruptibility for smart mobile telephones
Proceedings of the 8th international conference on Multimodal interfaces

Media adaptation framework in biofeedback system for stroke patient rehabilitation
Proceedings of the 15th international conference on Multimedia

Motion intention recognition in robot assisted applications
Robotics and Autonomous Systems

A dynamic decision network framework for online media adaptation in stroke rehabilitation
ACM Transactions on Multimedia Computing, Communications, and Applications (TOMCCAP)

Hierarchical hidden Markov models with general state hierarchy
AAAI'04 Proceedings of the 19th national conference on Artificial intelligence

Event detection in sports video based on generative-discriminative models
EiMM '09 Proceedings of the 1st ACM international workshop on Events in multimedia

Sports video segmentation using a hierarchical hidden CRF
ICONIP'08 Proceedings of the 15th international conference on Advances in neuro-information processing - Volume Part I

Unsupervised event segmentation of news content with multimodal cues
Proceedings of the 3rd international workshop on Automated information extraction in media production

Multistream dynamic Bayesian network for meeting segmentation
MLMI'04 Proceedings of the First international conference on Machine Learning for Multimodal Interaction

Narrative structure analysis of lecture video with hierarchical hidden Markov model for e-learning
Edutainment'06 Proceedings of the First international conference on Technologies for E-Learning and Digital Entertainment

Features extraction for soccer video semantic analysis: current achievements and remaining issues
Artificial Intelligence Review

Abstract

Structure elements in a time sequence (e.g., video) are repetitive segments with consistent deterministic or stochastic characteristics. While most existing work on detecting structures follows a supervised paradigm, we propose a fully unsupervised statistical solution in this paper. We present a unified approach to structure discovery from long video sequences, formulated as simultaneously finding the statistical descriptions of structures and locating the segments that match those descriptions. We model the multilevel statistical structure as hierarchical hidden Markov models, and present efficient algorithms for learning both the parameters and the model structure. When tested on a specific domain, soccer video, the unsupervised learning scheme achieves promising results: it automatically discovers the statistical descriptions of high-level structures, and at the same time detects the discovered structures in unlabelled videos with slightly better accuracy than a supervised approach designed with domain knowledge and trained with comparable hidden Markov models.
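
As a rough illustration of what "multilevel statistical structure" can look like, the sketch below builds a toy two-level hierarchical hidden Markov model, flattens it into an ordinary HMM, and uses Viterbi decoding to label each observation with its top-level state. It is not the algorithm from this paper: it covers decoding only, with hand-picked parameters, and does not learn parameters or model structure. All names (`flatten_hhmm`, `viterbi`, `child_exit`, and so on), the 1-D unit-variance Gaussian emissions, and the numbers are assumptions made for the example.

```python
import numpy as np

def flatten_hhmm(top_trans, child_trans, child_exit, child_entry):
    """Flatten a two-level HHMM into one flat HMM transition matrix.

    top_trans[p, q]      : parent p -> parent q, taken when a child exits
    child_trans[p][i, j] : child i -> child j inside parent p (given no exit)
    child_exit[p][i]     : probability that child i of parent p exits
    child_entry[p][j]    : probability of starting parent p in child j
    """
    n_parents = len(child_trans)
    sizes = [len(e) for e in child_exit]
    offsets = np.concatenate([[0], np.cumsum(sizes)])
    n = int(offsets[-1])
    flat = np.zeros((n, n))
    for p in range(n_parents):
        for i in range(sizes[p]):
            row = offsets[p] + i
            stay = 1.0 - child_exit[p][i]
            # No exit: move between children of the same parent.
            flat[row, offsets[p]:offsets[p + 1]] += stay * child_trans[p][i]
            # Exit: choose the next parent q, then enter one of its children.
            for q in range(n_parents):
                flat[row, offsets[q]:offsets[q + 1]] += (
                    child_exit[p][i] * top_trans[p, q] * child_entry[q]
                )
    parent_of = np.repeat(np.arange(n_parents), sizes)
    return flat, parent_of

def viterbi(log_init, log_trans, log_emit):
    """Most likely flat-state path; log_emit has shape (T, n_states)."""
    T, n = log_emit.shape
    delta = log_init + log_emit[0]
    back = np.zeros((T, n), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_trans  # scores[i, j]: from i to j
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_emit[t]
    path = np.empty(T, dtype=int)
    path[-1] = delta.argmax()
    for t in range(T - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return path

# Toy model (hypothetical values): two parents with two children each.
top_trans = np.array([[0.1, 0.9], [0.9, 0.1]])
child_trans = [np.full((2, 2), 0.5), np.full((2, 2), 0.5)]
child_exit = [np.array([0.2, 0.2]), np.array([0.2, 0.2])]
child_entry = [np.array([0.5, 0.5]), np.array([0.5, 0.5])]
means = np.array([0.0, 1.0, 5.0, 6.0])  # one emission mean per flat state

flat_trans, parent_of = flatten_hhmm(top_trans, child_trans, child_exit, child_entry)

obs = np.array([0.1, 0.9, 0.2, 5.1, 6.2, 5.0, 0.0, 1.1])
# Unit-variance Gaussian log-likelihood, up to a constant shared by all states.
log_emit = -0.5 * (obs[:, None] - means[None, :]) ** 2
log_init = np.log(np.full(4, 0.25))

flat_path = viterbi(log_init, np.log(flat_trans), log_emit)
parent_path = parent_of[flat_path]  # top-level label for each observation
print(parent_path.tolist())
```

Reading the decoded path at the parent level gives a segmentation of the sequence into high-level structures, while the child states capture the variation within each segment.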