A time series clustering based framework for multimedia mining and summarization using audio features

Authors:
Regunathan Radhakrishnan; Ajay Divakaran; Ziyou Xiong
Affiliations:
Mitsubishi Electric Research Laboratory, Cambridge, MA; Mitsubishi Electric Research Laboratory, Cambridge, MA; Mitsubishi Electric Research Laboratory, Cambridge, MA
Venue:
Proceedings of the 6th ACM SIGMM international workshop on Multimedia information retrieval
Year:
2004

Abstract

Past work on multimedia analysis has shown the utility of detecting specific temporal patterns for different content genres. In this paper, we propose a unified, content-adaptive, unsupervised mining framework to bring out such temporal patterns from different multimedia genres. We formulate the problem of pattern discovery from video as a time series clustering problem. We treat the sequence of low- and mid-level audio-visual features extracted from the video as a time series and perform a temporal segmentation. The segmentation is based on eigenvector analysis of the affinity matrix constructed from statistical models estimated from the subsequences of the time series. We are thus able to detect transitions and outliers in a sequence of observations from a stationary background process. We define a confidence measure on each detected outlier as the probability that it is an outlier. We then establish a relationship between the mining parameters and the confidence measure. Furthermore, the confidence measure can be used to rank the detected outliers by their departure from the background process. Our experimental results with sequences of low- and mid-level audio features extracted from sports video show that "highlight" events can be extracted effectively as outliers from a background process using the proposed framework. We then show the effectiveness of the proposed framework in bringing out patterns from surveillance videos without any a priori knowledge. Finally, we show that such temporal segmentation into background and outliers, along with the ranking based on the departure from the background, can be used to generate content summaries of any desired length.
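
The segmentation idea in the abstract can be illustrated with a minimal sketch. The abstract does not specify the statistical model, the affinity function, or the exact eigenvector procedure, so everything below is an assumption made for illustration: each subsequence is modeled by a single 1-D Gaussian, affinities come from a symmetric KL divergence, the split uses the second generalized eigenvector in the style of normalized cuts, and the larger cluster is taken to be the background. The ranking score (mean divergence from the background windows) is a simple stand-in, not the paper's confidence measure. All function names are hypothetical.

```python
import numpy as np


def window_models(series, win, hop):
    """Fit a 1-D Gaussian (mean, variance) to each sliding window (assumed model)."""
    starts = np.arange(0, len(series) - win + 1, hop)
    windows = np.stack([series[s:s + win] for s in starts])
    return starts, windows.mean(axis=1), windows.var(axis=1) + 1e-6


def symmetric_kl(means, varis):
    """Pairwise KL(p||q) + KL(q||p) between 1-D Gaussians."""
    m1, m2 = means[:, None], means[None, :]
    v1, v2 = varis[:, None], varis[None, :]
    d2 = (m1 - m2) ** 2
    return 0.5 * ((v1 + d2) / v2 + (v2 + d2) / v1) - 1.0


def ncut_split(affinity):
    """Two-way split from the second generalized eigenvector of (D - W) y = lambda D y."""
    degree = affinity.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    # Symmetric normalized Laplacian: I - D^{-1/2} W D^{-1/2}
    lap = np.eye(len(degree)) - d_inv_sqrt[:, None] * affinity * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(lap)          # eigenvalues in ascending order
    y = d_inv_sqrt * vecs[:, 1]            # map back to the generalized eigenvector
    best_cost, best_mask = np.inf, None
    for t in np.sort(y)[:-1]:              # try every threshold, keep the lowest Ncut
        mask = y > t
        cut = affinity[mask][:, ~mask].sum()
        cost = cut / degree[mask].sum() + cut / degree[~mask].sum()
        if cost < best_cost:
            best_cost, best_mask = cost, mask
    return best_mask


def find_outliers(series, win=50, hop=50):
    """Return outlier window start indices and scores, ranked by departure from background."""
    starts, means, varis = window_models(series, win, hop)
    dist = symmetric_kl(means, varis)
    n = len(starts)
    sigma = dist.sum() / (n * (n - 1))     # mean off-diagonal distance (heuristic scale)
    affinity = np.exp(-dist / (2.0 * sigma))
    mask = ncut_split(affinity)
    if mask.sum() > n / 2:                 # assumption: the larger cluster is background
        mask = ~mask
    # Stand-in ranking score: mean divergence of each outlier window from the background
    scores = dist[mask][:, ~mask].mean(axis=1)
    order = np.argsort(-scores)
    return starts[mask][order], scores[order]
```

Under these assumptions, a summary of a desired length could be formed by keeping the first `k` ranked windows returned by `find_outliers`, which covers `k * win` samples; a multi-dimensional feature sequence would need a multivariate model in place of the 1-D Gaussian.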