A content-adaptive analysis and representation framework for audio event discovery from "unscripted" multimedia

Authors:
Regunathan Radhakrishnan;Ajay Divakaran;Ziyou Xiong;Isao Otsuka
Affiliations:
Mitsubishi Electric Research Laboratory, Cambridge, MA;Mitsubishi Electric Research Laboratory, Cambridge, MA;Mitsubishi Electric Research Laboratory, Cambridge, MA;Advanced Technology R&D Center, Mitsubishi Electric Corporation, Hyogo, Kyoto, Japan
Venue:
EURASIP Journal on Applied Signal Processing
Year:
2006

Abstract

We propose a content-adaptive analysis and representation framework to discover events using audio features from "unscripted" multimedia such as sports and surveillance for summarization. The proposed analysis framework performs an inlier/outlier-based temporal segmentation of the content. It is motivated by the observation that "interesting" events in unscripted multimedia occur sparsely in a background of usual or "uninteresting" events. We treat the sequence of low/mid-level features extracted from the audio as a time series and identify subsequences that are outliers. The outlier detection is based on eigenvector analysis of the affinity matrix constructed from statistical models estimated from the subsequences of the time series. We define the confidence measure on each of the detected outliers as the probability that it is an outlier. Then, we establish a relationship between the parameters of the proposed framework and the confidence measure. Furthermore, we use the confidence measure to rank the detected outliers in terms of their departures from the background process. Our experimental results with sequences of low- and mid-level audio features extracted from sports video show that "highlight" events can be extracted effectively as outliers from a background process using the proposed framework. We proceed to show the effectiveness of the proposed framework in bringing out suspicious events from surveillance videos without any a priori knowledge. We show that such temporal segmentation into background and outliers, along with the ranking based on the departure from the background, can be used to generate content summaries of any desired length. Finally, we also show that the proposed framework can be used to systematically select "key audio classes" that are indicative of events of interest in the chosen domain.
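The inlier/outlier segmentation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several choices are assumptions made only for illustration: each window of the time series is modeled by a single 1-D Gaussian, windows are compared with a symmetric KL divergence, the affinity kernel width is set from the mean distance, the second eigenvector of the normalized affinity matrix is thresholded at zero, and the ranking score (mean distance to the inlier windows) stands in for the paper's confidence measure. The names `window_models`, `symmetric_kl`, and `detect_outliers` are hypothetical.

```python
import numpy as np

def window_models(x, win, hop):
    """Fit a 1-D Gaussian (mean, variance) to each sliding window of x."""
    starts = range(0, len(x) - win + 1, hop)
    return np.array([[x[s:s + win].mean(), x[s:s + win].var() + 1e-6]
                     for s in starts])

def symmetric_kl(models):
    """Pairwise symmetric KL divergence between 1-D Gaussian models."""
    mu, var = models[:, 0], models[:, 1]
    dmu2 = (mu[:, None] - mu[None, :]) ** 2
    # KL(p||q) + KL(q||p); the log terms cancel for Gaussians.
    return 0.5 * ((var[:, None] + dmu2) / var[None, :]
                  + (var[None, :] + dmu2) / var[:, None] - 2.0)

def detect_outliers(x, win=50, hop=25):
    """Return (is_outlier per window, outlier window indices ranked by score)."""
    d = symmetric_kl(window_models(x, win, hop))
    scale = d.mean()                      # assumed kernel-width heuristic
    a = np.exp(-d / (2.0 * scale))        # affinity matrix
    deg = a.sum(axis=1)
    # Symmetrically normalized affinity D^{-1/2} A D^{-1/2}.
    n = a / np.sqrt(np.outer(deg, deg))
    _, vecs = np.linalg.eigh(n)           # eigenvalues in ascending order
    v2 = vecs[:, -2] / np.sqrt(deg)       # second generalized eigenvector
    labels = v2 > 0
    # Treat the smaller cluster as the outliers (sparse "interesting" events).
    if labels.sum() > len(labels) / 2:
        labels = ~labels
    # Illustrative departure score: mean distance to the inlier windows.
    score = d[:, ~labels].mean(axis=1)
    ranked = [i for i in np.argsort(-score) if labels[i]]
    return labels, ranked

# Synthetic example: stationary background with one short burst.
rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 1000)
x[500:600] += 8.0
labels, ranked = detect_outliers(x)
print("outlier windows:", np.flatnonzero(labels))
print("top-ranked window:", ranked[0])
```

Under these assumptions, a summary of a desired length can be approximated by keeping only the first k entries of `ranked`, which mirrors the abstract's idea of ranking outliers by their departure from the background.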