Acoustic super models for large scale video event detection

  • Authors:
  • Robert Mertens, Howard Lei, Luke Gottlieb, Gerald Friedland, Ajay Divakaran

  • Affiliations:
  • Robert Mertens, Howard Lei, Luke Gottlieb, Gerald Friedland: International Computer Science Institute, Berkeley, CA, USA; Ajay Divakaran: SRI International Sarnoff, Princeton, NJ, USA

  • Venue:
  • J-MRE '11 Proceedings of the 2011 joint ACM workshop on Modeling and representing events
  • Year:
  • 2011

Abstract

Given the exponential growth of videos published on the Internet, mechanisms for clustering, searching, and browsing large numbers of videos have become a major research area. More importantly, there is a demand for event detectors that go beyond simply finding objects and instead detect more abstract concepts, such as "feeding an animal" or a "wedding ceremony". This article presents an approach to event classification that enables searching for arbitrary events, including more abstract concepts, in found video collections based on analysis of the audio track. The approach does not rely on speech processing and is language-independent; instead, it generates models for a set of example query videos using a mixture of two types of audio features: Linear-Frequency Cepstral Coefficients and Modulation Spectrogram Features. The approach can be used to complement video analysis and requires no domain-specific tagging. Applied to the TRECVid MED 2011 development set, which consists of more than 4000 random "wild" videos from the Internet, it achieved a detection accuracy of 64%, including those videos that do not contain an audio track.
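To make the feature-modeling recipe in the abstract concrete, the following is a minimal sketch of the general pipeline it describes: frame-level cepstral features on a linear frequency axis, modeled with one Gaussian mixture per event class. This is not the authors' implementation; the Modulation Spectrogram Features, feature mixing, and the paper's exact model structure are omitted, and the use of librosa/scikit-learn, the sample rate, and all function names and parameters are assumptions made for illustration.

```python
import numpy as np
import librosa
from scipy.fftpack import dct
from sklearn.mixture import GaussianMixture

def lfcc(path, n_fft=512, hop=160, n_coeffs=20):
    """Rough linear-frequency cepstral coefficients (assumed recipe):
    log power spectrum on a linear frequency axis, followed by a DCT."""
    y, sr = librosa.load(path, sr=16000)          # sample rate is an assumption
    spec = np.abs(librosa.stft(y, n_fft=n_fft, hop_length=hop)) ** 2
    log_spec = np.log(spec + 1e-10)               # avoid log(0)
    # DCT along the frequency axis; keep the first n_coeffs per frame
    return dct(log_spec, axis=0, norm="ortho")[:n_coeffs].T  # (frames, coeffs)

def train_event_model(example_paths, n_components=64):
    """Fit one GMM to the pooled frames of a set of example query videos."""
    frames = np.vstack([lfcc(p) for p in example_paths])
    gmm = GaussianMixture(n_components=n_components, covariance_type="diag")
    gmm.fit(frames)
    return gmm

def score_video(gmm, path):
    """Mean per-frame log-likelihood of a test video under an event model."""
    return gmm.score(lfcc(path))
```

A detector built this way would score each test video against every event model and pick the highest-scoring class; how the paper combines the two feature types and handles videos without audio is not shown here.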