Video concept detection aims to find videos that show a certain event described by a high-level concept, e.g. "wedding ceremony" or "changing a tire". This paper presents a theoretical framework and experimental evidence suggesting that video concept detection on consumer-produced videos can be performed using what we call "percepts": a set of observable units that follows a Zipfian distribution. We present an unsupervised approach to extracting percepts from audio tracks, which we then use in experiments on the TRECVID MED 2011 dataset to provide evidence for the validity of the proposed framework. The approach selects the most relevant percepts for each concept automatically, thereby filtering and reducing the amount of training data needed. We show that our framework provides a highly usable foundation for video retrieval on consumer-produced content and is applicable to acoustic, visual, and multimodal content analysis.
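As a rough illustration of the two ideas in the abstract, the sketch below assumes percepts have already been quantized into discrete labels (the abstract does not specify how; the label names, the log-odds scoring, and the synthetic Zipf-like stream here are all hypothetical, not the paper's actual method). It checks the rank-frequency shape of a percept stream and ranks percepts by how strongly they distinguish positive from negative videos for a concept.

```python
from collections import Counter
import math
import random

def rank_frequency(percepts):
    """Return (rank, count) pairs sorted by descending frequency."""
    counts = sorted(Counter(percepts).values(), reverse=True)
    return list(enumerate(counts, start=1))

def select_percepts(pos_percepts, neg_percepts, k=5):
    """Score each percept by the smoothed log-odds of appearing in
    positive vs. negative videos and keep the top-k (a stand-in for
    the paper's automatic relevance selection)."""
    pos, neg = Counter(pos_percepts), Counter(neg_percepts)
    vocab = set(pos) | set(neg)
    n_pos, n_neg = len(pos_percepts), len(neg_percepts)
    def score(p):
        return (math.log((pos[p] + 1) / (n_pos + len(vocab)))
                - math.log((neg[p] + 1) / (n_neg + len(vocab))))
    return sorted(vocab, key=score, reverse=True)[:k]

# Toy data: a Zipf-like stream of percept labels.
random.seed(0)
stream = random.choices(range(50),
                        weights=[1 / (r + 1) for r in range(50)],
                        k=10_000)
rf = rank_frequency(stream)
print(rf[:3])  # a few frequent percepts dominate the long tail

top = select_percepts(["clap", "music", "cheer", "music"],
                      ["traffic", "music"], k=2)
print(top)  # "music" is discounted because it also occurs in negatives
```

The Zipfian shape is what makes the selection step pay off: most percepts are rare and can be dropped, so a handful of high-scoring percepts per concept carries most of the signal.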