Audio-based semantic concept classification for consumer video

Authors:
Keansub Lee;Daniel P. W. Ellis
Affiliations:
Laboratory for the Recognition and Organization of Speech and Audio, Electrical Engineering Department, Columbia University, New York, NY (both authors)
Venue:
IEEE Transactions on Audio, Speech, and Language Processing
Year:
2010

Abstract

This paper presents a novel method for automatically classifying consumer video clips based on their soundtracks. We use a set of 25 overlapping semantic classes, chosen for their usefulness to users, viability of automatic detection and of annotator labeling, and sufficiency of representation in available video collections. A set of 1873 videos from real users has been annotated with these concepts. Starting with a basic representation of each video clip as a sequence of mel-frequency cepstral coefficient (MFCC) frames, we experiment with three clip-level representations: single Gaussian modeling, Gaussian mixture modeling, and probabilistic latent semantic analysis of a Gaussian component histogram. Using such summary features, we produce support vector machine (SVM) classifiers based on the Kullback-Leibler, Bhattacharyya, or Mahalanobis distance measures. Quantitative evaluation shows that our approaches are effective for detecting interesting concepts in a large collection of real-world consumer video clips.
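
The simplest of the three clip-level representations, single Gaussian modeling compared with the Kullback-Leibler distance, can be sketched as follows. This is a minimal illustration and not the authors' implementation. It rests on several assumptions that are not stated in the abstract: the function names are hypothetical, random vectors stand in for real MFCC frames, the KL divergence is symmetrized by summing both directions, and the distance is turned into a similarity with exp(-gamma * D) using an arbitrary gamma.

```python
import numpy as np

def fit_gaussian(frames, reg=1e-6):
    """Summarize one clip (n_frames x n_dims) by its mean and full covariance.

    `reg` adds a small ridge to the covariance so it stays invertible
    (an assumption made for numerical safety in this sketch).
    """
    mu = frames.mean(axis=0)
    cov = np.cov(frames, rowvar=False) + reg * np.eye(frames.shape[1])
    return mu, cov

def kl_gaussian(mu0, cov0, mu1, cov1):
    """Closed-form KL(N0 || N1) between two multivariate Gaussians."""
    d = mu0.shape[0]
    inv1 = np.linalg.inv(cov1)
    diff = mu1 - mu0
    _, logdet0 = np.linalg.slogdet(cov0)
    _, logdet1 = np.linalg.slogdet(cov1)
    return 0.5 * (np.trace(inv1 @ cov0) + diff @ inv1 @ diff - d
                  + logdet1 - logdet0)

def symmetric_kl(g0, g1):
    """Symmetrized KL: sum of the divergences in both directions."""
    return kl_gaussian(*g0, *g1) + kl_gaussian(*g1, *g0)

def distance_kernel(gaussians, gamma=0.01):
    """Pairwise similarity matrix exp(-gamma * D) over clip-level Gaussians."""
    n = len(gaussians)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            dist[i, j] = dist[j, i] = symmetric_kl(gaussians[i], gaussians[j])
    return np.exp(-gamma * dist)

# Synthetic stand-in for MFCC frames: three clips of 200 frames x 13 dims.
# Clips 0 and 1 share a distribution; clip 2 has a shifted mean.
rng = np.random.default_rng(0)
clips = [rng.normal(loc=m, size=(200, 13)) for m in (0.0, 0.0, 2.0)]
K = distance_kernel([fit_gaussian(c) for c in clips])
```

A matrix like `K` could be passed to an SVM that accepts a precomputed kernel, with one binary classifier trained per concept. Note that exp(-gamma * D) built from a symmetrized KL divergence is not guaranteed to be positive semidefinite, so this choice should be treated as a practical heuristic in the sketch.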