This paper presents a novel method for automatically classifying consumer video clips based on their soundtracks. We use a set of 25 overlapping semantic classes, chosen for their usefulness to users, viability of automatic detection and of annotator labeling, and sufficiency of representation in available video collections. A set of 1873 videos from real users has been annotated with these concepts. Starting with a basic representation of each video clip as a sequence of mel-frequency cepstral coefficient (MFCC) frames, we experiment with three clip-level representations: single Gaussian modeling, Gaussian mixture modeling, and probabilistic latent semantic analysis of a Gaussian component histogram. Using such summary features, we produce support vector machine (SVM) classifiers based on the Kullback-Leibler, Bhattacharyya, or Mahalanobis distance measures. Quantitative evaluation shows that our approaches are effective for detecting interesting concepts in a large collection of real-world consumer video clips.
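The single-Gaussian variant of this pipeline can be illustrated with a minimal sketch, under several assumptions that the abstract does not confirm. Each clip is summarized by the mean and full covariance of its frames, clips are compared with the symmetrized Kullback-Leibler divergence, and the divergence is turned into a kernel as exp(-gamma * D). The random arrays stand in for real MFCC frames, and the function names, the `gamma` value, and the `eps` regularizer are all hypothetical choices made for illustration.

```python
import numpy as np

def fit_gaussian(frames, eps=1e-6):
    """Summarize an (n_frames, n_dims) feature matrix by its mean and full covariance."""
    mean = frames.mean(axis=0)
    # eps on the diagonal is an assumed regularizer that keeps the covariance invertible.
    cov = np.cov(frames, rowvar=False) + eps * np.eye(frames.shape[1])
    return mean, cov

def kl_gaussian(p, q):
    """Closed-form KL(p || q) for two multivariate Gaussians given as (mean, cov)."""
    mu_p, cov_p = p
    mu_q, cov_q = q
    d = mu_p.shape[0]
    inv_q = np.linalg.inv(cov_q)
    diff = mu_q - mu_p
    _, logdet_p = np.linalg.slogdet(cov_p)
    _, logdet_q = np.linalg.slogdet(cov_q)
    return 0.5 * (np.trace(inv_q @ cov_p) + diff @ inv_q @ diff - d + logdet_q - logdet_p)

def symmetric_kl(p, q):
    """KL is asymmetric, so sum both directions to get a symmetric distance."""
    return kl_gaussian(p, q) + kl_gaussian(q, p)

def kl_kernel(models, gamma=0.01):
    """Assumed kernel form exp(-gamma * D) over all pairs of clip models."""
    n = len(models)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            dist[i, j] = dist[j, i] = symmetric_kl(models[i], models[j])
    return np.exp(-gamma * dist)

# Synthetic stand-ins for MFCC frames: clips 0 and 1 are similar, clip 2 is shifted.
rng = np.random.default_rng(0)
clips = [rng.normal(loc=shift, size=(200, 4)) for shift in (0.0, 0.1, 3.0)]
models = [fit_gaussian(c) for c in clips]
K = kl_kernel(models)
```

A matrix like `K` could then be handed to an SVM that accepts a precomputed kernel, for example scikit-learn's `SVC(kernel="precomputed")`, with one binary classifier per concept since the classes overlap. Note that a kernel built this way is not guaranteed to be positive semidefinite, so this is a sketch of the idea rather than a reproduction of the paper's setup.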