State-of-the-art systems for video concept detection rely mainly on visual features. Some previous approaches have also included audio features, either using low-level features such as mel-frequency cepstral coefficients (MFCC) or exploiting the detection of specific audio concepts. In this paper, we investigate a bag-of-auditory-words (BoAW) approach that quantizes MFCC features against an auditory vocabulary. The resulting BoAW features are combined with state-of-the-art visual features via multiple kernel learning (MKL). Experiments on a large set of 101 video concepts from the MediaMill Challenge show the effectiveness of BoAW features: a system using BoAW features and a support vector machine with a χ²-kernel outperforms a state-of-the-art audio approach based on probabilistic latent semantic indexing. Furthermore, we show that an early-fusion approach degrades detection performance, whereas combining auditory and visual bag-of-words features via MKL yields a relative performance improvement of 9%.
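To illustrate the pipeline described above, the following is a minimal sketch of the two core steps: quantizing MFCC frames into a BoAW histogram against an auditory vocabulary, and computing an exponential χ²-kernel over such histograms. The function names, the assumption of a pre-trained codebook (e.g. from k-means over training MFCC frames), and the γ parameter are illustrative choices, not the paper's exact implementation.

```python
import numpy as np

def boaw_histogram(mfcc_frames, codebook):
    """Assign each MFCC frame to its nearest codeword (auditory word)
    and return an L1-normalized bag-of-auditory-words histogram."""
    # squared Euclidean distance from every frame to every codeword
    dists = ((mfcc_frames[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    assignments = dists.argmin(axis=1)
    hist = np.bincount(assignments, minlength=len(codebook)).astype(float)
    return hist / hist.sum()

def chi2_kernel(X, Y, gamma=1.0):
    """Exponential chi-square kernel K(x, y) = exp(-gamma * sum_i
    (x_i - y_i)^2 / (x_i + y_i)), with 0/0 treated as 0."""
    num = (X[:, None, :] - Y[None, :, :]) ** 2
    den = X[:, None, :] + Y[None, :, :]
    chi2 = np.where(den > 0, num / np.where(den > 0, den, 1.0), 0.0).sum(axis=-1)
    return np.exp(-gamma * chi2)
```

For fusion, MKL learns per-kernel weights; a common fixed-weight baseline is a convex combination such as `K = beta * K_audio + (1 - beta) * K_visual`, with the combined kernel fed to a standard SVM.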