Evaluating multimedia features and fusion for example-based event detection

Authors:
Gregory K. Myers;Ramesh Nallapati;Julien Hout;Stephanie Pancoast;Ramakant Nevatia;Chen Sun;Amirhossein Habibian;Dennis C. Koelma;Koen E. Sande;Arnold W. Smeulders;Cees G. Snoek
Affiliations:
SRI International (SRI), Menlo Park, USA 94025;SRI International (SRI), Menlo Park, USA 94025 and IBM Thomas J Watson Research Center, Yorktown Heights, USA 10598;SRI International (SRI), Menlo Park, USA 94025;SRI International (SRI), Menlo Park, USA 94025;Institute for Robotics and Intelligent Systems, University of Southern California (USC), Los Angeles, USA 90089-0273;Institute for Robotics and Intelligent Systems, University of Southern California (USC), Los Angeles, USA 90089-0273;University of Amsterdam (UvA), Amsterdam, The Netherlands 1098 GH;University of Amsterdam (UvA), Amsterdam, The Netherlands 1098 GH;University of Amsterdam (UvA), Amsterdam, The Netherlands 1098 GH;University of Amsterdam (UvA), Amsterdam, The Netherlands 1098 GH;University of Amsterdam (UvA), Amsterdam, The Netherlands 1098 GH
Venue:
Machine Vision and Applications
Year:
2014

Special issue on Multimedia Event Detection

Machine Vision and Applications

Abstract

Multimedia event detection (MED) is a challenging problem because of the heterogeneous content and variable quality found in large collections of Internet videos. To study the value of multimedia features and fusion for representing and learning events from a set of example video clips, we created SESAME, a system for video SEarch with Speed and Accuracy for Multimedia Events. SESAME includes multiple bag-of-words event classifiers, each based on a single data type: low-level visual, motion, and audio features; high-level semantic visual concepts; and automatic speech recognition. Event detection performance was evaluated for each event classifier. The performance of low-level visual and motion features was improved by the use of difference coding. The accuracy of the visual concepts was nearly as high as that of the low-level visual features. Experiments with a number of fusion methods for combining the event detection scores from these classifiers revealed that simple fusion methods, such as arithmetic mean, perform as well as or better than other, more complex fusion methods. SESAME's performance in the 2012 TRECVID MED evaluation was among the best reported.
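
The arithmetic-mean fusion mentioned in the abstract can be illustrated with a minimal sketch. This is not SESAME's implementation: the function name, the classifier names, and the score values are hypothetical, and the sketch assumes that each classifier's detection scores have already been brought to a comparable scale.

```python
from statistics import mean


def fuse_arithmetic_mean(scores_by_classifier):
    """Late fusion: average per-clip detection scores across classifiers.

    scores_by_classifier maps a classifier name to a list of scores, one per
    video clip. All lists must follow the same clip order. Assumption: the
    scores are already on a comparable scale (for example, normalized to [0, 1]).
    """
    score_lists = list(scores_by_classifier.values())
    if not score_lists:
        raise ValueError("need at least one classifier")
    n_clips = len(score_lists[0])
    if any(len(scores) != n_clips for scores in score_lists):
        raise ValueError("every classifier must score the same clips")
    # zip(*...) groups the scores that all classifiers assigned to one clip.
    return [mean(clip_scores) for clip_scores in zip(*score_lists)]


# Hypothetical scores for three clips from three single-data-type classifiers.
scores = {
    "visual": [0.75, 0.25, 0.5],
    "motion": [0.5, 0.25, 0.75],
    "audio": [0.25, 0.25, 1.0],
}
fused = fuse_arithmetic_mean(scores)

# Rank clips by fused score, highest first, as a detection system might.
ranking = sorted(range(len(fused)), key=lambda i: fused[i], reverse=True)
```

Averaging needs no training data and has no parameters to tune, which may be one reason such simple schemes are competitive with more complex fusion methods; the sketch itself does not demonstrate that claim.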