Automatic event detection in large collections of unconstrained videos is a challenging and important task. The key issue is describing long, complex videos with high-level semantic descriptors that capture the regularity of events within a category while distinguishing them from events of other categories. This paper proposes a novel unsupervised approach that discovers data-driven concepts from multi-modality signals (audio, scene, and motion) to describe the high-level semantics of videos. Our method consists of three main components: first, we learn low-level features separately for the three modalities; second, we discover data-driven concepts from the statistics of the learned features, mapped to a low-dimensional space using deep belief nets (DBNs); finally, a compact and robust sparse representation is learned to jointly model the concepts from all three modalities. Extensive experiments on a large in-the-wild dataset show that the proposed method significantly outperforms state-of-the-art methods.
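As a rough illustration only (not the authors' implementation), the three-stage pipeline described above might be sketched with scikit-learn components: random arrays stand in for the per-modality low-level features, a two-layer stack of RBMs stands in for the DBN mapping, and `DictionaryLearning` provides the joint sparse code. All dimensions, parameters, and names below are assumptions for the sketch.

```python
import numpy as np
from sklearn.neural_network import BernoulliRBM
from sklearn.decomposition import DictionaryLearning
from sklearn.pipeline import Pipeline

rng = np.random.RandomState(0)

# Stand-ins for low-level features from the three modalities
# (audio, scene, motion); in the paper these come from separate
# per-modality feature learners.
n_clips = 40
audio = rng.rand(n_clips, 32)
scene = rng.rand(n_clips, 32)
motion = rng.rand(n_clips, 32)

def dbn_concepts(features, n_hidden=16):
    """Map features to a low-dimensional 'concept' space with a
    two-layer stack of RBMs (a crude stand-in for a DBN)."""
    dbn = Pipeline([
        ("rbm1", BernoulliRBM(n_components=24, random_state=0)),
        ("rbm2", BernoulliRBM(n_components=n_hidden, random_state=0)),
    ])
    return dbn.fit_transform(features)

# Discover per-modality concepts, then concatenate them.
concepts = np.hstack([dbn_concepts(m) for m in (audio, scene, motion)])

# Learn a joint sparse representation over all three modalities.
coder = DictionaryLearning(n_components=20,
                           transform_algorithm="lasso_lars",
                           transform_alpha=0.1,
                           max_iter=10, random_state=0)
codes = coder.fit_transform(concepts)
print(codes.shape)  # (40, 20)
```

The sparse codes would then serve as the final video representation fed to an event classifier (e.g., an SVM).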