Term-weighting approaches in automatic text retrieval
Information Processing and Management
A Model of Saliency-Based Visual Attention for Rapid Scene Analysis
IEEE Transactions on Pattern Analysis and Machine Intelligence
Mean Shift: A Robust Approach Toward Feature Space Analysis
IEEE Transactions on Pattern Analysis and Machine Intelligence
Proceedings of the Ninth IEEE International Conference on Computer Vision (ICCV '03)
Distinctive Image Features from Scale-Invariant Keypoints
International Journal of Computer Vision
Recognizing Human Actions: A Local SVM Approach
Proceedings of the 17th International Conference on Pattern Recognition (ICPR '04)
Histograms of Oriented Gradients for Human Detection
Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '05)
Recognizing Human Actions in Videos Acquired by Uncalibrated Moving Cameras
Proceedings of the Tenth IEEE International Conference on Computer Vision (ICCV '05)
Efficient Visual Event Detection Using Volumetric Features
Proceedings of the Tenth IEEE International Conference on Computer Vision (ICCV '05)
Scalable Recognition with a Vocabulary Tree
Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '06)
Visual attention detection in video sequences using spatiotemporal cues
Proceedings of the 14th Annual ACM International Conference on Multimedia (MULTIMEDIA '06)
A Visual Attention Based Region-of-Interest Determination Framework for Video Sequences
IEICE - Transactions on Information and Systems
Efficient spatiotemporal-attention-driven shot matching
Proceedings of the 15th ACM International Conference on Multimedia
A 3-Dimensional SIFT Descriptor and Its Application to Action Recognition
Proceedings of the 15th ACM International Conference on Multimedia
Cross-media manifold learning for image retrieval & annotation
Proceedings of the 1st ACM International Conference on Multimedia Information Retrieval (MIR '08)
An Efficient Dense and Scale-Invariant Spatio-Temporal Interest Point Detector
Proceedings of the 10th European Conference on Computer Vision (ECCV '08)
Mining city landmarks from blogs by graph modeling
Proceedings of the 17th ACM International Conference on Multimedia (MM '09)
Hierarchical space-time model enabling efficient search for human actions
IEEE Transactions on Circuits and Systems for Video Technology
Face recognition from 2D and 3D images using 3D Gabor filters
Image and Vision Computing
Location Discriminative Vocabulary Coding for Mobile Landmark Search
International Journal of Computer Vision
Video mining with frequent itemset configurations
Proceedings of the 5th International Conference on Image and Video Retrieval (CIVR '06)
Real-time human pose recognition in parts from single depth images
Proceedings of the 2011 IEEE Conference on Computer Vision and Pattern Recognition (CVPR '11)
A Multimedia Retrieval Framework Based on Semi-Supervised Ranking and Relevance Feedback
IEEE Transactions on Pattern Analysis and Machine Intelligence
IEEE Transactions on Multimedia
Spatiotemporal salient points for visual recognition of human actions
IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics
In this paper, we present a spatiotemporal co-location video pattern mining approach with application to robust action retrieval in YouTube videos. First, we introduce an attention shift scheme to detect and segment the focused human actions in YouTube videos, built upon visual saliency modeling [13] together with face [35] and body [32] detectors. From the segmented spatiotemporal human action regions, we extract local features using the 3D-SIFT descriptor [17]. We then quantize all interest points detected in the reference YouTube videos into a visual vocabulary, based on which each interest point is assigned a word identity. An Apriori-based frequent itemset mining scheme is then applied over the spatiotemporally co-located words to discover co-location video patterns. Finally, we fuse both visual words and patterns and leverage boosting-based feature selection to produce the final action descriptors, incorporating the ranking distortion of conjunctive queries into the boosting objective. We carried out quantitative evaluations on the KTH human motion benchmark [26] as well as on 60 hours of YouTube videos, with comparisons to state-of-the-art methods.
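The vocabulary step above maps every detected interest point to a discrete word identity. A minimal sketch of that assignment, assuming the vocabulary has already been learned offline (e.g. by k-means clustering of 3D-SIFT descriptors; the function name and shapes here are illustrative, not the paper's implementation):

```python
import numpy as np

def assign_word_ids(descriptors, vocabulary):
    """Assign each descriptor the id of its nearest vocabulary word.

    descriptors: (n, d) array of local features (e.g. 3D-SIFT vectors).
    vocabulary:  (k, d) array of cluster centers learned offline.
    Returns an (n,) array of word ids in [0, k).
    """
    # Squared Euclidean distance from every descriptor to every word,
    # computed via broadcasting: result shape is (n, k).
    d2 = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)
```

For example, with a two-word vocabulary at (0, 0) and (10, 10), the descriptors (1, 0) and (9, 10) are assigned word ids 0 and 1 respectively.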
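The co-location mining step treats each spatiotemporal neighborhood as a transaction of word ids and mines frequently recurring word sets. A generic Apriori sketch of that idea (the transaction construction and support threshold are assumptions for illustration, not the paper's exact procedure):

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Mine frequent itemsets from co-located visual-word sets (Apriori).

    transactions: list of sets of word ids, one per spatiotemporal neighborhood.
    min_support:  minimum number of transactions an itemset must appear in.
    Returns a dict mapping frozenset(itemset) -> support count.
    """
    # Count frequent 1-itemsets.
    counts = {}
    for t in transactions:
        for w in t:
            counts[frozenset([w])] = counts.get(frozenset([w]), 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)
    k = 2
    while frequent:
        # Candidate generation: k-subsets of the items seen in frequent
        # (k-1)-itemsets, pruned so every (k-1)-subset is itself frequent.
        items = sorted({w for s in frequent for w in s})
        candidates = [frozenset(c) for c in combinations(items, k)
                      if all(frozenset(sub) in frequent
                             for sub in combinations(c, k - 1))]
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: n for s, n in counts.items() if n >= min_support}
        result.update(frequent)
        k += 1
    return result
```

With transactions [{1,2,3}, {1,2}, {1,3}, {2,3}, {1,2,3}] and min_support 3, every singleton and pair survives (support 3 or 4) while the triple {1,2,3} is pruned at support 2; the frequent pairs are the kind of co-location patterns later fused with individual words.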