Term-weighting approaches in automatic text retrieval
Information Processing and Management
A Model of Saliency-Based Visual Attention for Rapid Scene Analysis
IEEE Transactions on Pattern Analysis and Machine Intelligence
Mean Shift: A Robust Approach Toward Feature Space Analysis
IEEE Transactions on Pattern Analysis and Machine Intelligence
Proceedings of the Ninth IEEE International Conference on Computer Vision (ICCV '03)
Distinctive Image Features from Scale-Invariant Keypoints
International Journal of Computer Vision
Recognizing Human Actions: A Local SVM Approach
Proceedings of the 17th International Conference on Pattern Recognition (ICPR '04)
Histograms of Oriented Gradients for Human Detection
Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '05)
Recognizing Human Actions in Videos Acquired by Uncalibrated Moving Cameras
Proceedings of the Tenth IEEE International Conference on Computer Vision (ICCV '05)
Efficient Visual Event Detection Using Volumetric Features
Proceedings of the Tenth IEEE International Conference on Computer Vision (ICCV '05)
Scalable Recognition with a Vocabulary Tree
Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '06)
Visual attention detection in video sequences using spatiotemporal cues
Proceedings of the 14th Annual ACM International Conference on Multimedia (MULTIMEDIA '06)
A Visual Attention Based Region-of-Interest Determination Framework for Video Sequences
IEICE - Transactions on Information and Systems
Efficient spatiotemporal-attention-driven shot matching
Proceedings of the 15th ACM International Conference on Multimedia
A 3-Dimensional SIFT Descriptor and Its Application to Action Recognition
Proceedings of the 15th ACM International Conference on Multimedia
Cross-media manifold learning for image retrieval & annotation
Proceedings of the 1st ACM International Conference on Multimedia Information Retrieval (MIR '08)
An Efficient Dense and Scale-Invariant Spatio-Temporal Interest Point Detector
Proceedings of the 10th European Conference on Computer Vision (ECCV '08)
Mining city landmarks from blogs by graph modeling
Proceedings of the 17th ACM International Conference on Multimedia (MM '09)
Hierarchical space-time model enabling efficient search for human actions
IEEE Transactions on Circuits and Systems for Video Technology
Face recognition from 2D and 3D images using 3D Gabor filters
Image and Vision Computing
Location Discriminative Vocabulary Coding for Mobile Landmark Search
International Journal of Computer Vision
Video mining with frequent itemset configurations
Proceedings of the 5th International Conference on Image and Video Retrieval (CIVR '06)
Real-time human pose recognition in parts from single depth images
Proceedings of the 2011 IEEE Conference on Computer Vision and Pattern Recognition (CVPR '11)
A Multimedia Retrieval Framework Based on Semi-Supervised Ranking and Relevance Feedback
IEEE Transactions on Pattern Analysis and Machine Intelligence
IEEE Transactions on Multimedia
Spatiotemporal salient points for visual recognition of human actions
IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics
In this paper, we present a spatiotemporal co-location video pattern mining approach with application to robust action retrieval in YouTube videos. First, we introduce an attention shift scheme to detect and segment the focused human actions in YouTube videos, built upon visual saliency modeling [13] together with face [35] and body [32] detectors. From the segmented spatiotemporal human action regions, we extract local features using the 3D-SIFT descriptor [17]. We then quantize all interest points detected in the reference YouTube videos into a visual vocabulary, based on which each interest point is assigned a word identity. An Apriori-based frequent itemset mining scheme is then applied over the spatiotemporally co-located words to discover co-location video patterns. Finally, we fuse both visual words and patterns and leverage boosting-based feature selection to produce the final action descriptors, incorporating the ranking distortion of conjunctive queries into the boosting objective. We carried out quantitative evaluations on the KTH human motion benchmark [26] as well as on 60 hours of YouTube videos, with comparisons to state-of-the-art methods.
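The vocabulary step above maps every detected interest point to a discrete word identity. A minimal sketch of that assignment, assuming the vocabulary has already been learned offline (e.g. by k-means clustering of 3D-SIFT descriptors; the function name and shapes here are illustrative, not the paper's implementation):

```python
import numpy as np

def assign_word_ids(descriptors, vocabulary):
    """Assign each descriptor the id of its nearest vocabulary word.

    descriptors: (n, d) array of local features (e.g. 3D-SIFT vectors).
    vocabulary:  (k, d) array of cluster centers learned offline.
    Returns an (n,) array of word ids in [0, k).
    """
    # Squared Euclidean distance from every descriptor to every word,
    # computed via broadcasting: result shape is (n, k).
    d2 = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)
```

For example, with a two-word vocabulary at (0, 0) and (10, 10), the descriptors (1, 0) and (9, 10) are assigned word ids 0 and 1 respectively.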
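The co-location mining step treats each spatiotemporal neighborhood as a transaction of word ids and mines frequently recurring word sets. A generic Apriori sketch of that idea (the transaction construction and support threshold are assumptions for illustration, not the paper's exact procedure):

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Mine frequent itemsets from co-located visual-word sets (Apriori).

    transactions: list of sets of word ids, one per spatiotemporal neighborhood.
    min_support:  minimum number of transactions an itemset must appear in.
    Returns a dict mapping frozenset(itemset) -> support count.
    """
    # Count frequent 1-itemsets.
    counts = {}
    for t in transactions:
        for w in t:
            counts[frozenset([w])] = counts.get(frozenset([w]), 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)
    k = 2
    while frequent:
        # Candidate generation: k-subsets of the items seen in frequent
        # (k-1)-itemsets, pruned so every (k-1)-subset is itself frequent.
        items = sorted({w for s in frequent for w in s})
        candidates = [frozenset(c) for c in combinations(items, k)
                      if all(frozenset(sub) in frequent
                             for sub in combinations(c, k - 1))]
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: n for s, n in counts.items() if n >= min_support}
        result.update(frequent)
        k += 1
    return result
```

With transactions [{1,2,3}, {1,2}, {1,3}, {2,3}, {1,2,3}] and min_support 3, every singleton and pair survives (support 3 or 4) while the triple {1,2,3} is pruned at support 2; the frequent pairs are the kind of co-location patterns later fused with individual words.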