Human actions in movies and sitcoms usually carry semantic cues for story understanding, and so offer a novel search pattern beyond the traditional video search scenario. However, action-level video search faces major challenges, such as global motion, concurrent actions, and variation in actor appearance. In this paper, we introduce a generalized action retrieval framework that achieves fully unsupervised, robust, and actor-independent action search in large-scale databases. First, an Attention Shift model is presented to extract human-focused foreground actions from videos containing global motion or concurrent actions. Subsequently, a spatiotemporal vocabulary is built from 3D-SIFT features extracted from these human-focused action regions. These 3D-SIFT features offer robustness to rotation and viewpoint changes. The spatiotemporal vocabulary makes search efficient through an inverted indexing structure combined with approximate nearest-neighbor search. For online ranking, we employ the dynamic time warping distance to handle variations in action duration as well as partial action matching. Finally, an appearance hashing strategy is presented to address the performance degradation caused by divergent actor appearances. For experimental validation, we deployed the actor-independent action retrieval framework on three seasons of the "Friends" sitcom (over 30 h). On this database, our framework achieves the best performance (MAP@10.53) in comparisons with alternative and state-of-the-art approaches.
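As an illustration of how a dynamic time warping distance can absorb differences in action duration, the following is a minimal sketch of the textbook DTW recurrence. It is not the paper's ranking implementation: the function name `dtw_distance`, the Euclidean per-frame cost, and the toy one-dimensional "descriptors" are all assumptions made for this example, and the partial-matching variant mentioned in the abstract is not shown.

```python
import math


def dtw_distance(a, b):
    """Textbook DTW between two sequences of equal-dimension feature vectors.

    Hypothetical helper for illustration: each element of `a` and `b` stands in
    for a per-frame (or per-segment) descriptor; the per-step cost is assumed
    to be Euclidean distance.
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = math.dist(a[i - 1], b[j - 1])
            # Extend the cheapest of: advance in a only, in b only, or in both.
            cost[i][j] = step + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    return cost[n][m]


# Toy data (assumed): the same "action" performed at normal and half speed,
# plus a different action of the same length as the query.
query = [(0.0,), (1.0,), (2.0,)]
slow_version = [(0.0,), (0.0,), (1.0,), (1.0,), (2.0,), (2.0,)]
different = [(2.0,), (1.0,), (0.0,)]

print(dtw_distance(query, slow_version))
print(dtw_distance(query, different))
```

In this toy case the slowed-down sequence aligns with the query at zero cost because the warping path may repeat frames, whereas the reversed sequence yields a positive distance even though it has the same length as the query.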