Automatic understanding of human activities is a major challenge in multimedia analysis. The challenge is especially acute for small-scale activities, such as finger motions, and for activities in complex scenes, where for typical camera views neither global-feature nor local-feature analysis methods work well. To address this problem, many studies rely on spatio-temporal features and feature selection methods to obtain a video representation. However, these spatio-temporal features are problematic for two reasons. First, it is unclear whether a given feature corresponds to meaningful foreground or to noise. Second, the features alone cannot predict where an activity will occur. A biologically inspired feature selection method is therefore needed to reorganize these spatio-temporal features and represent the video in a feature space. In this paper, we propose a graph-based Co-Attention model that selects more effective features for activity analysis. Instead of reducing feature dimensionality, our Co-Attention model reduces the number of interest points considered. The model is derived from correlations among individual tiny activities, whose salient regions are identified by combining an integrated top-down and bottom-up visual attention model with a motion attention model built from spatio-temporal features rather than directly from optical flow. Unlike typical attention models, the Co-Attention model allows multiple regions of interest to co-exist in a video for further analysis. Experimental results on the KTH dataset, the YouTube dataset, and a new tiny-activity dataset, the Pump dataset, which consists of visual observations of patients operating an infusion pump, validate that our activity analysis approach is more effective than state-of-the-art methods.
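To make the feature selection step concrete, the sketch below illustrates the general idea the abstract describes: fuse a visual attention map with a motion attention map, keep only the spatio-temporal interest points that land in salient regions, and link the surviving points in a graph. This is a minimal illustration under stated assumptions, not the paper's implementation: the function names (`select_salient_points`, `build_affinity_graph`), the linear fusion weight `alpha`, the cut-off `threshold`, and the Gaussian affinity with bandwidth `sigma` are all hypothetical choices, and the paper's Co-Attention graph may be defined differently.

```python
import numpy as np

def select_salient_points(points, visual_saliency, motion_saliency,
                          alpha=0.5, threshold=0.6):
    """Keep interest points that fall in salient regions.

    points          : (N, 3) int array of (t, y, x) coordinates.
    visual_saliency : (T, H, W) top-down + bottom-up saliency in [0, 1].
    motion_saliency : (T, H, W) motion saliency in [0, 1].
    alpha, threshold: illustrative fusion weight and cut-off (assumed,
                      not taken from the paper).
    """
    fused = alpha * visual_saliency + (1.0 - alpha) * motion_saliency
    scores = fused[points[:, 0], points[:, 1], points[:, 2]]
    keep = scores >= threshold
    return points[keep], scores[keep]

def build_affinity_graph(points, sigma=10.0):
    """Weight edges by a Gaussian of spatio-temporal distance.

    A common graph construction used here for illustration; the
    paper's exact graph definition may differ.
    """
    diff = points[:, None, :].astype(float) - points[None, :, :]
    w = np.exp(-(diff ** 2).sum(-1) / (2.0 * sigma ** 2))
    np.fill_diagonal(w, 0.0)  # no self-loops
    return w

# Toy usage on random data standing in for real saliency maps.
rng = np.random.default_rng(0)
pts = rng.integers(0, 32, size=(50, 3))   # (t, y, x) in a 32^3 volume
vis = rng.random((32, 32, 32))
mot = rng.random((32, 32, 32))
kept, s = select_salient_points(pts, vis, mot)
W = build_affinity_graph(kept)
print(kept.shape, W.shape)
```

In this reading of the abstract, selection happens over interest points rather than over feature dimensions, so the surviving graph can contain several disconnected salient regions at once, matching the claim that multiple regions of interest may co-exist for further analysis.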