Automatic understanding of human activities is a major challenge in multimedia analysis. The challenge is especially acute for small-scale activities, such as finger motions, and for activities in complex scenes, where for typical camera views neither global-feature nor local-feature analysis methods work well. To address this problem, many studies rely on spatio-temporal features and feature selection methods to obtain a video representation. However, these spatio-temporal features are problematic for two reasons. First, it is unclear whether a given feature corresponds to meaningful foreground or to noise. Second, the features alone cannot predict where an activity will occur. A biologically inspired feature selection method is therefore needed to reorganize these spatio-temporal features and represent the video in a feature space. In this paper, we propose a graph-based Co-Attention model that selects more effective features for activity analysis. Instead of reducing feature dimensionality, our Co-Attention model reduces the number of interest points considered. The model is derived from correlations among individual tiny activities, whose salient regions are identified by combining an integrated top-down and bottom-up visual attention model with a motion attention model built from spatio-temporal features rather than directly from optical flow. Unlike typical attention models, the Co-Attention model allows multiple regions of interest to co-exist in a video for further analysis. Experimental results on the KTH dataset, the YouTube dataset, and a new tiny-activity dataset, the Pump dataset, which consists of visual observations of patients operating an infusion pump, validate that our activity analysis approach is more effective than state-of-the-art methods.
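To make the feature selection step concrete, the sketch below illustrates the general idea the abstract describes: fuse a visual attention map with a motion attention map, keep only the spatio-temporal interest points that land in salient regions, and link the surviving points in a graph. This is a minimal illustration under stated assumptions, not the paper's implementation: the function names (`select_salient_points`, `build_affinity_graph`), the linear fusion weight `alpha`, the cut-off `threshold`, and the Gaussian affinity with bandwidth `sigma` are all hypothetical choices, and the paper's Co-Attention graph may be defined differently.

```python
import numpy as np

def select_salient_points(points, visual_saliency, motion_saliency,
                          alpha=0.5, threshold=0.6):
    """Keep interest points that fall in salient regions.

    points          : (N, 3) int array of (t, y, x) coordinates.
    visual_saliency : (T, H, W) top-down + bottom-up saliency in [0, 1].
    motion_saliency : (T, H, W) motion saliency in [0, 1].
    alpha, threshold: illustrative fusion weight and cut-off (assumed,
                      not taken from the paper).
    """
    fused = alpha * visual_saliency + (1.0 - alpha) * motion_saliency
    scores = fused[points[:, 0], points[:, 1], points[:, 2]]
    keep = scores >= threshold
    return points[keep], scores[keep]

def build_affinity_graph(points, sigma=10.0):
    """Weight edges by a Gaussian of spatio-temporal distance.

    A common graph construction used here for illustration; the
    paper's exact graph definition may differ.
    """
    diff = points[:, None, :].astype(float) - points[None, :, :]
    w = np.exp(-(diff ** 2).sum(-1) / (2.0 * sigma ** 2))
    np.fill_diagonal(w, 0.0)  # no self-loops
    return w

# Toy usage on random data standing in for real saliency maps.
rng = np.random.default_rng(0)
pts = rng.integers(0, 32, size=(50, 3))   # (t, y, x) in a 32^3 volume
vis = rng.random((32, 32, 32))
mot = rng.random((32, 32, 32))
kept, s = select_salient_points(pts, vis, mot)
W = build_affinity_graph(kept)
print(kept.shape, W.shape)
```

In this reading of the abstract, selection happens over interest points rather than over feature dimensions, so the surviving graph can contain several disconnected salient regions at once, matching the claim that multiple regions of interest may co-exist for further analysis.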