Action recognition in video by sparse representation on covariance manifolds of silhouette tunnels
ICPR'10 Proceedings of the 20th International conference on Recognizing patterns in signals, speech, images, and videos
Semi-supervised action recognition in video via Labeled Kernel Sparse Coding and sparse L1 graph
Pattern Recognition Letters
Hi-index | 0.00 |
Action recognition is a challenging problem in video analytics due to event complexity, variations in imaging conditions, and intra- and inter-individual action-variability. Central to these challenges is the way one models actions in video, i.e., action representation. In this paper, an action is viewed as a temporal sequence of local shape-deformations of centroid-centered object silhouettes, i.e., the shape of the centroid-centered object silhouette tunnel. Each action is represented by the empirical covariance matrix of a set of 13-dimensional normalized geometric feature vectors that capture the shape of the silhouette tunnel. The similarity of two actions is measured in terms of a Riemannian metric between their covariance matrices. The silhouette tunnel of a test video is broken into short overlapping segments and each segment is classified using a dictionary of labeled action covariance matrices and the nearest neighbor rule. On a database of 90 short video sequences this attains a correct classification rate of 97%, which is very close to the state-of-the-art, at almost 5-fold reduced computational cost. Majority-vote fusion of segment decisions achieves 100% classification rate.