Spatio-temporal video representation with locality-constrained linear coding
ECCV'12 Proceedings of the 12th international conference on Computer Vision - Volume Part III
Hi-index | 0.00 |
In this paper, we make three main contributions in the area of action recognition: (i) We introduce the concept of Joint Self-Similarity Volume (Joint SSV) for modeling dynamical systems, and show that by using a new optimized rank-1 tensor approximation of Joint SSV one can obtain compact low-dimensional descriptors that very accurately preserve the dynamics of the original system, e.g. an action video sequence; (ii) The descriptor vectors derived from the optimized rank-1 approximation make it possible to recognize actions without explicitly aligning the action sequences of varying speed of execution or different frame rates; (iii) The method is generic and can be applied using different low-level features such as silhouettes, histogram of oriented gradients, etc. Hence, it does not necessarily require explicit tracking of features in the space-time volume. Our experimental results on three public datasets demonstrate that our method produces remarkably good results and outperforms all baseline methods.