Propagative Hough voting for human activity recognition
ECCV'12: Proceedings of the 12th European Conference on Computer Vision, Part III
Spatio-temporal phrases for activity recognition
ECCV'12: Proceedings of the 12th European Conference on Computer Vision, Part III
Real-time exact graph matching with application in human action recognition
HBU'12: Proceedings of the Third International Conference on Human Behavior Understanding
The role of spatial context in activity recognition
Proceedings of the Eighth Indian Conference on Computer Vision, Graphics and Image Processing
Combinational subsequence matching for human identification from general actions
ACCV'12: Proceedings of the 11th Asian Conference on Computer Vision, Part III
Vector field analysis for multi-object behavior modeling
Image and Vision Computing
Analyzing growing plants from 4D point cloud data
ACM Transactions on Graphics (TOG)
Modeling multi-object interactions using "string of feature graphs"
Computer Vision and Image Understanding
Graph-based approach for human action recognition using spatio-temporal features
Journal of Visual Communication and Image Representation
Videos usually consist of activities involving interactions between multiple actors, sometimes referred to as complex activities. Recognizing such activities requires modeling both the spatio-temporal relationships between the actors and their individual variabilities. In this paper, we consider the problem of recognizing complex activities in a video given a query example. We propose a new feature model based on a string representation of the video that respects spatio-temporal ordering. The characters of the string are local collections of features (e.g., cuboids, STIP), which are initially matched using graph-based spectral techniques. Final recognition is obtained by matching the string representations of the query and test videos in a dynamic programming framework that allows for variability in sampling rates and in the speed of activity execution. The method does not require tracking or recognition of body parts, can identify the region of interest in a cluttered scene, and gives reasonable performance even with a single query example. We evaluate our approach in an example-based video retrieval framework on two publicly available complex activity datasets and compare against other methods that have studied this problem.
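The abstract outlines a two-stage pipeline: local feature graphs (the "characters" of the string) are matched with graph-based spectral techniques, and the resulting strings are aligned by dynamic programming that tolerates differences in sampling rate and execution speed. Below is a minimal, hypothetical sketch of that pipeline in Python/NumPy. The adjacency-spectrum similarity and the DTW-style recursion are simplifications chosen for illustration, not the paper's actual formulation; all function names and parameters are assumptions.

```python
# Illustrative sketch only: the graph similarity and alignment below are
# simplified stand-ins for the paper's spectral matching and dynamic program.
import numpy as np

def spectral_graph_similarity(A, B):
    """Similarity between two local feature graphs given their symmetric
    (weighted) adjacency matrices A and B.

    Hypothetical stand-in for the paper's spectral matching of string
    "characters": graphs with similar spatio-temporal structure tend to
    have similar leading adjacency eigenvalues.
    """
    k = min(A.shape[0], B.shape[0])
    ev_a = np.sort(np.linalg.eigvalsh(A))[::-1][:k]  # top-k eigenvalues of A
    ev_b = np.sort(np.linalg.eigvalsh(B))[::-1][:k]  # top-k eigenvalues of B
    return 1.0 / (1.0 + np.linalg.norm(ev_a - ev_b))

def string_match_cost(query, test, sim=spectral_graph_similarity):
    """DTW-style dynamic program aligning two strings of feature graphs.

    Allowing diagonal, horizontal, and vertical moves lets the alignment
    stretch or compress time, which is what tolerates differences in
    sampling rate and activity execution speed.
    """
    n, m = len(query), len(test)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 1.0 - sim(query[i - 1], test[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m] / (n + m)  # length-normalized dissimilarity

# Toy usage: strings of random symmetric adjacency matrices standing in for
# the per-interval feature graphs of a query and a test video.
rng = np.random.default_rng(0)
def random_graph(nodes):
    W = rng.random((nodes, nodes))
    return (W + W.T) / 2.0  # symmetrize so eigvalsh applies

query = [random_graph(6) for _ in range(8)]
test = [random_graph(6) for _ in range(12)]
print(string_match_cost(query, test))
```

In this toy setup, a lower returned cost indicates a better match; an example-based retrieval system in the spirit of the abstract would score the query string against every test video and rank results by this normalized alignment cost.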