Parsing collective behaviors by hierarchical model with varying structure
Proceedings of the 20th ACM international conference on Multimedia
Human activities as stochastic kronecker graphs
ECCV'12 Proceedings of the 12th European conference on Computer Vision - Volume Part II
Collective activity localization with contextual spatial pyramid
ECCV'12 Proceedings of the 12th international conference on Computer Vision - Volume Part III
The role of spatial context in activity recognition
Proceedings of the Eighth Indian Conference on Computer Vision, Graphics and Image Processing
Recognizing Human Group Behaviors with Multi-group Causalities
WI-IAT '12 Proceedings of the 2012 IEEE/WIC/ACM International Joint Conferences on Web Intelligence and Intelligent Agent Technology - Volume 03
In this paper, we go beyond recognizing the actions of individuals and focus on group activities. This is motivated by the observation that human actions are rarely performed in isolation: knowing what other people in the scene are doing provides a useful cue for understanding high-level activities. We propose a novel framework for recognizing group activities that jointly captures the group activity, the individual person actions, and the interactions among them. Two types of contextual information, group-person interaction and person-person interaction, are explored in a latent variable framework. In particular, we propose three approaches to modeling person-person interaction. The first approach explores the structure of person-person interaction. Unlike most previous latent structured models, which assume a predefined structure for the hidden layer (e.g., a tree), we treat the structure of the hidden layer as a latent variable and implicitly infer it during learning and inference. The second approach explores person-person interaction at the feature level. We introduce a new feature representation called the action context (AC) descriptor, which encodes not only the action of an individual person in the video but also the behavior of other people nearby. The third approach combines the first two. Our experimental results demonstrate the benefit of using contextual information to disambiguate group activities.
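The abstract states only that the AC descriptor combines a person's own action with the behavior of nearby people. The following is a minimal sketch of one way to build such a descriptor, under assumptions that are not taken from the abstract. Each person is assumed to already have a vector of K per-action scores from some base classifier (hypothetical here). The neighborhood around a person is assumed to be split into rings (by distance) and angular sectors. Each region is assumed to be summarized by max-pooling the scores of the people who fall inside it. The function name `action_context_descriptor` and its parameters `radii` and `n_sectors` are illustrative, not the paper's.

```python
import math
from typing import List, Sequence, Tuple


def action_context_descriptor(
    i: int,
    positions: Sequence[Tuple[float, float]],
    scores: Sequence[Sequence[float]],
    radii: Sequence[float] = (1.0, 2.0),
    n_sectors: int = 4,
) -> List[float]:
    """Hypothetical AC-style descriptor for person i.

    Assumptions (not from the source): scores[j] holds K action scores
    for person j from some base classifier; the context around person i
    is split into len(radii) rings x n_sectors angular sectors; each
    region is summarized by element-wise max pooling over its occupants.
    Returns own scores followed by one K-vector per region.
    """
    k = len(scores[i])
    n_regions = len(radii) * n_sectors
    sector_width = 2 * math.pi / n_sectors
    # Empty regions stay at zero; an "occupied" flag lets the first
    # occupant initialize the region, so negative scores pool correctly.
    context = [[0.0] * k for _ in range(n_regions)]
    occupied = [False] * n_regions
    xi, yi = positions[i]
    for j, (xj, yj) in enumerate(positions):
        if j == i:
            continue
        dist = math.hypot(xj - xi, yj - yi)
        # Innermost ring whose radius contains this neighbor, if any.
        ring = next((r for r, rad in enumerate(radii) if dist <= rad), None)
        if ring is None:
            continue  # beyond the outermost ring: not part of the context
        angle = math.atan2(yj - yi, xj - xi) % (2 * math.pi)
        # Clamp guards against floating-point angles landing exactly on 2*pi.
        sector = min(int(angle / sector_width), n_sectors - 1)
        region = ring * n_sectors + sector
        if not occupied[region]:
            context[region] = list(scores[j])
            occupied[region] = True
        else:
            context[region] = [max(a, b) for a, b in zip(context[region], scores[j])]
    return list(scores[i]) + [v for vec in context for v in vec]


if __name__ == "__main__":
    pos = [(0.0, 0.0), (0.5, 0.1), (-0.1, 1.5)]
    sc = [[1.0, -1.0], [0.2, 0.8], [-0.5, 0.3]]
    # 2 own scores + 8 regions x 2 actions = 18 values
    print(len(action_context_descriptor(0, pos, sc)))
```

With K actions, R rings and S sectors, the descriptor has K(1 + RS) entries. Max pooling keeps the length fixed no matter how many neighbors fall in a region, which is what lets a standard linear classifier consume it. Any other spatial layout or pooling rule could be substituted in this sketch.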