Recognition of complex events in open-source web-scale videos: a bottom up approach

Authors:
Subhabrata Bhattacharya
Affiliations:
University of Central Florida, Orlando, FL, USA
Venue:
Proceedings of the 21st ACM international conference on Multimedia
Year:
2013

Citing 5
Cited 0

Space-time Interest Points

ICCV '03 Proceedings of the Ninth IEEE International Conference on Computer Vision - Volume 2
Recognizing Human Actions: A Local SVM Approach

ICPR '04 Proceedings of the Pattern Recognition, 17th International Conference on (ICPR'04) Volume 3 - Volume 03
Actions as Space-Time Shapes

ICCV '05 Proceedings of the Tenth IEEE International Conference on Computer Vision - Volume 2
A probabilistic representation for efficient large scale visual recognition tasks

CVPR '11 Proceedings of the 2011 IEEE Conference on Computer Vision and Pattern Recognition
HMDB: A large video database for human motion recognition

ICCV '11 Proceedings of the 2011 International Conference on Computer Vision

Quantified Score

Hi-index	0.00

Visualization

Abstract

Recognition of complex events in unconstrained Internet videos is a challenging research problem. In this symposium proposal, we present a systematic decomposition of complex events into hierarchical components and make an in-depth analysis of how existing research are being used to cater to various levels of this hierarchy. We also identify three key stages where we make novel contributions which are necessary to not only improve the overall recognition performance, but also develop richer understanding of these events. At the lowest level, our contributions include (a) compact covariance descriptors of appearance and motion features used in sparse coding framework to recognize realistic actions and gestures, and (b) a Lie-algebra based representation of dominant camera motion present in video shots which can be used as a complementary feature for video analysis. In the next level, we propose an (c) efficient maximum likelihood estimate based representation from low-level features computed from videos which demonstrates state of the art performance in large scale visual concept detection, and finally, we propose to (d) model temporal interactions between concepts detected in video shots through two new discriminative feature spaces derived from Linear dynamical systems which eventually boosts event recognition performance. In all cases, we conduct thorough experiments to demonstrate promising performance gains over some of the prominent approaches.