Automated textual descriptions for a wide range of video events with 48 human actions

Authors:
Patrick Hanckmann; Klamer Schutte; Gertjan J. Burghouts
Affiliations:
TNO, The Hague, The Netherlands (all three authors)
Venue:
ECCV'12 Proceedings of the 12th international conference on Computer Vision - Volume Part I
Year:
2012

Abstract

We present a hybrid method for generating textual descriptions of video based on the actions it contains. The method consists of an action classifier and a description generator. The action classifier detects and classifies the actions in the video so that they can serve as verbs for the description generator. The description generator (1) finds the actors (objects or persons) in the video and correctly links them to the verbs, so that they fill the roles of subject, direct object, and indirect object, and (2) generates a sentence from the verb, subject, and direct and indirect objects. The novelty of our method is that it combines the discriminative power of a bag-of-features action detector with the generative power of a rule-based action descriptor. We show that this approach outperforms a homogeneous setup in which a rule-based system serves as both the action detector and the action descriptor.
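To make the division of labour concrete, here is a minimal sketch of the two ends of such a pipeline, under stated assumptions. It assumes the action classifier returns a score per action label and that the actors have already been assigned to their roles. All names (`Actor`, `Event`, `pick_verb`, `describe`) are hypothetical, and a fixed sentence template stands in for the rule-based description generator. This is not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Actor:
    """A detected object or person (hypothetical representation)."""
    name: str


@dataclass
class Event:
    """One described event: a verb plus the actors filling its grammatical roles."""
    verb: str                                  # conjugated verb, e.g. "gives"
    subject: Actor
    direct_object: Optional[Actor] = None
    indirect_object: Optional[Actor] = None


def pick_verb(scores: Dict[str, float]) -> str:
    """Stand-in for the discriminative classifier's output: take the top-scoring action label."""
    return max(scores, key=scores.__getitem__)


def describe(event: Event) -> str:
    """Fill a fixed template: subject, verb, optional direct object, optional 'to' + indirect object."""
    parts = [f"The {event.subject.name}", event.verb]
    if event.direct_object is not None:
        parts.append(f"the {event.direct_object.name}")
    if event.indirect_object is not None:
        parts.append(f"to the {event.indirect_object.name}")
    return " ".join(parts) + "."
```

For example, `describe(Event("gives", Actor("man"), Actor("ball"), Actor("woman")))` returns `"The man gives the ball to the woman."`. Always using the preposition "to" for the indirect object is a simplification: the correct preposition depends on the verb. Likewise, taking only the single best-scoring label ignores events in which several actions occur at once.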