This paper addresses the problem of human action recognition in still images. Typically, visual action recognition systems require visual training examples for every action to be recognized. However, the space of possible actions is staggering: not only are there many action types, but each action type can involve many different objects, and collecting visual training examples for this combinatorial explosion of action-object combinations is infeasible. To address this problem, this paper proposes, as a first attempt, a general framework for unseen action recognition in still images that exploits both visual and language models. Based on objects recognized in an image by means of visual features, the system suggests the most plausible actions using off-the-shelf language models. All components of the framework are trained on universal datasets; hence the system is general, flexible, and able to recognize actions for which no visual training example has been provided. Experiments show that the model yields good performance on unseen action recognition, and that it even outperforms a state-of-the-art Bag-of-Words model in a realistic scenario where few visual training examples are available.
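The core idea of combining object recognition with language-model plausibility can be sketched as follows. This is not the authors' implementation; the object detections and the verb-object plausibility table are illustrative placeholders standing in for a real object recognizer and for scores derived from an off-the-shelf language model (e.g. normalized verb-object co-occurrence statistics).

```python
def rank_actions(object_scores, verb_object_plausibility):
    """Score each candidate action (verb) by combining object-detector
    confidences with language-model verb-object plausibility:
    score(verb) = sum over objects of P(object|image) * P(verb|object)."""
    action_scores = {}
    for obj, p_obj in object_scores.items():
        for verb, p_vo in verb_object_plausibility.get(obj, {}).items():
            action_scores[verb] = action_scores.get(verb, 0.0) + p_obj * p_vo
    # Highest-scoring actions first
    return sorted(action_scores.items(), key=lambda kv: -kv[1])

# Toy detections from a hypothetical object recognizer (label -> confidence)
detections = {"guitar": 0.8, "chair": 0.4}

# Toy plausibility scores, standing in for language-model statistics
plausibility = {
    "guitar": {"play": 0.9, "hold": 0.5},
    "chair": {"sit on": 0.8, "hold": 0.2},
}

ranking = rank_actions(detections, plausibility)
print(ranking)  # most plausible action for the image comes first
```

Because neither component is trained on the target actions, any verb covered by the language model can be suggested, which is what allows recognition of actions without visual training examples.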