This paper addresses the problem of human action recognition in still images. Typically, visual action recognition systems require visual training examples for every action to be recognized. However, the space of possible actions is staggering: not only are there many action types, but each action type can involve many different objects, and collecting visual training examples for this combinatorial explosion of action-object combinations is infeasible. To address this problem, this paper proposes, as a first attempt, a general framework for unseen action recognition in still images that exploits both visual and language models. Based on objects recognized in an image by means of visual features, the system suggests the most plausible actions using off-the-shelf language models. All components of the framework are trained on universal datasets; hence the system is general, flexible, and able to recognize actions for which no visual training example has been provided. Experiments show that the model yields good performance on unseen action recognition, and that it even outperforms a state-of-the-art Bag-of-Words model in a realistic scenario where few visual training examples are available.
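The core idea of combining object recognition with language-model plausibility can be sketched as follows. This is not the authors' implementation; the object detections and the verb-object plausibility table are illustrative placeholders standing in for a real object recognizer and for scores derived from an off-the-shelf language model (e.g. normalized verb-object co-occurrence statistics).

```python
def rank_actions(object_scores, verb_object_plausibility):
    """Score each candidate action (verb) by combining object-detector
    confidences with language-model verb-object plausibility:
    score(verb) = sum over objects of P(object|image) * P(verb|object)."""
    action_scores = {}
    for obj, p_obj in object_scores.items():
        for verb, p_vo in verb_object_plausibility.get(obj, {}).items():
            action_scores[verb] = action_scores.get(verb, 0.0) + p_obj * p_vo
    # Highest-scoring actions first
    return sorted(action_scores.items(), key=lambda kv: -kv[1])

# Toy detections from a hypothetical object recognizer (label -> confidence)
detections = {"guitar": 0.8, "chair": 0.4}

# Toy plausibility scores, standing in for language-model statistics
plausibility = {
    "guitar": {"play": 0.9, "hold": 0.5},
    "chair": {"sit on": 0.8, "hold": 0.2},
}

ranking = rank_actions(detections, plausibility)
print(ranking)  # most plausible action for the image comes first
```

Because neither component is trained on the target actions, any verb covered by the language model can be suggested, which is what allows recognition of actions without visual training examples.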