We describe a framework for retrieving human actions in still web images using verb queries, for instance "phoning". First, we build a group of visually discriminative instances for each action class, called "Exemplarlets". We then employ Multiple Kernel Learning (MKL) to learn an optimal combination of histogram intersection kernels, each of which captures a state-of-the-art feature channel. Our features include the distribution of edges, dense visual words, and feature descriptors at different levels of a spatial pyramid. For a new image, we detect the hot-region using a sliding-window detector learnt via MKL. The hot-region can imply latent actions in the image. After the hot-region has been detected, we build an inverted index in the visual search path, which we call the Visual Inverted Index (VII). Finally, by fusing the visual search path and the text search path, we retrieve accurate results that are relevant to either the text or the visual information. We show both detection and retrieval results on our newly collected dataset of six actions, and demonstrate improved performance over existing methods.
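To make the kernel combination concrete, the following is a minimal sketch of a weighted sum of histogram intersection kernels, one per feature channel. It is an illustration under stated assumptions, not the authors' implementation: the channel names ("edges", "visual_words"), the function names, and the weights are hypothetical, and in the framework described above the weights would be learnt by MKL rather than set by hand.

```python
from typing import Dict, Sequence

Histogram = Sequence[float]


def histogram_intersection(h1: Histogram, h2: Histogram) -> float:
    """Histogram intersection kernel: the sum of bin-wise minima."""
    if len(h1) != len(h2):
        raise ValueError("histograms must have the same number of bins")
    return sum(min(a, b) for a, b in zip(h1, h2))


def combined_kernel(
    x: Dict[str, Histogram],
    y: Dict[str, Histogram],
    weights: Dict[str, float],
) -> float:
    """Weighted sum of per-channel intersection kernels.

    `weights` maps a channel name to a non-negative weight. The values used
    below are placeholders; MKL would normally learn them from training data.
    """
    return sum(
        w * histogram_intersection(x[channel], y[channel])
        for channel, w in weights.items()
    )


# Two images described by two hypothetical feature channels.
image_a = {"edges": [0.5, 0.5], "visual_words": [0.2, 0.8]}
image_b = {"edges": [1.0, 0.0], "visual_words": [0.2, 0.8]}
channel_weights = {"edges": 0.25, "visual_words": 0.75}  # assumed, not learnt

similarity = combined_kernel(image_a, image_b, channel_weights)
print(similarity)
```

A kernel value computed this way could be supplied to an SVM that accepts precomputed kernels, which is one plausible way to score each sliding window; the abstract does not specify this detail.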