The rapid popularization of digital cameras and mobile phone cameras has led to an explosive growth of consumers' personal photo collections. In this paper, we present a real-time textual query-based personal photo retrieval system that leverages millions of Web images and their associated rich textual descriptions (captions, categories, etc.). After a user provides a textual query (e.g., "water"), our system uses an inverted file to automatically find positive Web images that are related to the query as well as negative Web images that are irrelevant to it. Based on these automatically retrieved relevant and irrelevant Web images, we employ three simple but effective classification methods, k-Nearest Neighbor (kNN), decision stumps, and linear SVM, to rank personal photos. To further improve retrieval performance, we propose two relevance feedback methods via cross-domain learning, which effectively utilize both the Web images and the personal images. In particular, our cross-domain learning methods can learn robust classifiers in real time from only a very limited number of personal photos labeled by the user, by leveraging the prelearned linear SVM classifiers. We further propose an incremental cross-domain learning method that significantly accelerates the relevance feedback process on large consumer photo databases. Extensive experiments on two consumer photo data sets demonstrate the effectiveness and efficiency of our system, which is also inherently not limited by any predefined lexicon.
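The first two stages described above (inverted-file lookup of positive and negative Web images, then kNN ranking of personal photos) can be illustrated with a minimal sketch. This is not the system's actual implementation: the function names, the toy 2-D feature vectors, the random sampling of negatives, and the majority-vote kNN score are all assumptions made for illustration only.

```python
import math
import random
from collections import defaultdict


def build_inverted_file(web_images):
    """Map each caption word to the ids of the Web images whose text contains it."""
    index = defaultdict(set)
    for image_id, (caption, _features) in web_images.items():
        for word in caption.lower().split():
            index[word].add(image_id)
    return index


def select_training_images(index, web_images, query, num_negatives, seed=0):
    """Positives contain the query word; negatives are sampled from the rest.

    Random sampling of negatives is an assumption of this sketch.
    """
    positives = sorted(index.get(query.lower(), set()))
    candidates = sorted(set(web_images) - set(positives))
    rng = random.Random(seed)
    negatives = rng.sample(candidates, min(num_negatives, len(candidates)))
    return positives, negatives


def knn_score(photo_features, web_images, positives, negatives, k=3):
    """Fraction of the k nearest labeled Web images that are positive."""
    labeled = [(math.dist(photo_features, web_images[i][1]), 1) for i in positives]
    labeled += [(math.dist(photo_features, web_images[i][1]), 0) for i in negatives]
    nearest = sorted(labeled)[:k]
    if not nearest:
        return 0.0
    return sum(label for _, label in nearest) / len(nearest)


def rank_personal_photos(personal_photos, web_images, query, k=3):
    """Return personal photo ids ordered from most to least relevant to the query."""
    index = build_inverted_file(web_images)
    positives, negatives = select_training_images(
        index, web_images, query, num_negatives=len(web_images)
    )
    scores = {
        photo_id: knn_score(features, web_images, positives, negatives, k)
        for photo_id, features in personal_photos.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy data: (caption, 2-D visual feature) per Web image.
web_images = {
    "w1": ("blue water lake", (0.1, 0.9)),
    "w2": ("ocean water waves", (0.2, 0.8)),
    "w3": ("city street night", (0.9, 0.1)),
    "w4": ("mountain road car", (0.8, 0.2)),
}
# Personal photos have visual features only, no text.
personal_photos = {"p1": (0.85, 0.15), "p2": (0.15, 0.85)}

print(rank_personal_photos(personal_photos, web_images, "water"))
```

Because the personal photos carry no text, the query reaches them only through the visual features of the Web images it selects; this is also why the approach is not tied to a predefined lexicon. The relevance feedback and cross-domain learning stages are not covered by this sketch.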