This paper investigates the problem of modeling Internet images and associated text or tags for tasks such as image-to-image search, tag-to-image search, and image-to-tag search (image annotation). We start with canonical correlation analysis (CCA), a popular and successful approach for mapping visual and textual features to the same latent space, and incorporate a third view capturing high-level image semantics, represented either by a single category or multiple non-mutually-exclusive concepts. We present two ways to train the three-view embedding: supervised, with the third view coming from ground-truth labels or search keywords; and unsupervised, with semantic themes automatically obtained by clustering the tags. To ensure high accuracy for retrieval tasks while keeping the learning process scalable, we combine multiple strong visual features and use explicit nonlinear kernel mappings to efficiently approximate kernel CCA. To perform retrieval, we use a specially designed similarity function in the embedded space, which substantially outperforms the Euclidean distance. The resulting system produces compelling qualitative results and outperforms a number of two-view baselines on retrieval tasks on three large-scale Internet image datasets.
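As a rough illustration of the pipeline described above, the following sketch fits a linear three-view CCA by solving a regularized generalized eigenvalue problem, with an optional explicit feature map applied to each view beforehand. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not say which explicit kernel mapping is used, so random Fourier features for a Gaussian kernel stand in here, and it does not define the similarity function, so plain cosine similarity is used as a placeholder. All function names and the parameters `gamma`, `reg`, and `dim` are hypothetical.

```python
import numpy as np


def random_fourier_features(X, n_features, gamma, rng):
    """Explicit map z(x) with z(x) . z(y) ~= exp(-gamma * ||x - y||^2).

    Assumption: a Gaussian kernel approximated by random Fourier features;
    the abstract does not specify the kernel or the mapping.
    """
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(X.shape[1], n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)


def fit_multiview_cca(views, dim, reg=1e-3):
    """Fit multi-view CCA on a list of (n_samples, d_i) arrays.

    Solves A w = lambda B w, where A holds all cross-view covariance
    blocks and B is its regularized block-diagonal part.
    Returns per-view means, per-view projection matrices, and eigenvalues.
    """
    means = [V.mean(axis=0) for V in views]
    centered = [V - m for V, m in zip(views, means)]
    n = centered[0].shape[0]
    Z = np.hstack(centered)
    A = Z.T @ Z / n

    offsets = np.cumsum([0] + [V.shape[1] for V in centered])
    B = np.zeros_like(A)
    for s, e in zip(offsets[:-1], offsets[1:]):
        B[s:e, s:e] = A[s:e, s:e] + reg * np.eye(e - s)

    # Reduce to a symmetric eigenproblem via B = L L^T.
    L_inv = np.linalg.inv(np.linalg.cholesky(B))
    M = L_inv @ A @ L_inv.T
    evals, evecs = np.linalg.eigh((M + M.T) / 2.0)
    order = np.argsort(evals)[::-1][:dim]
    W = L_inv.T @ evecs[:, order]

    projections = [W[s:e] for s, e in zip(offsets[:-1], offsets[1:])]
    return means, projections, evals[order]


def embed(V, mean, projection):
    """Project one view's features into the shared latent space."""
    return (V - mean) @ projection


def cosine_similarity(queries, database):
    """Placeholder similarity (not the paper's function): row-wise cosine."""
    q = queries / np.maximum(np.linalg.norm(queries, axis=1, keepdims=True), 1e-12)
    d = database / np.maximum(np.linalg.norm(database, axis=1, keepdims=True), 1e-12)
    return q @ d.T
```

With visual features, tag features, and a semantic view (for example one-hot category labels or tag-cluster indicators) as the three inputs, tag-to-image search would embed the tag query with its own projection and rank the embedded images by similarity. Because every view lands in the same space, image-to-image and image-to-tag search reuse the same ranking step with different query and database views.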