A Multi-View Embedding Space for Modeling Internet Images, Tags, and Their Semantics

Authors:
Yunchao Gong;Qifa Ke;Michael Isard;Svetlana Lazebnik
Affiliations:
Department of Computer Science, University of North Carolina at Chapel Hill, Chapel Hill, USA;Microsoft Research Silicon Valley, Mountain View, USA;Microsoft Research Silicon Valley, Mountain View, USA;Department of Computer Science, University of Illinois at Urbana-Champaign, Champaign, USA
Venue:
International Journal of Computer Vision
Year:
2014

Citing 43
Cited 0

Probabilistic latent semantic indexing

Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval
Normalized Cuts and Image Segmentation

IEEE Transactions on Pattern Analysis and Machine Intelligence
Content-Based Image Retrieval at the End of the Early Years

IEEE Transactions on Pattern Analysis and Machine Intelligence
Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope

International Journal of Computer Vision
Object Recognition as Machine Translation: Learning a Lexicon for a Fixed Image Vocabulary

ECCV '02 Proceedings of the 7th European Conference on Computer Vision-Part IV
Kernel Principal Component Analysis

ICANN '97 Proceedings of the 7th International Conference on Artificial Neural Networks
Modeling annotated data

Proceedings of the 26th annual international ACM SIGIR conference on Research and development in informaion retrieval
Document clustering based on non-negative matrix factorization

Proceedings of the 26th annual international ACM SIGIR conference on Research and development in informaion retrieval
Kernel independent component analysis

The Journal of Machine Learning Research
Latent dirichlet allocation

The Journal of Machine Learning Research
Labeling images with a computer game

Proceedings of the SIGCHI Conference on Human Factors in Computing Systems
Distinctive Image Features from Scale-Invariant Keypoints

International Journal of Computer Vision
PLSA-based image auto-annotation: constraining the latent space

Proceedings of the 12th annual ACM international conference on Multimedia
Histograms of Oriented Gradients for Human Detection

CVPR '05 Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05) - Volume 1 - Volume 01
Multi-labelled classification using maximum entropy method

Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
Canonical Correlation Analysis: An Overview with Application to Learning Methods

Neural Computation
LDA-based document models for ad-hoc retrieval

SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories

CVPR '06 Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Volume 2
Animals on the Web

CVPR '06 Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Volume 2
A Framework for Learning Predictive Structures from Multiple Tasks and Unlabeled Data

The Journal of Machine Learning Research
Supervised Learning of Semantic Classes for Image Annotation and Retrieval

IEEE Transactions on Pattern Analysis and Machine Intelligence
Image retrieval: Ideas, influences, and trends of the new age

ACM Computing Surveys (CSUR)
A Discriminative Kernel-Based Approach to Rank Images from Text Queries

IEEE Transactions on Pattern Analysis and Machine Intelligence
Real-Time Computerized Annotation of Pictures

IEEE Transactions on Pattern Analysis and Machine Intelligence
Annotating Images by Mining Image Search Results

IEEE Transactions on Pattern Analysis and Machine Intelligence
A New Baseline for Image Annotation

ECCV '08 Proceedings of the 10th European Conference on Computer Vision: Part III
Using large-scale web data to facilitate textual query based retrieval of consumer photos

MM '09 Proceedings of the 17th ACM international conference on Multimedia
NUS-WIDE: a real-world web image database from National University of Singapore

Proceedings of the ACM International Conference on Image and Video Retrieval
Evaluating Color Descriptors for Object and Scene Recognition

IEEE Transactions on Pattern Analysis and Machine Intelligence
Improving the multilingual user experience of Wikipedia using cross-language name search

HLT '10 Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics
A new approach to cross-modal multimedia retrieval

Proceedings of the international conference on Multimedia
Every picture tells a story: generating sentences from images

ECCV'10 Proceedings of the 11th European conference on Computer vision: Part IV
Superparsing: scalable nonparametric image parsing with superpixels

ECCV'10 Proceedings of the 11th European conference on Computer vision: Part V
Baby talk: Understanding and generating simple image descriptions

CVPR '11 Proceedings of the 2011 IEEE Conference on Computer Vision and Pattern Recognition
Bridging the Gap: Query by Semantic Example

IEEE Transactions on Multimedia
WSABIE: scaling up to large vocabulary image annotation

IJCAI'11 Proceedings of the Twenty-Second international joint conference on Artificial Intelligence - Volume Volume Three
Generalized Multiview Analysis: A discriminative latent space

CVPR '12 Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
Multi-label visual classification with label exclusive context

ICCV '11 Proceedings of the 2011 International Conference on Computer Vision
Learning the Relative Importance of Objects from Tagged Images for Retrieval and Cross-Modal Search

International Journal of Computer Vision
Joint image and word sense discrimination for image retrieval

ECCV'12 Proceedings of the 12th European conference on Computer Vision - Volume Part I
Metric learning for large scale image classification: generalizing to new classes at near-zero cost

ECCV'12 Proceedings of the 12th European conference on Computer Vision - Volume Part II
Image annotation using metric learning in semantic neighbourhoods

ECCV'12 Proceedings of the 12th European conference on Computer Vision - Volume Part III
Large-Margin Predictive Latent Subspace Learning for Multiview Data Analysis

IEEE Transactions on Pattern Analysis and Machine Intelligence

Quantified Score

Hi-index	0.00

Visualization

Abstract

This paper investigates the problem of modeling Internet images and associated text or tags for tasks such as image-to-image search, tag-to-image search, and image-to-tag search (image annotation). We start with canonical correlation analysis (CCA), a popular and successful approach for mapping visual and textual features to the same latent space, and incorporate a third view capturing high-level image semantics, represented either by a single category or multiple non-mutually-exclusive concepts. We present two ways to train the three-view embedding: supervised, with the third view coming from ground-truth labels or search keywords; and unsupervised, with semantic themes automatically obtained by clustering the tags. To ensure high accuracy for retrieval tasks while keeping the learning process scalable, we combine multiple strong visual features and use explicit nonlinear kernel mappings to efficiently approximate kernel CCA. To perform retrieval, we use a specially designed similarity function in the embedded space, which substantially outperforms the Euclidean distance. The resulting system produces compelling qualitative results and outperforms a number of two-view baselines on retrieval tasks on three large-scale Internet image datasets.