Learning semantic representations of objects and their parts

  • Authors:
  • Grégoire Mesnil; Antoine Bordes; Jason Weston; Gal Chechik; Yoshua Bengio

  • Affiliations:
  • LISA, Université de Montréal, Montreal, Canada and LITIS, Université de Rouen, Rouen, France; CNRS--Heudiasyc UMR 7253, Université de Technologie de Compiègne, Compiègne, France; Google, New York, USA; Google, Mountain View, USA and Gonda Brain Research Center, Bar-Ilan University, Ramat Gan, Israel; LISA, Université de Montréal, Montreal, Canada

  • Venue:
  • Machine Learning
  • Year:
  • 2014

Abstract

Recently, large-scale image annotation datasets have been collected with millions of images and thousands of possible annotations. Latent variable models, or embedding methods, that simultaneously learn semantic representations of object labels and image representations can provide tractable solutions on such tasks. In this work, we are interested in jointly learning representations both for the objects in an image and for the parts of those objects, because such deeper semantic representations could bring a leap forward in image retrieval or browsing. Despite the size of these datasets, annotated data for objects and their parts is costly to obtain and may not be available. In this paper, we propose to bypass this cost with a method able to learn to jointly label objects and parts without requiring exhaustively labeled data. We design a model architecture that can be trained under a proxy supervision obtained by combining standard image annotation (from ImageNet) with semantic part-based within-label relations (from WordNet). The model itself is designed to capture both object-image-to-object-label similarities and object-label-to-part-label similarities in a single joint system. Experiments conducted on our combined data and a precisely annotated evaluation set demonstrate the usefulness of our approach.
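
The abstract describes a joint embedding architecture that scores both image-to-object-label and object-label-to-part-label similarities. The sketch below is a minimal, illustrative reading of that idea, not the authors' implementation: image features, object labels, and part labels are mapped into one shared embedding space, compatibilities are dot products, and training uses a generic margin-ranking SGD step. All names, dimensions, and hyperparameters are assumptions for illustration.

```python
import numpy as np

# Assumed setup: pre-extracted image feature vectors, integer ids for object
# labels (e.g. ImageNet synsets) and part labels (e.g. WordNet part-of
# relations). Everything below is an illustrative sketch.
rng = np.random.default_rng(0)
d, img_dim = 50, 128            # embedding size, raw image-feature size
n_objects, n_parts = 1000, 300  # label vocabulary sizes

W_img  = rng.normal(scale=0.01, size=(d, img_dim))    # maps image features into the embedding space
E_obj  = rng.normal(scale=0.01, size=(n_objects, d))  # object-label embeddings
E_part = rng.normal(scale=0.01, size=(n_parts, d))    # part-label embeddings


def score_image_object(x, obj):
    """Compatibility between an image feature vector x and an object label."""
    return float(E_obj[obj] @ (W_img @ x))


def score_object_part(obj, part):
    """Compatibility between an object label and a part label
    (e.g. a part-of pair such as car -> wheel)."""
    return float(E_obj[obj] @ E_part[part])


def sgd_step(x, obj, part, neg_obj, neg_part, lr=0.01, margin=1.0):
    """One margin-ranking SGD step: push the correct (image, object) and
    (object, part) pairs above sampled negatives. This particular loss is an
    assumption, not necessarily the paper's exact training objective."""
    z = W_img @ x
    obj_vec = E_obj[obj].copy()

    # Image-to-object-label hinge term.
    if margin - score_image_object(x, obj) + score_image_object(x, neg_obj) > 0:
        W_img[:] += lr * np.outer(E_obj[obj] - E_obj[neg_obj], x)
        E_obj[obj] += lr * z
        E_obj[neg_obj] -= lr * z

    # Object-label-to-part-label hinge term.
    if margin - score_object_part(obj, part) + score_object_part(obj, neg_part) > 0:
        E_obj[obj] += lr * (E_part[part] - E_part[neg_part])
        E_part[part] += lr * obj_vec
        E_part[neg_part] -= lr * obj_vec


# Toy usage: one training triple (image, object, part) with random negatives.
x = rng.normal(size=img_dim)
sgd_step(x, obj=3, part=7, neg_obj=42, neg_part=11)
```

Keeping both similarity terms in one parameter set is what makes the supervision "proxy": image-level object annotations and label-level part relations can train the shared object embeddings without any image that is exhaustively labeled with its parts.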