Learning people annotation from the web via consistency learning

  • Authors:
  • Jay Yagnik; Atiq Islam

  • Affiliations:
  • Google Inc., Mountain View, CA; University of Memphis, Memphis, TN

  • Venue:
  • Proceedings of the International Workshop on Multimedia Information Retrieval (MIR '07)
  • Year:
  • 2007

Abstract

The phenomenal growth of image and video content on the web, and the increasing sparseness of the meta information that accompanies it, force us to look for signals in the image/video content itself for search, information retrieval, and browsing-based corpus exploration. One of the most prominent types of information users look for while searching or browsing such corpora is information about the people present in the images and videos. While face recognition has matured to some extent over the past few years, this problem remains hard due to a) the absence of labelled data for the large set of celebrities that users search for, and b) the variability of age, makeup, expression, and pose in the target corpus. We propose a learning paradigm, which we refer to as consistency learning, that addresses both issues by posing the task as learning from a weakly labelled training set. We use text-image co-occurrence on the web as a weak signal of relevance and learn the set of consistent face models from this very large and noisy training set. The resulting system learns face models for a large set of celebrities directly from the web and uses them to tag images and video for better retrieval. While the proposed method has been applied to faces, we see it as broadly applicable to any learning problem with a suitably defined similarity metric. We present results from learning on a very large dataset of 37 million images, achieving a validation accuracy of 92.68%.
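
As a rough illustration of the consistency-learning idea sketched in the abstract, the snippet below filters a noisy, weakly labelled set of face descriptors down to its mutually consistent core. It assumes faces have already been detected and embedded as L2-normalised feature vectors; the function name, the cosine-similarity metric, and the keep_ratio threshold are illustrative assumptions, not the paper's exact formulation.

    import numpy as np

    def consistency_filter(embeddings: np.ndarray, keep_ratio: float = 0.5):
        """Keep the most mutually consistent face descriptors.

        embeddings: (n, d) array of L2-normalised face feature vectors
        harvested for one celebrity name via text-image co-occurrence.
        Returns the retained subset and each kept face's consistency score.
        NOTE: a sketch of the idea only, not the authors' algorithm.
        """
        sims = embeddings @ embeddings.T     # pairwise cosine similarities
        np.fill_diagonal(sims, 0.0)          # ignore self-similarity
        consistency = sims.mean(axis=1)      # mean agreement with all others
        k = max(1, int(keep_ratio * len(embeddings)))
        keep = np.argsort(consistency)[-k:]  # indices of the most consistent faces
        return embeddings[keep], consistency[keep]

    # Example: 200 noisy web-harvested descriptors; keep the consistent half
    # as the face model for one celebrity name.
    rng = np.random.default_rng(0)
    noisy = rng.normal(size=(200, 128))
    noisy /= np.linalg.norm(noisy, axis=1, keepdims=True)
    model_faces, scores = consistency_filter(noisy, keep_ratio=0.5)

The intuition is that faces of the queried celebrity agree with each other under the similarity metric, while mislabelled faces harvested from noisy co-occurrence do not, so ranking by mutual agreement separates signal from noise without any hand-labelled data.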