Learning people annotation from the web via consistency learning

Authors:
Jay Yagnik; Atiq Islam
Affiliations:
Google Inc., Mountain View, CA; University of Memphis, Memphis, TN
Venue:
Proceedings of the international workshop on multimedia information retrieval
Year:
2007

Abstract

The phenomenal growth of image and video content on the web, together with the increasing sparseness of the accompanying meta-information, forces us to look for signals in the content itself to support search, information retrieval, and browsing-based corpus exploration. One of the most prominent types of information that users look for while searching or browsing such corpora is information about the people present in the images and videos. While face recognition has matured to some extent over the past few years, this problem remains hard because of (a) the absence of labelled data for the large set of celebrities that users look for and (b) the variability of age, makeup, expression, and pose in the target corpus. We propose a learning paradigm, which we refer to as consistency learning, that addresses both issues by posing the problem as one of learning from a weakly labelled training set. We use text-image co-occurrence on the web as a weak signal of relevance and learn a set of consistent face models from this very large and noisy training set. The resulting system learns face models for a large set of celebrities directly from the web and uses them to tag images and videos for better retrieval. While the proposed method has been applied to faces, we see it as broadly applicable to any learning problem for which a suitable similarity metric is defined. We present results on learning from a very large dataset of 37 million images, resulting in a validation accuracy of 92.68%.
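
The abstract does not spell out the consistency learning algorithm, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that each face found near a name on the web has already been reduced to a feature vector, that cosine similarity is the "suitable similarity metric", and that a face is "consistent" when it is similar to enough of the other candidates for the same name. The function names, the thresholds, and the toy vectors are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def consistent_subset(vectors, sim_threshold=0.9, min_support=0.5):
    """Return indices of vectors that are similar to at least `min_support`
    of the other candidates. Both thresholds are illustrative assumptions."""
    n = len(vectors)
    keep = []
    for i in range(n):
        votes = sum(
            1
            for j in range(n)
            if j != i and cosine(vectors[i], vectors[j]) >= sim_threshold
        )
        if n > 1 and votes / (n - 1) >= min_support:
            keep.append(i)
    return keep


def face_model(vectors, keep):
    """Average the kept vectors into a single prototype (a stand-in for a face model)."""
    dim = len(vectors[0])
    return [sum(vectors[i][d] for i in keep) / len(keep) for d in range(dim)]


# Toy candidates for one name: three mutually similar faces plus two
# unrelated faces that merely co-occurred with the name on a page.
candidates = [
    [1.0, 0.0, 0.1],
    [0.9, 0.1, 0.0],
    [1.0, 0.1, 0.1],
    [0.0, 1.0, 0.0],
    [0.1, 0.0, 1.0],
]
keep = consistent_subset(candidates)
model = face_model(candidates, keep)
print(keep)
```

In this toy run the three mutually similar vectors are retained and the two outliers are discarded, which mirrors the intuition of treating text-image co-occurrence as a weak, noisy label and keeping only the mutually consistent examples.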