The phenomenal growth of images and videos on the web, together with the increasing sparseness of the meta information that accompanies them, forces us to look for signals in the image/video content itself for search, information retrieval, and browsing-based corpus exploration. One of the most prominent types of information that users look for while searching or browsing such corpora is information about the people present in the images and videos. Although face recognition has matured to some extent over the past few years, this problem remains hard because of a) the absence of labelled data for the very large set of celebrities that users look for and b) the variability of age, makeup, expression, and pose in the target corpus. We propose a learning paradigm, which we refer to as consistency learning, that addresses both issues by posing the task as one of learning from a weakly labelled training set. We use the co-occurrence of text and images on the web as a weak signal of relevance and learn a set of consistent face models from this very large and noisy training set. The resulting system learns face models for a large set of celebrities directly from the web and uses them to tag images and videos for better retrieval. Although the proposed method has been applied to faces, we believe it is broadly applicable to any learning problem for which a suitable similarity metric is defined. We present results on learning from a very large dataset of 37 million images, which yields a validation accuracy of 92.68%.
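The abstract does not spell out the consistency-learning algorithm, so the following is only a minimal Python sketch of the general idea under our own assumptions, not the authors' method. We assume faces are already embedded as vectors, that cosine similarity serves as the "suitable similarity metric", and that a face counts as "consistent" with a name when it is similar to enough of the other faces that co-occur with that name. The functions `consistent_subset`, `learn_models`, and `tag`, and the thresholds `sim_threshold`, `min_support`, and `min_sim`, are hypothetical illustrations.

```python
import math
from typing import Dict, List, Optional, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def consistent_subset(faces: List[Vector], sim_threshold: float = 0.8,
                      min_support: float = 0.3) -> List[Vector]:
    """Hypothetical consistency filter: keep faces that are similar
    (>= sim_threshold) to at least a min_support fraction of the other
    faces sharing the same weak (text co-occurrence) label."""
    if len(faces) < 2:
        return []  # a single face offers no evidence of consistency
    kept = []
    for i, f in enumerate(faces):
        support = sum(1 for j, g in enumerate(faces)
                      if j != i and cosine(f, g) >= sim_threshold)
        if support / (len(faces) - 1) >= min_support:
            kept.append(f)
    return kept


def learn_models(weak_labels: Dict[str, List[Vector]],
                 sim_threshold: float = 0.8,
                 min_support: float = 0.3) -> Dict[str, List[Vector]]:
    """Build one exemplar-set 'face model' per name from noisy, weakly labelled faces."""
    models = {}
    for name, faces in weak_labels.items():
        kept = consistent_subset(faces, sim_threshold, min_support)
        if kept:
            models[name] = kept
    return models


def tag(face: Vector, models: Dict[str, List[Vector]],
        min_sim: float = 0.8) -> Optional[str]:
    """Return the name whose exemplars best match `face`, or None if none reach min_sim."""
    best_name, best_sim = None, min_sim
    for name, exemplars in models.items():
        s = max(cosine(face, e) for e in exemplars)
        if s >= best_sim:
            best_name, best_sim = name, s
    return best_name


# Toy usage with 2-D "embeddings": the last "alice" face is really Bob,
# who happened to appear in a photo captioned with Alice's name.
weak = {
    "alice": [[1.0, 0.0], [0.98, 0.05], [0.97, 0.1], [0.0, 1.0]],
    "bob": [[0.0, 1.0], [0.05, 0.99], [0.1, 0.97]],
}
models = learn_models(weak, sim_threshold=0.9, min_support=0.5)
print(tag([0.99, 0.02], models))  # alice
```

In this sketch the `min_support` fraction is what gives tolerance to noisy labels: a stray face that merely co-occurs with a name is dropped because it disagrees with the majority of that name's faces. A real system at the scale of 37 million images would need a learned face embedding, an approximate nearest-neighbour index instead of the quadratic loop, and thresholds tuned on validation data.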