A large-scale active learning system for topical categorization on the web

  • Authors:
  • Suju Rajan;Dragomir Yankov;Scott J. Gaffney;Adwait Ratnaparkhi

  • Affiliations:
  • Yahoo! Labs, Sunnyvale, CA, USA;Yahoo! Labs, Sunnyvale, CA, USA;Yahoo! Labs, Sunnyvale, CA, USA;Yahoo! Labs, Sunnyvale, CA, USA

  • Venue:
  • Proceedings of the 19th international conference on World wide web
  • Year:
  • 2010

Quantified Score

Hi-index 0.00

Visualization

Abstract

Many web applications such as ad matching systems, vertical search engines, and page categorization systems require the identification of a particular type or class of pages on the Web. The sheer number and diversity of the pages on the Web, however, makes the problem of obtaining a good sample of the class of interest hard. In this paper, we describe a successfully deployed end-to-end system that starts from a biased training sample and makes use of several state-of-the-art machine learning algorithms working in tandem, including a powerful active learning component, in order to achieve a good classification system. The system is evaluated on traffic from a real-world ad-matching platform and is shown to achieve high categorization effectiveness with a significant reduction in editorial effort and labeling time.