Reinforcing Web-object Categorization Through Interrelationships

  • Authors:
  • Gui-Rong Xue;Yong Yu;Dou Shen;Qiang Yang;Hua-Jun Zeng;Zheng Chen

  • Affiliations:
  • Computer Science and Engineering, Shanghai Jiao-Tong University, Shanghai, P.R. China 200030;Computer Science and Engineering, Shanghai Jiao-Tong University, Shanghai, P.R. China 200030;Hong Kong University of Science and Technology, Kowloon, Hong Kong;Hong Kong University of Science and Technology, Kowloon, Hong Kong;Microsoft Research Asia, 5F, Sigma Center, Beijing, P.R. China 100080;Microsoft Research Asia, 5F, Sigma Center, Beijing, P.R. China 100080

  • Venue:
  • Data Mining and Knowledge Discovery
  • Year:
  • 2006

Abstract

Existing categorization algorithms deal with homogeneous Web objects and, when they take interrelationships with other types of objects into account, treat the interrelated objects merely as additional features. However, focusing on any single aspect of the inter-object relationship is not sufficient to fully reveal the true categories of Web objects. In this paper, we propose a novel categorization algorithm, called the Iterative Reinforcement Categorization Algorithm (IRC), which exploits the full interrelationship between different types of Web objects, including Web pages and queries. IRC classifies the interrelated Web objects by iteratively reinforcing the individual classification results of the different object types via their interrelationship. Experiments on a clickthrough-log dataset from the MSN search engine show that, in terms of the F1 measure, IRC achieves a 26.4% improvement over a pure content-based classification method, a 21% improvement over a query-metadata-based method, and a 16.4% improvement over the well-known virtual document-based method. Our experiments also show that IRC converges fast enough to be practical for real-world applications.
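
The reinforcement idea described in the abstract can be illustrated with a small sketch. This is not the authors' IRC implementation: the abstract does not give the update rule, so the code below assumes a simple one in which each query's category estimate is mixed with the click-weighted average estimate of the pages clicked for it, and vice versa. The function names, the mixing weight `alpha`, the stopping tolerance, and the toy click matrix are all hypothetical.

```python
import numpy as np


def row_normalize(m):
    """Scale each row to sum to 1; all-zero rows are left as zeros."""
    sums = m.sum(axis=1, keepdims=True)
    sums[sums == 0] = 1.0
    return m / sums


def iterative_reinforcement(clicks, page_probs, query_probs,
                            alpha=0.5, max_iter=50, tol=1e-6):
    """Assumed update rule, for illustration only (not taken from the paper).

    clicks      : (n_queries, n_pages) click counts from a clickthrough log
    page_probs  : (n_pages, n_categories) content-based estimates for pages
    query_probs : (n_queries, n_categories) content-based estimates for queries
    alpha       : hypothetical weight kept on an object's own initial estimate
    """
    q2p = row_normalize(clicks.astype(float))    # query -> clicked pages
    p2q = row_normalize(clicks.T.astype(float))  # page -> queries that led to it
    p, q = page_probs.copy(), query_probs.copy()
    for n_iter in range(1, max_iter + 1):
        # Reinforce queries with the current page estimates, then pages
        # with the freshly updated query estimates.
        new_q = row_normalize(alpha * query_probs + (1 - alpha) * (q2p @ p))
        new_p = row_normalize(alpha * page_probs + (1 - alpha) * (p2q @ new_q))
        delta = max(np.abs(new_p - p).max(), np.abs(new_q - q).max())
        p, q = new_p, new_q
        if delta < tol:
            break
    return p, q, n_iter


# Toy data: 2 queries, 3 pages, 2 categories (all values made up).
clicks = np.array([[3, 1, 0],
                   [0, 1, 4]])
page_probs = np.array([[0.9, 0.1],
                       [0.5, 0.5],
                       [0.2, 0.8]])
query_probs = np.array([[0.5, 0.5],   # queries start out uninformative
                        [0.5, 0.5]])

p, q, n_iter = iterative_reinforcement(clicks, page_probs, query_probs)
print("query categories:", q.argmax(axis=1), "after", n_iter, "iterations")
```

In this toy setup the queries carry no category signal of their own, so any preference they end up with comes entirely from the pages users clicked; keeping a fixed share `alpha` of each object's initial estimate is one simple way to make the iteration settle rather than drift.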