A novel web page categorization algorithm based on block propagation using query-log information

  • Authors:
  • Wenyuan Dai;Yong Yu;Cong-Le Zhang;Jie Han;Gui-Rong Xue

  • Affiliations:
  • Apex Data & Knowledge Management Lab, Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China (all authors)

  • Venue:
  • WAIM '06 Proceedings of the 7th international conference on Advances in Web-Age Information Management
  • Year:
  • 2006

Abstract

Most existing web page classification algorithms, whether content-based, link-based, or based on query-log analysis, treat the page as the smallest unit. However, web pages usually contain noisy or biased information that can degrade classification performance. In this paper, we propose a Block Propagation Categorization (BPC) algorithm that mines web structure in depth and treats blocks as the basic semantic units. Moreover, using query-log information, BPC propagates only suitable blocks among web pages to emphasize their topics. We also optimize the BPC algorithm to speed up block propagation significantly without losing precision. Our experiments on ODP and an MSN search engine log show that BPC improves substantially on traditional approaches.
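
The abstract does not spell out how blocks are selected or propagated, so the following is only a minimal illustrative sketch of the general idea, not the authors' BPC algorithm. It assumes (hypothetically) that two pages are related when they were clicked for the same query in the log, that a block is "suitable" when its cosine similarity to the target page's own text reaches a threshold, and that blocks are plain strings. The names `propagate_blocks`, `cosine`, and `threshold` are invented for this example.

```python
from collections import Counter, defaultdict
import math


def tokens(text):
    """Very rough tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def propagate_blocks(pages, query_log, threshold=0.2):
    """Return each page's blocks plus 'suitable' blocks from related pages.

    pages:     {page_id: [block_text, ...]}
    query_log: [(query, clicked_page_id), ...]
    Assumption: pages clicked for the same query are related.
    Assumption: a block is suitable if it is similar enough to the target page.
    """
    # Group clicked pages by query.
    by_query = defaultdict(set)
    for query, page_id in query_log:
        by_query[query].add(page_id)

    # Two pages are neighbours if they share at least one query.
    neighbours = defaultdict(set)
    for ids in by_query.values():
        for p in ids:
            neighbours[p] |= ids - {p}

    enriched = {}
    for page_id, blocks in pages.items():
        own = Counter(t for b in blocks for t in tokens(b))
        extra = []
        for other in sorted(neighbours[page_id]):
            for block in pages.get(other, []):
                # Propagate only blocks that match the target page's topic.
                if cosine(own, Counter(tokens(block))) >= threshold:
                    extra.append(block)
        enriched[page_id] = list(blocks) + extra
    return enriched


pages = {
    "a": ["python tutorial loops"],
    "b": ["python tutorial functions", "buy cheap watches"],
    "c": ["football scores"],
}
query_log = [("python tutorial", "a"), ("python tutorial", "b"), ("football", "c")]
enriched = propagate_blocks(pages, query_log)
```

In this toy input, page "a" receives the on-topic block from page "b" but not the advertisement block, which is the kind of noise filtering the abstract attributes to block-level propagation. The enriched block lists could then be fed to any standard text classifier.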