Guide focused crawler efficiently and effectively using on-line topical importance estimation

  • Authors:
  • Ziyu Guan; Can Wang; Chun Chen; Jiajun Bu; Junfeng Wang

  • Affiliations:
  • Zhejiang University, Hangzhou, China (all authors)

  • Venue:
  • Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
  • Year:
  • 2008

Abstract

Focused crawling is a critical technique for topical resource discovery on the Web. We propose a new frontier-prioritizing algorithm, OTIE (On-line Topical Importance Estimation), which combines link-based and content-based analysis efficiently and effectively to evaluate the priority of each uncrawled URL in the frontier. We then demonstrate OTIE's advantages over traditional prioritizing algorithms through real crawling experiments.
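
The abstract does not give OTIE's formulas, so the following is only a generic sketch of what frontier prioritization looks like in a focused crawler, not the authors' method. It assumes each uncrawled URL already has a content-based score and a link-based score in [0, 1], and mixes them with a plain weighted sum. The names `Frontier`, `combined_priority`, and the weight `alpha` are hypothetical and chosen for illustration.

```python
import heapq
import itertools


def combined_priority(content_score, link_score, alpha=0.5):
    """Weighted sum of two scores (an assumed combination rule, not OTIE's)."""
    return alpha * content_score + (1 - alpha) * link_score


class Frontier:
    """Max-priority queue of uncrawled URLs (hypothetical helper)."""

    def __init__(self):
        self._heap = []                    # entries: (-priority, seq, url)
        self._counter = itertools.count()  # tie-breaker for equal priorities
        self._best = {}                    # url -> highest priority seen so far

    def push(self, url, priority):
        # Ignore updates that do not raise the URL's known priority.
        if priority <= self._best.get(url, float("-inf")):
            return
        self._best[url] = priority
        heapq.heappush(self._heap, (-priority, next(self._counter), url))

    def pop(self):
        # Skip stale heap entries left behind by earlier priority updates.
        while self._heap:
            neg_priority, _, url = heapq.heappop(self._heap)
            if self._best.get(url) == -neg_priority:
                del self._best[url]
                return url, -neg_priority
        raise IndexError("pop from empty frontier")


if __name__ == "__main__":
    frontier = Frontier()
    frontier.push("http://example.org/a", combined_priority(0.2, 0.4))
    frontier.push("http://example.org/b", combined_priority(0.9, 0.5))
    # A newly discovered in-link raises the estimate for the first URL.
    frontier.push("http://example.org/a", combined_priority(0.8, 1.0))
    print(frontier.pop()[0])
    print(frontier.pop()[0])
```

Re-pushing a URL with a higher score models an on-line update: the estimate for a page can rise as the crawler discovers more links pointing to it, and the lazy deletion in `pop` avoids rebuilding the heap on every update.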