Improvement of PageRank for Focused Crawler

  • Authors:
  • Fuyong Yuan;Chunxia Yin;Jian Liu

  • Affiliations:
  • Yanshan University, China;Yanshan University, China;Yanshan University, China

  • Venue:
  • SNPD '07 Proceedings of the Eighth ACIS International Conference on Software Engineering, Artificial Intelligence, Networking, and Parallel/Distributed Computing - Volume 02
  • Year:
  • 2007

Quantified Score

Hi-index 0.00

Visualization

Abstract

The rapid growth of the World-Wide Web poses unprecedented scaling challenges for general-purpose crawlers. Focused crawler is developed to collect relevant web pages of interested topics form the Internet. The PageRank algorithm is used in ranking web pages. It estimates the page's authority by taking into account the link structure of the Web. However, it assigns each outlink the same weight and is independent of topics, resulting in topic-drift. In this paper, we proposed an improved PageRank algorithm, which we called "T-PageRank", and it based on "topical random surfer". The experiment in focused crawler using the T-PageRank has better performance than the Breath-first and PageRank algorithms.