Fractional PageRank Crawler: Prioritizing URLs Efficiently for Crawling Important Pages Early

  • Authors:
  • Md. Hijbul Alam; Jongwoo Ha; Sangkeun Lee

  • Affiliations:
  • College of Information and Communications, Korea University, Seoul, Republic of Korea (all three authors)

  • Venue:
  • DASFAA '09 Proceedings of the 14th International Conference on Database Systems for Advanced Applications
  • Year:
  • 2009

Abstract

Crawling important pages early is a well-studied problem. However, the availability of many different frameworks for publishing web content has greatly increased the number of web pages, so a crawler must be fast enough to prioritize and download the important ones. Because the importance of a page is not known before or during its download, the crawler must spend a great deal of time approximating importance in order to prioritize downloads. In this research, we propose Fractional PageRank crawlers, which prioritize the downloaded pages for the purpose of discovering important URLs early in the crawl. Our experiments demonstrate that they dramatically improve running time while still crawling the important pages early.
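To illustrate the cost the abstract refers to, here is a minimal sketch of a conventional PageRank-ordered crawler, not the Fractional PageRank method itself (whose details are not given in this abstract). After each download, it recomputes PageRank by power iteration over the graph known so far and fetches the highest-ranked undownloaded URL next. The in-memory `web` dictionary standing in for the real Web, the function names `pagerank` and `crawl_order`, and the parameters `damping`, `iterations`, and `recompute_every` are all hypothetical choices made for this illustration.

```python
from typing import Dict, List


def pagerank(graph: Dict[str, List[str]], damping: float = 0.85,
             iterations: int = 50) -> Dict[str, float]:
    """Power-iteration PageRank over a dict mapping node -> out-links.

    Nodes that appear only as link targets (e.g. undownloaded URLs) are
    treated as dangling, and their rank mass is spread uniformly.
    """
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    n = len(nodes)
    if n == 0:
        return {}
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Total rank held by nodes with no known out-links.
        dangling = sum(rank[v] for v in nodes if not graph.get(v))
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for u, targets in graph.items():
            if targets:
                share = damping * rank[u] / len(targets)
                for t in targets:
                    new[t] += share
        rank = new
    return rank


def crawl_order(web: Dict[str, List[str]], seeds: List[str], budget: int,
                recompute_every: int = 1) -> List[str]:
    """Simulate a crawl of the hypothetical `web` graph, always fetching the
    frontier URL with the highest PageRank over the graph known so far."""
    known: Dict[str, List[str]] = {}  # downloaded URL -> its out-links
    frontier = set(seeds)
    scores: Dict[str, float] = {}
    order: List[str] = []
    while frontier and len(order) < budget:
        # Highest score first; ties broken by URL for a deterministic order.
        url = min(frontier, key=lambda u: (-scores.get(u, 0.0), u))
        frontier.remove(url)
        out = web.get(url, [])
        known[url] = out
        order.append(url)
        frontier.update(t for t in out if t not in known)
        # Full recomputation over the growing graph: the expensive step.
        if len(order) % recompute_every == 0:
            scores = pagerank(known)
    return order


if __name__ == "__main__":
    web = {"a": ["b", "c", "d"], "b": ["d"], "c": ["e"], "d": [], "e": []}
    print(crawl_order(web, ["a"], budget=10))
```

On this toy graph, `d` is fetched before `c` because two downloaded pages link to it, whereas a breadth-first crawl would visit `c` first. Since every recomputation touches the whole known graph, the total work grows quickly with crawl size; raising `recompute_every` is one crude way to trade ranking freshness for speed.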