A novel crawling algorithm for web pages

  • Authors:
  • Mohammad Amin Golshani;Vali Derhami;AliMohammad ZarehBidoki

  • Affiliations:
  • Department of Electrical and Computer Engineering, Yazd University, Yazd, Iran;Department of Electrical and Computer Engineering, Yazd University, Yazd, Iran;Department of Electrical and Computer Engineering, Yazd University, Yazd, Iran

  • Venue:
  • AIRS'11 Proceedings of the 7th Asia conference on Information Retrieval Technology
  • Year:
  • 2011

Abstract

The crawler is a main component of a search engine: it is responsible for discovering and downloading web pages. No search engine can cover the whole web, so it has to focus on the most valuable pages. Several crawling algorithms, such as PageRank, OPIC and FICA, have been proposed, but they have low throughput. To overcome this problem, we propose a new crawling algorithm, called FICA+, which is easy to implement. In FICA+, the importance of a page is determined from the logarithmic distance and the weight of its incoming links. To evaluate FICA+, we use the web graph of the University of California, Berkeley. Experimental results show that our algorithm outperforms the other crawling algorithms in discovering highly important pages.
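The abstract does not give FICA+'s formulas, so the following Python sketch only illustrates the general logarithmic-distance idea it builds on, under an assumption of our own: following a link out of page p costs log(out-degree of p), and the crawler always fetches the frontier page with the smallest accumulated distance from the seed set (a Dijkstra-style priority queue). The function name `log_distance_crawl_order` and the toy graph are hypothetical. FICA+'s weighting of incoming links is not modeled here, so this is not the authors' method.

```python
import heapq
import math
from typing import Dict, List


def log_distance_crawl_order(graph: Dict[str, List[str]], seeds: List[str]) -> List[str]:
    """Return pages in the order a logarithmic-distance crawler would fetch them.

    Hypothetical sketch: a click out of page p costs log(outdegree(p)), so links
    from pages with few out-links are "closer". Incoming-link weights (used by
    FICA+ per the abstract) are NOT modeled.
    """
    dist: Dict[str, float] = {s: 0.0 for s in seeds}
    frontier = [(0.0, s) for s in seeds]
    heapq.heapify(frontier)
    visited = set()
    crawled: List[str] = []
    while frontier:
        d, page = heapq.heappop(frontier)  # ties broken by page name
        if page in visited:
            continue  # stale entry: page was already fetched at a shorter distance
        visited.add(page)
        crawled.append(page)  # "download" the page; its out-links become known
        out_links = graph.get(page, [])
        if not out_links:
            continue
        step = math.log(len(out_links))  # assumed cost of one click from this page
        for child in out_links:
            nd = d + step
            if child not in visited and nd < dist.get(child, math.inf):
                dist[child] = nd
                heapq.heappush(frontier, (nd, child))
    return crawled


# Toy example: a hub with 8 out-links vs. a chain of single-link pages.
toy_graph = {
    "seed": ["hub", "x"],
    "hub": ["h1", "h2", "h3", "h4", "h5", "h6", "h7", "h8"],
    "x": ["y"],
    "y": ["z"],
}
print(log_distance_crawl_order(toy_graph, ["seed"]))
# ['seed', 'hub', 'x', 'y', 'z', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'h7', 'h8']
```

In this toy graph, `z` sits three clicks from the seed but is fetched before the hub's children, which are only two clicks away, because every link in the `x`, `y`, `z` chain comes from a page with a single out-link. A breadth-first crawler would instead fetch `h1` to `h8` before `y` and `z`.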