A self-adaptive strategy for web crawler in in-site search

  • Authors:
  • Rui Sun;Peng Jin;Wei Xiang

  • Affiliations:
  • Laboratory of Intelligent Information Processing and Application, Leshan Normal University, Leshan, China;Laboratory of Intelligent Information Processing and Application, Leshan Normal University, Leshan, China;Laboratory of Intelligent Information Processing and Application, Leshan Normal University, Leshan, China

  • Venue:
  • WISM'12 Proceedings of the 2012 international conference on Web Information Systems and Mining
  • Year:
  • 2012

Abstract

This paper analyzes characteristics of in-site search and proposes a self-adaptive strategy for web crawlers. The strategy is polite, and the number of concurrent threads is adjusted automatically according to an analysis of the average page download time in different time units. Factors such as web server load and network bandwidth are considered together. The experimental results show that the strategy achieves higher performance than some other strategies. It objectively reflects the actual crawling process of a web crawler and fully exploits local and network resources.
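
The abstract does not give the adjustment rule itself, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a simple rule: compare the average download time of the current time unit with that of the previous one, drop one thread when the average rises noticeably (a possible sign of server load or bandwidth saturation), and otherwise add one thread to probe for spare capacity. The names `next_thread_count` and `simulate`, the 10% tolerance, and the thread bounds are all hypothetical.

```python
from statistics import mean


def next_thread_count(current, prev_avg, curr_avg,
                      min_threads=1, max_threads=16, tolerance=0.10):
    """Return the thread count to use in the next time unit.

    Assumed rule (not taken from the paper): back off by one thread if the
    average download time grew by more than `tolerance` relative to the
    previous time unit; otherwise add one thread. The result is clamped to
    [min_threads, max_threads].
    """
    if prev_avg is None or prev_avg <= 0:
        # No usable baseline yet: keep the current count, within bounds.
        proposed = current
    else:
        change = (curr_avg - prev_avg) / prev_avg
        # Slower pages suggest contention, so be more polite.
        proposed = current - 1 if change > tolerance else current + 1
    return max(min_threads, min(proposed, max_threads))


def simulate(download_times_per_unit, start_threads=4):
    """Replay per-time-unit download times and record the thread counts chosen."""
    threads, prev_avg, history = start_threads, None, []
    for times in download_times_per_unit:
        curr_avg = mean(times)
        threads = next_thread_count(threads, prev_avg, curr_avg)
        history.append(threads)
        prev_avg = curr_avg
    return history


if __name__ == "__main__":
    # Hypothetical download times in seconds, grouped by time unit.
    units = [[0.20, 0.22], [0.21, 0.19], [0.40, 0.44], [0.41, 0.43]]
    print(simulate(units))
```

In a real crawler the per-unit download times would come from the worker threads, and the returned count would resize the worker pool. Both of those parts are left out here.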