Improvement of HITS for topic-specific web crawler

  • Authors:
  • Xiaojun Zong;Yi Shen;Xiaoxin Liao

  • Affiliations:
  • Department of Control Science and Engineering, Huazhong University of Science and Technology, Wuhan, Hubei, China;Department of Control Science and Engineering, Huazhong University of Science and Technology, Wuhan, Hubei, China;Department of Control Science and Engineering, Huazhong University of Science and Technology, Wuhan, Hubei, China

  • Venue:
  • ICIC'05 Proceedings of the 2005 international conference on Advances in Intelligent Computing - Volume Part I
  • Year:
  • 2005

Abstract

The rapid growth of the World-Wide Web poses unprecedented scaling challenges for general-purpose crawlers. Topic-specific web crawlers are developed to collect web pages relevant to topics of interest from the Internet. Based on an analysis of the HITS algorithm, this paper proposes a new algorithm, P-HITS, for topic-specific web crawlers. Probability is introduced into URL selection to achieve better global optimality, and hyperlink metadata is added to the algorithm to better predict the relevance of web pages. Experimental results indicate that our algorithm has better performance.
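
For orientation, the two ideas named in the abstract can be sketched in Python. The abstract does not give the P-HITS formulas, so this is only an illustration under stated assumptions: `hits` is the standard HITS hub/authority iteration, and `pick_url` is a hypothetical score-proportional sampling rule standing in for the probabilistic URL selection. The function names, the link-graph representation, and the sampling rule are all assumptions, not the authors' method, and the hyperlink-metadata component is not modeled.

```python
import random


def hits(graph, iterations=50):
    """Standard HITS. `graph` maps each page to the list of pages it links to."""
    pages = set(graph) | {q for targets in graph.values() for q in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority score: sum of the hub scores of pages linking to the page.
        auth = {p: 0.0 for p in pages}
        for p, targets in graph.items():
            for q in targets:
                auth[q] += hub[p]
        # Hub score: sum of the authority scores of the pages it links to.
        hub = {p: sum(auth[q] for q in graph.get(p, [])) for p in pages}
        # Normalize both score vectors to unit Euclidean length.
        for scores in (auth, hub):
            norm = sum(v * v for v in scores.values()) ** 0.5 or 1.0
            for p in scores:
                scores[p] /= norm
    return hub, auth


def pick_url(scores, rng=random):
    """Hypothetical selection rule (assumption): sample one URL with
    probability proportional to its score, instead of always taking the
    top-scored URL."""
    urls = sorted(scores)
    weights = [scores[u] for u in urls]
    if sum(weights) <= 0:
        # No positive score anywhere: fall back to a uniform choice.
        return rng.choice(urls)
    return rng.choices(urls, weights=weights, k=1)[0]
```

A crawler built this way would call `pick_url` on the current authority scores each time it chooses the next page to fetch, so lower-scored URLs still have a chance of being visited rather than being starved by a purely greedy choice.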