wHunter: a focused web crawler – a tool for digital library

  • Authors: Yun Huang; YunMing Ye
  • Affiliations: Department of Computer Science, Shanghai Jiaotong University (both authors)
  • Venue: ICADL'04: Proceedings of the 7th International Conference on Digital Libraries: International Collaboration and Cross-Fertilization
  • Year: 2004

Abstract

A topic-driven web crawler, or focused crawler, is a key tool for building online web information libraries. The challenge is to achieve good performance efficiently with limited time and space resources. This paper proposes wHunter, a focused web crawler that implements incremental, multi-strategy learning by combining the advantages of SVMs (support vector machines) and naïve Bayes. An SVM classifier ensures good initial performance; once enough web pages have been collected, the classifier is switched to naïve Bayes so that online incremental learning is achieved. Experimental results show that the proposed algorithm is efficient and easy to implement.
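
The switching strategy described in the abstract might be sketched roughly as follows. This is only an illustration under several assumptions, not wHunter's actual implementation: the initial SVM is represented by a hypothetical stand-in function (`stand_in_svm`), the switch is triggered by a simple page count (`switch_after`), and the naïve Bayes model is assumed to be trained on the labels the active classifier produces. All class and function names are invented for this example.

```python
import math
from collections import Counter


class IncrementalNaiveBayes:
    """Multinomial naive Bayes that can be updated one page at a time."""

    def __init__(self):
        self.word_counts = {True: Counter(), False: Counter()}
        self.doc_counts = {True: 0, False: 0}
        self.vocab = set()

    def update(self, words, relevant):
        # Incremental learning: only counters change, no retraining pass.
        self.word_counts[relevant].update(words)
        self.doc_counts[relevant] += 1
        self.vocab.update(words)

    def is_relevant(self, words):
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label in (True, False):
            # Log prior and log likelihoods, both with add-one smoothing.
            score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
            total_words = sum(self.word_counts[label].values())
            denom = total_words + len(self.vocab) + 1
            for word in words:
                score += math.log((self.word_counts[label][word] + 1) / denom)
            scores[label] = score
        return scores[True] > scores[False]


class SwitchingClassifier:
    """Uses an initial classifier first, then switches to naive Bayes."""

    def __init__(self, initial_classifier, switch_after):
        self.initial = initial_classifier  # assumed: a pre-trained SVM
        self.nb = IncrementalNaiveBayes()
        self.switch_after = switch_after   # assumed: page-count threshold
        self.pages_seen = 0

    @property
    def using_naive_bayes(self):
        return self.pages_seen >= self.switch_after

    def classify(self, words):
        if self.using_naive_bayes:
            label = self.nb.is_relevant(words)
        else:
            label = self.initial(words)
        # Assumption: the active classifier's label is fed back as training data.
        self.nb.update(words, label)
        self.pages_seen += 1
        return label


def stand_in_svm(words):
    """Hypothetical placeholder for a trained SVM; a real one would go here."""
    return "crawler" in words


# Toy pages, already tokenized; the contents are made up for illustration.
seed_pages = [
    ["focused", "crawler", "svm"],
    ["football", "match", "score"],
    ["web", "crawler", "classifier"],
    ["football", "league", "score"],
]
clf = SwitchingClassifier(stand_in_svm, switch_after=len(seed_pages))
for page in seed_pages:
    clf.classify(page)
```

After the seed pages have been processed, `clf.using_naive_bayes` is true and later calls to `clf.classify` are answered by the naïve Bayes model, which keeps updating its counts with each page. Feeding a classifier its own predictions can reinforce early mistakes, so a real crawler would likely need a more careful training signal than this sketch uses.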