Focused crawling by exploiting anchor text using decision tree

  • Authors: Jun Li; Kazutaka Furuse; Kazunori Yamaguchi
  • Affiliations: The University of Tokyo; University of Tsukuba; The University of Tokyo
  • Venue: WWW '05 Special interest tracks and posters of the 14th international conference on World Wide Web
  • Year: 2005

Abstract

Focused crawlers are considered a promising way to tackle the scalability problem of topic-oriented or personalized search engines. In designing a focused crawler, the choice of strategy for prioritizing unvisited URLs is crucial. In this paper, we propose a method that applies a decision tree to the anchor texts of hyperlinks. We conducted experiments on real data sets from four Japanese universities and verified our approach.
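
The abstract does not specify the features, training procedure, or tree-induction algorithm used, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes bag-of-words features over anchor text, an ID3-style tree chosen by information gain, and a crawl frontier ordered by the tree's estimated relevance. The training anchors, labels, URLs, and all function names (`tokenize`, `build_tree`, `score`, `prioritize`) are hypothetical.

```python
import heapq
import math
from collections import Counter

def tokenize(anchor):
    """Lowercase the anchor text and split it into a set of words."""
    return set(anchor.lower().split())

def entropy(labels):
    """Shannon entropy (in bits) of a list of boolean labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def build_tree(samples, depth=0, max_depth=3):
    """ID3-style induction over word-presence features.

    samples: list of (token_set, label); label is True if the link target was on topic.
    Returns either a leaf (fraction of on-topic samples) or (word, yes_subtree, no_subtree).
    """
    labels = [label for _, label in samples]
    positive_rate = sum(labels) / len(labels)
    if depth == max_depth or positive_rate in (0.0, 1.0):
        return positive_rate
    base = entropy(labels)
    best_word, best_gain = None, 0.0
    for word in sorted(set().union(*(tokens for tokens, _ in samples))):
        with_word = [l for t, l in samples if word in t]
        without = [l for t, l in samples if word not in t]
        if not with_word or not without:
            continue  # the word does not split this node
        remainder = (len(with_word) * entropy(with_word)
                     + len(without) * entropy(without)) / len(labels)
        if base - remainder > best_gain:
            best_word, best_gain = word, base - remainder
    if best_word is None:
        return positive_rate
    yes = [(t, l) for t, l in samples if best_word in t]
    no = [(t, l) for t, l in samples if best_word not in t]
    return (best_word,
            build_tree(yes, depth + 1, max_depth),
            build_tree(no, depth + 1, max_depth))

def score(tree, anchor):
    """Walk the tree for one anchor text and return the leaf's relevance estimate."""
    tokens = tokenize(anchor)
    while isinstance(tree, tuple):
        word, yes_branch, no_branch = tree
        tree = yes_branch if word in tokens else no_branch
    return tree

def prioritize(tree, links):
    """Order unvisited (url, anchor) pairs by descending score; ties keep input order."""
    heap = [(-score(tree, anchor), i, url) for i, (url, anchor) in enumerate(links)]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]

# Hypothetical labeled anchors: True means the linked page was on topic.
TRAINING = [
    ("research laboratory", True),
    ("research publications", True),
    ("research projects", True),
    ("campus map", False),
    ("admissions information", False),
    ("contact us", False),
]
tree = build_tree([(tokenize(anchor), label) for anchor, label in TRAINING])

# Hypothetical unvisited links extracted from an already-crawled page.
links = [
    ("https://www.example.edu/map", "Campus map"),
    ("https://www.example.edu/lab", "Research laboratory"),
    ("https://www.example.edu/contact", "Contact us"),
]
ordered = prioritize(tree, links)
```

A heap is used here because a crawler's frontier grows as pages are fetched, and a priority queue lets newly discovered links be pushed without re-sorting. In practice, the tree would be trained on a much larger labeled sample and would likely use richer features than single-word presence.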