Where to crawl next for focused crawlers

  • Authors:
  • Yuki Uemura; Tsuyoshi Itokawa; Teruaki Kitasuka; Masayoshi Aritsugi

  • Affiliations:
  • Computer Science and Electrical Engineering, Graduate School of Science and Technology, Kumamoto University, Kumamoto, Japan (all authors)

  • Venue:
  • KES'10 Proceedings of the 14th international conference on Knowledge-based and intelligent information and engineering systems: Part IV
  • Year:
  • 2010

Abstract

Because the WWW provides a large amount of data, retrieving interesting and useful information from it effectively and efficiently is valuable for innovative and creative human activities. In this paper, we propose a focused crawler for individual activities. We develop an algorithm that decides where a focused crawler should crawl next by integrating the concept of PageRank into the decision. We empirically evaluate our proposal in terms of precision and target recall. Some of our results show that the system can achieve good target recall regardless of the topic on which the crawler focuses.
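
The abstract does not give the algorithm itself, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes that "where to crawl next" can be modeled as picking the unvisited URL with the highest blend of a PageRank score (computed over the links discovered so far) and a topic relevance estimate. The function names, the `weight` parameter, the linear blend, and the relevance values are all hypothetical. The two metric functions follow the usual definitions (precision as the share of crawled pages that are relevant, target recall as the share of a known target set that was crawled), which may differ in detail from the paper's.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank over the link graph discovered so far.

    graph maps each crawled URL to the list of URLs it links to.
    """
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iterations):
        # Rank held by pages with no known out-links is spread evenly.
        dangling = sum(rank[u] for u in nodes if not graph.get(u))
        new_rank = {u: (1.0 - damping) / n + damping * dangling / n
                    for u in nodes}
        for u, targets in graph.items():
            if targets:
                share = damping * rank[u] / len(targets)
                for v in targets:
                    new_rank[v] += share
        rank = new_rank
    return rank


def next_to_crawl(graph, visited, relevance, weight=0.5):
    """Pick the unvisited URL with the best blended score (assumed rule).

    relevance maps URLs to a topic score in [0, 1]; weight is a
    hypothetical knob balancing link structure against topic relevance.
    """
    rank = pagerank(graph)
    frontier = [u for u in rank if u not in visited]
    if not frontier:
        return None
    top = max(rank[u] for u in frontier)  # normalize ranks to (0, 1]

    def score(u):
        return weight * rank[u] / top + (1.0 - weight) * relevance.get(u, 0.0)

    return max(frontier, key=score)


def precision(crawled, relevant):
    """Fraction of crawled pages that are relevant to the topic."""
    crawled = set(crawled)
    return len(crawled & set(relevant)) / len(crawled) if crawled else 0.0


def target_recall(crawled, targets):
    """Fraction of a known target set that the crawl has reached."""
    targets = set(targets)
    return len(set(crawled) & targets) / len(targets) if targets else 0.0


if __name__ == "__main__":
    # Toy link graph: "seed" has been crawled, "a" and "b" are candidates.
    graph = {"seed": ["a", "b"], "a": ["b"], "b": []}
    choice = next_to_crawl(graph, visited={"seed"},
                           relevance={"a": 0.2, "b": 0.2})
    print("crawl next:", choice)
```

In this toy graph both candidates have the same relevance, so the link structure alone breaks the tie. Setting `weight` to 0 reduces the rule to pure topic relevance, and setting it to 1 reduces it to pure PageRank ordering.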