A Method for Focused Crawling Using Combination of Link Structure and Content Similarity

  • Authors:
  • Mohsen Jamali;Hassan Sayyadi;Babak Bagheri Hariri;Hassan Abolhassani

  • Affiliations:
  • Sharif University of Technology, Iran;Sharif University of Technology, Iran;Sharif University of Technology, Iran;Sharif University of Technology, Iran

  • Venue:
  • WI '06 Proceedings of the 2006 IEEE/WIC/ACM International Conference on Web Intelligence
  • Year:
  • 2006

Abstract

The rapid growth of the World Wide Web poses unprecedented scaling challenges for general-purpose crawlers and search engines. A focused crawler aims to selectively seek out pages that are relevant to a predefined set of topics. Besides specifying topics by keywords, it is also customary to use exemplary documents to compute the similarity of a given web document to the topic. In this paper we introduce a new hybrid focused crawler, which uses the link structure of documents as well as the similarity of pages to the topic to crawl the web.
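
To make the idea of a hybrid score concrete, the following is a minimal sketch of a best-first crawler over a toy in-memory graph. It is not the authors' algorithm: the abstract does not say how link structure and content similarity are combined, so the linear weighting `alpha`, the rule that a page passes its own score to its out-links, and all names (`term_vector`, `cosine`, `focused_crawl`, the `web` dictionary) are assumptions made for illustration only.

```python
import heapq
import math
import re
from collections import Counter


def term_vector(text):
    """Bag-of-words term-frequency vector for a piece of text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def focused_crawl(web, seeds, exemplars, alpha=0.5, budget=10):
    """Best-first crawl of a toy graph `web` = {url: (text, [out_links])}.

    Assumed scoring (hypothetical, not taken from the paper):
      content = cosine similarity of the page to the exemplar documents
      link    = score of the page that linked here (1.0 for seeds)
      score   = alpha * content + (1 - alpha) * link
    Returns the fetch order and a {url: score} map.
    """
    topic = term_vector(" ".join(exemplars))
    # heapq is a min-heap, so priorities are stored negated.
    frontier = [(-1.0, url) for url in seeds]
    heapq.heapify(frontier)
    scores = {}
    order = []
    while frontier and len(order) < budget:
        neg_priority, url = heapq.heappop(frontier)
        if url in scores or url not in web:
            continue  # already fetched, or link leaves the toy graph
        text, links = web[url]
        content = cosine(term_vector(text), topic)
        link = -neg_priority
        score = alpha * content + (1 - alpha) * link
        scores[url] = score
        order.append(url)
        for target in links:
            if target not in scores:
                # An out-link inherits the score of the page that cites it.
                heapq.heappush(frontier, (-score, target))
    return order, scores


# Toy graph: one on-topic branch and one off-topic branch.
web = {
    "seed": ("focused crawler topic", ["on", "off"]),
    "on": ("focused crawler for web pages", ["deep_on"]),
    "off": ("cooking recipes and pasta", ["deep_off"]),
    "deep_on": ("topic crawler relevance", []),
    "deep_off": ("pasta sauce", []),
}
order, scores = focused_crawl(
    web, ["seed"], ["focused crawler topic relevance"], budget=4
)
print(order)
```

With a budget of four fetches, the page reached only through the off-topic branch is the one left unvisited, because its parent's low content score lowers its priority in the frontier. Setting `alpha` to 1.0 ignores link information entirely, and 0.0 ignores page content.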