BioCrawler: An intelligent crawler for the semantic web

  • Authors:
  • Alexandros Batzios;Christos Dimou;Andreas L. Symeonidis;Pericles A. Mitkas

  • Affiliations:
  • Department of Electrical and Computer Engineering, Aristotle University of Thessaloniki, Greece;Department of Electrical and Computer Engineering, Aristotle University of Thessaloniki, Greece;Department of Electrical and Computer Engineering, Aristotle University of Thessaloniki, Greece;Department of Electrical and Computer Engineering, Aristotle University of Thessaloniki, Greece

  • Venue:
  • Expert Systems with Applications: An International Journal
  • Year:
  • 2008

Quantified Score

Hi-index 12.05

Abstract

Web crawling has become an important aspect of web search, as the WWW keeps growing and search engines strive to index the most important and up-to-date content. Many experimental approaches exist, but few actually try to model the current behaviour of search engines, which is to crawl and refresh the sites they deem important much more frequently than others. BioCrawler mirrors this behaviour on the semantic web by applying the learning strategies adopted in BioTope, a previous work on ecosystem simulation. BioCrawler employs the principles of BioTope's intelligent agents on the semantic web: it learns which sites are rich in semantic content and which sites link to them, and adjusts its crawling habits accordingly. In the end, it learns to behave much like state-of-the-art search engine crawlers do. However, BioCrawler reaches that behaviour solely by exploiting on-page factors, rather than off-page factors such as the currently used link popularity.
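
The abstract does not give BioCrawler's actual learning rule, so the following is only a minimal sketch of the general idea it describes: a crawler that revisits sites more often once it has observed them to be rich in semantic content. Every name here (AdaptiveScheduler, simulate, base_interval, alpha) and the richness score in [0, 1] are assumptions made for illustration, not part of BioCrawler or BioTope. The sketch also ignores the "which sites link to them" aspect.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Visit:
    due: float                       # time at which the site should be crawled
    site: str = field(compare=False)


class AdaptiveScheduler:
    """Hypothetical revisit scheduler: richer sites get shorter intervals."""

    def __init__(self, base_interval=24.0, min_interval=1.0, alpha=0.5):
        self.base_interval = base_interval   # interval for a site with richness 0
        self.min_interval = min_interval     # lower bound on any interval
        self.alpha = alpha                   # weight given to the newest observation
        self.richness = {}                   # site -> running estimate in [0, 1]
        self.queue = []                      # min-heap of pending visits

    def add_site(self, site, now=0.0):
        self.richness.setdefault(site, 0.0)
        heapq.heappush(self.queue, Visit(now, site))

    def interval(self, site):
        # Shrink the revisit interval linearly with the estimated richness.
        r = self.richness[site]
        return max(self.min_interval, self.base_interval * (1.0 - r))

    def record_visit(self, site, observed, now):
        # Exponential moving average of the on-page richness seen so far.
        old = self.richness[site]
        self.richness[site] = (1 - self.alpha) * old + self.alpha * observed
        heapq.heappush(self.queue, Visit(now + self.interval(site), site))

    def next_site(self):
        visit = heapq.heappop(self.queue)
        return visit.due, visit.site


def simulate(observed_richness, horizon=100.0):
    """Count visits per site up to `horizon`, given a fixed richness per site."""
    sched = AdaptiveScheduler()
    for site in observed_richness:
        sched.add_site(site)
    visits = {site: 0 for site in observed_richness}
    while True:
        due, site = sched.next_site()
        if due > horizon:
            break
        visits[site] += 1
        sched.record_visit(site, observed_richness[site], now=due)
    return visits
```

Calling simulate({"rich.example": 0.9, "poor.example": 0.1}) should give the richer site noticeably more visits than the poorer one. This mirrors the qualitative behaviour the abstract describes, while relying only on a score computed from the page itself rather than on link popularity.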