An Ontology-Based Focused Crawler

  • Authors: Lefteris Kozanidis
  • Affiliations: Computer Engineering and Informatics Department, Patras University, Greece 26500
  • Venue: NLDB '08 Proceedings of the 13th international conference on Natural Language and Information Systems: Applications of Natural Language to Information Systems
  • Year: 2008

Abstract

In this paper we present a novel approach for building a focused crawler. The goal of our crawler is to effectively identify web pages that relate to a set of pre-defined topics and download them regardless of their web topology or connectivity with other popular pages on the web. The main challenges that we address in our study are: (i) how to effectively identify the pages' topical content before these are fully downloaded and processed and (ii) how to obtain a well-balanced set of training examples that the crawler will regularly consult in its subsequent web visits.