MultiCrawler: a pipelined architecture for crawling and indexing semantic web data

  • Authors:
  • Andreas Harth; Jürgen Umbrich; Stefan Decker

  • Affiliations:
  • Digital Enterprise Research Institute, National University of Ireland, Galway (all three authors)

  • Venue:
  • ISWC'06: Proceedings of the 5th International Conference on The Semantic Web
  • Year:
  • 2006

Abstract

The goal of the work presented in this paper is to obtain large amounts of semistructured data from the web. Harvesting such data is a prerequisite for large-scale query answering over web sources. We contrast our approach with that of conventional web crawlers. We then describe and evaluate a five-step pipelined architecture that crawls and indexes data from both the traditional web and the Semantic Web.