MultiCrawler: a pipelined architecture for crawling and indexing semantic web data

Authors:
Andreas Harth; Jürgen Umbrich; Stefan Decker
Affiliations:
Digital Enterprise Research Institute, National University of Ireland, Galway (all authors)
Venue:
ISWC'06 Proceedings of the 5th international conference on The Semantic Web
Year:
2006


Abstract

The goal of this work is to obtain large amounts of semistructured data from the web. Harvesting semistructured data is a prerequisite for large-scale query answering over web sources. We contrast our approach with that of conventional web crawlers, and describe and evaluate a five-step pipelined architecture for crawling and indexing data from both the traditional Web and the Semantic Web.
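
As a rough illustration of what a pipelined crawl-and-index architecture can look like, the following minimal sketch chains five stages with one worker thread per stage and a FIFO queue between neighbours. The abstract does not name the five steps, so the stage names used here (fetch, detect, transform, index, extract), the in-memory `FAKE_WEB` data, and the `run_pipeline` helper are all assumptions made for illustration, not the authors' implementation.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the stream

# Hypothetical in-memory "web" so the sketch runs without network access.
FAKE_WEB = {
    "http://example.org/a.rdf": "<rdf:RDF>...</rdf:RDF>",
    "http://example.org/b.html": "<html>...</html>",
}
INDEX = {}  # toy index: URL -> detected content kind


def fetch(url):
    # Assumed step 1: retrieve the raw content for a URL.
    return {"url": url, "body": FAKE_WEB.get(url, "")}


def detect(doc):
    # Assumed step 2: guess the content type from the body.
    doc["kind"] = "rdf" if doc["body"].startswith("<rdf:RDF") else "html"
    return doc


def transform(doc):
    # Assumed step 3: convert the content to a uniform record.
    doc["record"] = (doc["url"], doc["kind"])
    return doc


def index(doc):
    # Assumed step 4: store the record in the index.
    INDEX[doc["url"]] = doc["kind"]
    return doc


def extract(doc):
    # Assumed step 5: emit the record for downstream use.
    return doc["record"]


def _worker(func, inbox, outbox):
    # Apply func to each item until the sentinel arrives, then pass it on.
    while True:
        item = inbox.get()
        if item is _DONE:
            outbox.put(_DONE)
            return
        outbox.put(func(item))


def run_pipeline(stages, items):
    # One queue before each stage plus one final output queue.
    queues = [queue.Queue() for _ in range(len(stages) + 1)]
    threads = [
        threading.Thread(target=_worker, args=(f, queues[i], queues[i + 1]))
        for i, f in enumerate(stages)
    ]
    for t in threads:
        t.start()
    for item in items:
        queues[0].put(item)
    queues[0].put(_DONE)
    results = []
    while True:
        out = queues[-1].get()
        if out is _DONE:
            break
        results.append(out)
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    stages = [fetch, detect, transform, index, extract]
    for record in run_pipeline(stages, list(FAKE_WEB)):
        print(record)
```

Because every stage runs in its own thread, a later stage can process one document while an earlier stage is already working on the next, which is the basic benefit of pipelining. A single worker per stage with FIFO queues also keeps results in input order; a real crawler would need error handling, politeness delays, and several workers per stage, none of which is shown here.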