The goal of the work presented in this paper is to obtain large amounts of semistructured data from the web. Harvesting semistructured data is a prerequisite for large-scale query answering over web sources. We contrast our approach with that of conventional web crawlers, and describe and evaluate a five-step pipelined architecture that crawls and indexes data from both the traditional and the Semantic Web.
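The abstract does not name the five steps, so what follows is only a rough illustration of what a pipelined crawl-and-index architecture can look like. The sketch runs five stages in separate threads, connected by FIFO queues, so one stage can work on a document while the next stage handles the previous one. The stages are `fetch`, `detect`, `transform`, `add_context` and an indexer. Their names and responsibilities are our own assumptions and are not taken from the paper. Fetching is simulated from a hypothetical in-memory `FAKE_WEB` dictionary instead of HTTP. Only N-Triples and HTML page titles are converted to triples.

```python
import queue
import re
import threading
from collections import defaultdict

# Hypothetical in-memory "web" standing in for HTTP: url -> (content type, body).
FAKE_WEB = {
    "http://example.org/a.nt": ("application/n-triples",
        '<http://example.org/a> <http://xmlns.com/foaf/0.1/name> "Alice" .\n'
        "<http://example.org/a> <http://xmlns.com/foaf/0.1/knows> <http://example.org/b> .\n"),
    "http://example.org/b.html": ("text/html",
        "<html><head><title>Bob's page</title></head><body></body></html>"),
    "http://example.org/c.png": ("image/png", ""),
}
_END = object()  # end-of-stream marker passed from stage to stage
TRIPLE_RE = re.compile(r'^(<[^>]*>)\s+(<[^>]*>)\s+(<[^>]*>|"[^"]*")\s*\.\s*$')
TITLE_RE = re.compile(r"<title>(.*?)</title>", re.IGNORECASE | re.DOTALL)

def fetch(url):
    # Stage 1 (hypothetical): retrieve the document; None drops unknown URLs.
    if url not in FAKE_WEB:
        return None
    ctype, body = FAKE_WEB[url]
    return {"url": url, "ctype": ctype, "body": body}

def detect(doc):
    # Stage 2 (hypothetical): classify the format; drop unsupported types.
    if "n-triples" in doc["ctype"]:
        doc["format"] = "ntriples"
    elif "html" in doc["ctype"]:
        doc["format"] = "html"
    else:
        return None
    return doc

def transform(doc):
    # Stage 3 (hypothetical): convert the body into (s, p, o) triples.
    triples = []
    if doc["format"] == "ntriples":
        for line in doc["body"].splitlines():
            m = TRIPLE_RE.match(line.strip())
            if m:
                triples.append(m.groups())
    else:  # HTML: keep only the page title
        m = TITLE_RE.search(doc["body"])
        if m:
            triples.append((f"<{doc['url']}>",
                            "<http://purl.org/dc/elements/1.1/title>", f'"{m.group(1)}"'))
    doc["triples"] = triples
    return doc

def add_context(doc):
    # Stage 4 (hypothetical): attach the source URL, turning triples into quads.
    doc["quads"] = [t + (f"<{doc['url']}>",) for t in doc["triples"]]
    return doc

def make_indexer(index):
    # Stage 5 (hypothetical): add each quad to an index keyed by subject.
    def index_doc(doc):
        for s, p, o, c in doc["quads"]:
            index[s].append((p, o, c))
        return doc["url"]
    return index_doc

def run_pipeline(stages, items):
    # One thread per stage, with queue i feeding stage i and queue i+1 collecting its output.
    queues = [queue.Queue() for _ in range(len(stages) + 1)]

    def worker(stage, inbox, outbox):
        while True:
            item = inbox.get()
            if item is _END:
                outbox.put(_END)  # propagate end of stream downstream
                return
            result = stage(item)
            if result is not None:  # None means "drop this item"
                outbox.put(result)

    threads = [threading.Thread(target=worker, args=(s, queues[i], queues[i + 1]))
               for i, s in enumerate(stages)]
    for t in threads:
        t.start()
    for item in items:
        queues[0].put(item)
    queues[0].put(_END)
    results = []
    while (item := queues[-1].get()) is not _END:
        results.append(item)
    for t in threads:
        t.join()
    return results

index = defaultdict(list)
stages = [fetch, detect, transform, add_context, make_indexer(index)]
done = run_pipeline(stages, list(FAKE_WEB) + ["http://example.org/missing"])
print(done)  # ['http://example.org/a.nt', 'http://example.org/b.html']
```

Each stage runs in a single thread and reads from a FIFO queue, so documents leave the pipeline in input order. Only the last stage writes to `index`, and the main thread reads it after all threads have joined. This avoids any need for locking in this toy setting. A real crawler would also feed extracted links back into the first queue, and that requires a termination condition more elaborate than one end-of-stream marker.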