WrapIt: Automated Integration of Web Databases with Extensional Overlaps

Authors:
Mattis Neiling; Markus Schaal; Martin Schumann
Venue:
Revised Papers from the NODe 2002 Web and Database-Related Workshops on Web, Web-Services, and Database Systems
Year:
2002



Abstract

The World Wide Web no longer consists solely of static web pages. Instead, more and more web pages are created dynamically from user requests and database content. Conventional search engines do not consider these dynamic pages, because user input cannot be simulated, and therefore often provide insufficient results.

A new approach for the online integration of web databases is presented in this paper. Given only one sample HTML result page for a source, result pages for new requests are found by structural recognition. Once structural recognition is established for one source, other web databases of the same universe (e.g. movie databases) can be integrated on the fly by content-based recognition. Thus, the user receives results from various sources.

No global schema is produced. Instead, the heterogeneity of the individual sources is preserved. The only requirement is an extensional overlap between the databases.
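
The abstract does not say how content-based recognition is implemented. As a rough illustration of what an extensional overlap could mean in practice, the sketch below checks which text fragments from a new source's result page also occur among values already extracted from a known source. The function names, the normalization rule, and the sample movie titles are all assumptions made for this example; none of them comes from the paper.

```python
def normalize(value: str) -> str:
    """Lowercase and collapse whitespace so trivially different spellings match."""
    return " ".join(value.lower().split())


def extensional_overlap(known_values, candidate_values):
    """Return the normalized values that occur in both collections.

    known_values: strings already extracted from an integrated source (assumed).
    candidate_values: text fragments found on a page of a new source (assumed).
    """
    known = {normalize(v) for v in known_values}
    candidates = {normalize(v) for v in candidate_values}
    return known & candidates


def overlap_ratio(known_values, candidate_values):
    """Fraction of distinct candidate fragments that match a known value."""
    candidates = {normalize(v) for v in candidate_values}
    if not candidates:
        return 0.0
    return len(extensional_overlap(known_values, candidate_values)) / len(candidates)


# Hypothetical data: titles from a known movie database and from a new page.
known_titles = ["Metropolis", "Blade Runner", "Nosferatu"]
page_fragments = ["  blade   runner", "Solaris", "METROPOLIS"]

shared = extensional_overlap(known_titles, page_fragments)
ratio = overlap_ratio(known_titles, page_fragments)
```

A high ratio would suggest that the fragments on the new page play the same role as the known attribute (here, movie titles). Exact matching after normalization is the simplest possible choice; a realistic system would likely need approximate string matching.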