The World Wide Web no longer consists of static web pages alone. Instead, more and more web pages are created dynamically from user requests and database content. Conventional search engines do not take these dynamic pages into account because they cannot simulate user input; as a result, they often return insufficient results.

This paper presents a new approach to the online integration of web databases. Given only a single sample HTML result page for a source, the system finds result pages for new requests by structural recognition. Once structural recognition has been established for one source, other web databases of the same universe (e.g., movie databases) can be integrated on the fly by content-based recognition. The user thus receives results from several sources.

No global schema is produced at all. Instead, the heterogeneity of the individual sources is preserved. The only requirement is that the databases overlap extensionally, i.e., that they contain some of the same entities.
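The abstract does not say how structural or content-based recognition are implemented. The sketch below rests entirely on our own assumptions. It approximates a page's structure by its set of root-to-node tag paths and treats a page as a result page when the Jaccard similarity of those paths to the sample page reaches a threshold. The threshold of 0.6 is a hypothetical value. Content-based recognition is approximated by the fraction of values already known from an integrated source (the extensional overlap) that appear in a new source's page. All function names are hypothetical illustrations, not the paper's method.

```python
from html.parser import HTMLParser


class TagPathCollector(HTMLParser):
    """Collects root-to-node tag paths (e.g. 'html/body/table/tr/td') and text."""

    # Elements that never get an end tag, so they must not stay on the stack.
    VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
            "link", "meta", "source", "track", "wbr"}

    def __init__(self):
        super().__init__()
        self.stack = []
        self.paths = set()
        self.texts = []

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)
        self.paths.add("/".join(self.stack))
        if tag in self.VOID:
            self.stack.pop()

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed children.
        if tag in self.stack:
            while self.stack and self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.texts.append(text)


def parse(html):
    """Return (set of tag paths, list of text fragments) for an HTML string."""
    collector = TagPathCollector()
    collector.feed(html)
    collector.close()
    return collector.paths, collector.texts


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def is_result_page(sample_html, candidate_html, threshold=0.6):
    """Hypothetical structural recognition against one sample result page."""
    sample_paths, _ = parse(sample_html)
    candidate_paths, _ = parse(candidate_html)
    return jaccard(sample_paths, candidate_paths) >= threshold


def content_overlap(known_values, candidate_html):
    """Hypothetical content-based recognition: share of known values found in the page."""
    if not known_values:
        return 0.0
    _, texts = parse(candidate_html)
    page_text = " ".join(texts).lower()
    hits = sum(1 for value in known_values if value.lower() in page_text)
    return hits / len(known_values)


if __name__ == "__main__":
    sample = "<html><body><table><tr><td>Alien</td><td>1979</td></tr></table></body></html>"
    answer = "<html><body><table><tr><td>Heat</td><td>1995</td></tr></table></body></html>"
    error = "<html><body><p>No movies found.</p></body></html>"
    other_source = "<html><body><ul><li>Heat (1995)</li><li>Brazil (1985)</li></ul></body></html>"
    print(is_result_page(sample, answer))  # True: identical tag-path sets
    print(is_result_page(sample, error))   # False: only 2 of 6 paths shared
    print(content_overlap(["Alien", "Heat", "Brazil"], other_source))
```

Tag-path sets ignore how many rows a result table has, so a page with one hit and a page with fifty hits look alike. That is roughly the behaviour a result-page detector needs. Substring matching in `content_overlap` is deliberately crude. A real system would match extracted attribute values rather than raw page text.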