A uniform framework for integration of information from the web

Authors:
Wolfgang May;Georg Lausen
Affiliations:
Institut für Informatik, Lotzestrasse 16-18, D-37083 Göttingen, Germany;Institut für Informatik, Georges-Koehler-Allee, D-79110 Freiburg, Germany
Venue:
Information Systems - Special issue on web data integration
Year:
2004

Abstract

We discuss a system that implements an integrated framework for Web exploration, wrapping, data integration, and querying. Here, the "integration" applies in three aspects: the data model, the functionality, and the architecture. The core of the approach is a unified framework (i.e., data model and language) in which all tasks are performed. We regard the Web and its contents as a unit, represented in a semi-structured, object-oriented data model: the Web structure, given by its hyperlinks, the parse-trees of Web pages, and its contents are all included in the internal world model of the system. Additionally, the application-level model is immediately generated as an overlay of this source-level model. The model is complemented by a rule-based object-oriented language which is extended by Web accessing capabilities and structured document analysis. This language is implemented by a central reasoning engine.

The advantage of our unified approach is that the same data manipulation and query language can be used for all tasks, i.e., accessing Web pages, wrapping, data integration, and querying information. Thus, these tasks are not necessarily separated, but can be closely intertwined. Additionally, by reusing the source-level model for generating the application-level model, there is no overhead for communication and mapping between different data formats.

In particular, we present a methodology for reusing generic rule patterns for typical extraction, integration, and restructuring tasks. In an abstract sense, the system contains a universal wrapper, which can be applied to arbitrary Web pages that the system considers during information processing. Equipped with suitably intelligent rules, the system can potentially explore initially unknown parts of the Web, thus coping with the steady growth of the Web.

We show the practicability of our approach by using the FLORID system (Proceedings of the Workshop on Deductive Databases and Logic Programming (DDLP'98) (1998) 47-57).
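
The idea of a single world model holding hyperlink structure, parse-trees, and contents, with an application-level view derived by a generic rule, can be illustrated with a small sketch. This is not FLORID or F-Logic code: it is a Python stand-in, and every name in it (PAGES, Node, TreeBuilder, explore, table_rows_rule) as well as the example.org URLs are assumptions made for illustration. Web access is simulated by an in-memory dictionary so the example runs without a network.

```python
from html.parser import HTMLParser

# Hypothetical stand-in for the Web: URL -> page source (no network access).
PAGES = {
    "http://example.org/index.html":
        '<html><body><a href="http://example.org/cities.html">Cities</a></body></html>',
    "http://example.org/cities.html":
        "<html><body><table>"
        "<tr><td>Freiburg</td><td>Germany</td></tr>"
        "<tr><td>Lyon</td><td>France</td></tr>"
        "</table></body></html>",
}


class Node:
    """One parse-tree node of a page (source-level model)."""

    def __init__(self, tag, attrs=None, parent=None):
        self.tag = tag
        self.attrs = dict(attrs or [])
        self.parent = parent
        self.children = []
        self.text = ""


class TreeBuilder(HTMLParser):
    """Builds a simple parse-tree; assumes reasonably well-formed markup."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        self.current = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag, if there is one.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        self.current.text += data


def walk(node):
    """Yield a node and all of its descendants in document order."""
    yield node
    for child in node.children:
        yield from walk(child)


def explore(start):
    """Follow hyperlinks from `start`, storing trees and links in one model."""
    model = {"tree": {}, "links": {}}
    frontier = [start]
    while frontier:
        url = frontier.pop()
        if url in model["tree"] or url not in PAGES:
            continue
        builder = TreeBuilder()
        builder.feed(PAGES[url])
        builder.close()
        model["tree"][url] = builder.root
        model["links"][url] = [n.attrs["href"] for n in walk(builder.root)
                               if n.tag == "a" and "href" in n.attrs]
        frontier.extend(model["links"][url])
    return model


def table_rows_rule(model):
    """Generic rule pattern: every <tr> with <td> cells yields one tuple."""
    facts = []
    for url, root in model["tree"].items():
        for node in walk(root):
            if node.tag == "tr":
                cells = tuple(c.text.strip() for c in node.children if c.tag == "td")
                if cells:
                    facts.append((url, cells))
    return facts


if __name__ == "__main__":
    world = explore("http://example.org/index.html")
    for fact in table_rows_rule(world):
        print(fact)
```

In this sketch the same structure serves exploration (the links), wrapping (the parse-trees), and querying (the derived tuples), which is the sense in which the application-level view is an overlay on the source-level model rather than a separate data format.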