A Query Translation Scheme for Rapid Implementation of Wrappers
DOOD '95 Proceedings of the Fourth International Conference on Deductive and Object-Oriented Databases
To access information from a variety of heterogeneous information sources, one has to be able to translate queries and data from one data model into another. This functionality is provided by so-called (source) wrappers [4,8], which convert queries into one or more commands or queries understandable by the underlying source and transform the native results into a format understood by the application. As part of the TSIMMIS project [1,6] we have developed hard-coded wrappers for a variety of sources (e.g., Sybase DBMS and WWW pages), including legacy systems (Folio). However, anyone who has built a wrapper can attest that a lot of effort goes into developing and writing one. In situations where it is important or desirable to gain access to new sources quickly, this is a major drawback. Furthermore, we have observed that only a relatively small part of the code deals with the specific access details of the source. The rest of the code is either common among wrappers or implements query and data transformation that could be expressed in a high-level, declarative fashion.

Based on these observations, we have developed a wrapper implementation toolkit [7] for quickly building wrappers. The toolkit contains a library of commonly used functions, such as those for receiving queries from the application and packaging results. It also contains a facility for translating queries into source-specific commands and for translating results into a model useful to the application. The philosophy behind our “template-based” translation methodology is as follows. The wrapper implementor specifies a set of templates (rules), written in a high-level declarative language, that describe the queries accepted by the wrapper as well as the objects that it returns. If an application query matches a template, an implementor-provided action associated with the template is executed to provide the native query for the underlying source.
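The template idea above can be illustrated with a minimal Python sketch. It is not the toolkit's implementation and does not use MSL syntax: the `QueryTemplate` class, the attribute-set matching rule, and the `lookup` command strings are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class QueryTemplate:
    """Hypothetical template: the attributes a query must mention, plus an
    implementor-provided action that builds the native command string."""
    attributes: frozenset
    action: Callable[[dict], str]


def translate(query: dict, templates: list) -> Optional[str]:
    """Run the action of the first template whose attribute set equals the
    query's attribute set; return None when no template matches."""
    for template in templates:
        if template.attributes == frozenset(query):
            return template.action(query)
    return None


# Two templates for an imaginary bibliographic source (assumed commands).
templates = [
    QueryTemplate(frozenset({"author"}), lambda q: f"lookup -author {q['author']}"),
    QueryTemplate(frozenset({"title"}), lambda q: f"lookup -title {q['title']}"),
]

native = translate({"author": "Smith"}, templates)
print(native)
```

Adding support for a new kind of query then amounts to appending one more template to the list, which mirrors the incremental style of wrapper development described in the text.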
When the source returns the result of the query, the wrapper transforms the answer, which is represented in the data model of the source, into a representation that is used by the application. Using this toolkit one can quickly design a simple wrapper with a few templates that cover some of the desired functionality, probably the part that is most urgently needed. Templates can then be added gradually as more functionality is required.

Another important use of wrappers is in extending the query capabilities of a source. For instance, some sources may not be capable of answering queries that have multiple predicates. In such cases, it is necessary to pose a native query to the source using only predicates that it is capable of handling. The rest of the predicates are automatically separated from the user query and form a filter query. When the wrapper receives the results, a postprocessing engine applies the filter query. This engine supports a set of built-in predicates based on the comparison operators =, ≠, <, >, etc. In addition, the engine supports more complex predicates that can be specified as part of the filter query. The postprocessing engine is common to wrappers of all sources and is part of the wrapper toolkit. Note that because of postprocessing, the wrapper can handle a much larger class of queries than those that exactly match the templates it has been given.

Figure 1 shows an overview of the wrapper architecture as it is currently implemented in our TSIMMIS testbed. Shaded components are provided by the toolkit; the white component is source-specific and must be generated by the implementor.
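The separation of a user query into a native part and a filter query, as described for postprocessing above, might look roughly like the following Python sketch. The predicate representation as `(attribute, operator, value)` triples, the function names, and the rule that a source supports a fixed set of attributes are assumptions made for illustration, not the toolkit's actual design.

```python
import operator

# Built-in comparison predicates available to the filter (an assumed set).
OPERATORS = {"=": operator.eq, "!=": operator.ne, "<": operator.lt, ">": operator.gt}


def split_predicates(predicates, supported_attributes):
    """Separate predicates the source can evaluate from those left to the filter."""
    native = [p for p in predicates if p[0] in supported_attributes]
    filter_query = [p for p in predicates if p[0] not in supported_attributes]
    return native, filter_query


def apply_filter(objects, filter_query):
    """Keep only the objects that satisfy every predicate of the filter query."""
    return [
        obj for obj in objects
        if all(attr in obj and OPERATORS[op](obj[attr], value)
               for attr, op, value in filter_query)
    ]


predicates = [("author", "=", "Smith"), ("year", ">", 1990)]
# Assume the source can only evaluate predicates on "author".
native, filter_query = split_predicates(predicates, {"author"})

# Pretend this is what the source returned for the native part.
source_result = [
    {"author": "Smith", "year": 1988},
    {"author": "Smith", "year": 1995},
]
answer = apply_filter(source_result, filter_query)
print(answer)
```

Because the filter runs inside the wrapper, the application sees the two-predicate query answered even though the source evaluated only one predicate.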
The driver component controls the translation process and invokes the following services: the parser, which parses the templates, the native schema, and the incoming queries into internal data structures; the matcher, which matches a query against the set of templates and creates a filter query for postprocessing if necessary; the native component, which submits the generated action string to the source and extracts the data from the native result using the information given in the source schema; and the engine, which transforms and packages the result and applies a postprocessing filter if one has been created by the matcher.

We now describe the sequence of events that occur at the wrapper during the translation of a query and its result, using an example from our prototype system. The queries are formulated in a rule-based language called MSL, which has been developed as a template specification and query language for the TSIMMIS project. Data is represented using our Object Exchange Model (OEM). We briefly describe MSL and OEM in the next section. Details on MSL can be found in [5]; a full introduction to OEM is given in [1].
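The order in which the driver invokes its services (parser, matcher, native component, engine) can be summarized in a short Python sketch. The function `drive`, the stub stages, the `key=value` query text, and the `lookup` action string are hypothetical stand-ins; real queries would be written in MSL and results represented in OEM.

```python
from typing import Callable


def drive(query_text: str,
          parse: Callable[[str], dict],
          match: Callable[[dict], tuple],
          run_native: Callable[[str], list],
          package: Callable[[list, list], list]) -> list:
    """Call the four services in the order described in the text."""
    query = parse(query_text)                   # parser
    action, filter_query = match(query)         # matcher
    if action is None:
        return []                               # no template matched
    raw_objects = run_native(action)            # native component
    return package(raw_objects, filter_query)   # engine


# Stub stages, for illustration only.
def parse(text):
    # "author=Smith;year=1995" becomes {"author": "Smith", "year": "1995"}
    return dict(part.split("=", 1) for part in text.split(";"))


def match(query):
    if "author" not in query:
        return None, []
    leftovers = [(k, v) for k, v in query.items() if k != "author"]
    return f"lookup -author {query['author']}", leftovers


def run_native(action):
    # A canned answer standing in for the real source.
    return [{"author": "Smith", "year": "1988"},
            {"author": "Smith", "year": "1995"}]


def package(objects, filter_query):
    return [o for o in objects if all(o.get(k) == v for k, v in filter_query)]


result = drive("author=Smith;year=1995", parse, match, run_native, package)
print(result)
```

Passing the stages in as functions reflects the split described above: only the native component is source-specific, while the other stages could be shared across wrappers.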