Wrapper Generation for Web Accessible Data Sources

Authors:
Jean-Robert Gruser;Louiqa Raschid;M. E. Vidal;Laura Bright
Venue:
COOPIS '98 Proceedings of the 3rd IFCIS International Conference on Cooperative Information Systems
Year:
1998



Abstract

A growing number of data sources can be queried across the WWW. Such sources typically support HTML forms-based interfaces, and search engines query collections of suitably indexed data. The data is displayed via a browser. This setting has three drawbacks. First, there is no standard programming interface through which applications can submit queries. Second, the output (the answer to a query) is not well structured: structured objects have to be extracted from HTML documents, which contain irrelevant data and may be volatile. Third, domain knowledge about the data source is also embedded in HTML documents and must be extracted. To solve these problems, we present technology to define and (automatically) generate wrappers for Web accessible sources. Our contributions are as follows: (1) defining a wrapper interface to specify the capability of Web accessible data sources; (2) developing a wrapper generation toolkit of graphical interfaces and specification languages to specify the capability of sources and the functionality of the wrapper; and (3) developing the technology to automatically generate, from the specifications, a wrapper appropriate to the Web accessible source.
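
To make the idea of a wrapper concrete, the following is a minimal Python sketch and not the authors' toolkit or specification language. It assumes a result page that lists answers as HTML table rows. The names `Capability`, `Wrapper`, `RowExtractor`, and `SAMPLE_PAGE`, as well as the sample data, are hypothetical and introduced only for illustration. The sketch shows the two roles the abstract describes: declaring what a source can be asked, and extracting structured objects from an HTML answer while skipping irrelevant content.

```python
from dataclasses import dataclass
from html.parser import HTMLParser


class RowExtractor(HTMLParser):
    """Collects the text of each <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row being parsed, or None
        self._cell = None   # text fragments of the cell being parsed, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text outside a <td> (headings, adverts, ...) is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # header rows contain no <td> and are dropped
                self.rows.append(self._row)
            self._row = None


@dataclass(frozen=True)
class Capability:
    """Hypothetical capability description of a source."""
    input_attributes: tuple   # attributes a query may bind
    output_attributes: tuple  # attributes of each returned object


class Wrapper:
    def __init__(self, capability):
        self.capability = capability

    def accepts(self, query):
        """A query (a dict) is supported if it binds only declared inputs."""
        return set(query) <= set(self.capability.input_attributes)

    def extract(self, html):
        """Turn a result page into a list of dicts keyed by output attribute."""
        parser = RowExtractor()
        parser.feed(html)
        parser.close()
        names = self.capability.output_attributes
        return [dict(zip(names, row)) for row in parser.rows
                if len(row) == len(names)]


# Made-up result page: a heading and an advert surround the actual answers.
SAMPLE_PAGE = """
<html><body><h1>Results</h1><p>Advertisement</p>
<table>
<tr><th>Title</th><th>Year</th></tr>
<tr><td>Example Paper A</td><td> 1998 </td></tr>
<tr><td><b>Example</b> Paper B</td><td>1999</td></tr>
</table></body></html>
"""
```

An application would first check `Wrapper.accepts` against the query it wants to send, then pass the returned page to `Wrapper.extract`. A real wrapper would also have to submit the form over HTTP and cope with changes to the page layout, both of which this sketch leaves out.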