The number of data sources that can be queried across the WWW is increasing. Such sources typically support HTML forms-based interfaces, with search engines that query collections of suitably indexed data, and the results are displayed via a browser. This arrangement has three drawbacks. First, there is no standard programming interface through which applications can submit queries. Second, the output (the answer to a query) is not well structured: structured objects have to be extracted from HTML documents that contain irrelevant data and may be volatile. Third, domain knowledge about the data source is also embedded in the HTML documents and must be extracted. To solve these problems, we present technology to define and (automatically) generate wrappers for Web accessible sources. Our contributions are as follows: (1) defining a wrapper interface to specify the capability of Web accessible data sources; (2) developing a wrapper generation toolkit of graphical interfaces and specification languages to specify the capability of sources and the functionality of the wrapper; (3) developing the technology to automatically generate a wrapper appropriate to the Web accessible source from the specifications.
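To make the idea of a wrapper concrete, the following is a minimal sketch in Python. It is not the interface or toolkit described in the abstract. The names `SourceCapability`, `Wrapper`, `build_query_url`, and `extract`, the URL `http://example.org/search`, and the assumption that answers arrive as rows of an HTML table are all hypothetical and chosen only for illustration. The sketch shows a capability description (which attributes a form accepts and which attributes each answer object carries), a programmatic way to form a query, and extraction of structured objects from HTML that also contains irrelevant content.

```python
from dataclasses import dataclass
from html.parser import HTMLParser
from urllib.parse import urlencode


@dataclass(frozen=True)
class SourceCapability:
    """Hypothetical capability description for one Web source."""
    form_url: str        # where the HTML form submits its query (assumed)
    input_attrs: tuple   # attributes the form accepts as bindings
    output_attrs: tuple  # attributes of each answer object, in column order


class _RowExtractor(HTMLParser):
    """Collects the text of <td> cells grouped by <tr>; ignores everything else."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # header rows made of <th> cells stay empty
                self.rows.append(self._row)
            self._row = None
            self._cell = None


class Wrapper:
    """Gives applications a programmatic view of a forms-based source."""

    def __init__(self, capability):
        self.capability = capability

    def build_query_url(self, **bindings):
        # Reject bindings the source's form cannot accept.
        unsupported = set(bindings) - set(self.capability.input_attrs)
        if unsupported:
            raise ValueError(f"source cannot bind: {sorted(unsupported)}")
        query = urlencode(sorted(bindings.items()))
        return f"{self.capability.form_url}?{query}"

    def extract(self, html):
        # Keep only rows whose width matches the declared output attributes;
        # banners and other irrelevant rows are dropped.
        parser = _RowExtractor()
        parser.feed(html)
        parser.close()
        width = len(self.capability.output_attrs)
        return [dict(zip(self.capability.output_attrs, row))
                for row in parser.rows if len(row) == width]


capability = SourceCapability(
    form_url="http://example.org/search",  # placeholder URL
    input_attrs=("author", "year"),
    output_attrs=("title", "year"),
)
wrapper = Wrapper(capability)
url = wrapper.build_query_url(author="Smith", year="1998")

# Made-up answer page with irrelevant content mixed in.
sample_html = """
<html><body><h1>Results</h1><p>Sponsored banner</p>
<table>
<tr><th>Title</th><th>Year</th></tr>
<tr><td>Query Mediation</td><td>1998</td></tr>
<tr><td colspan="2">advertisement</td></tr>
<tr><td><b>Web Wrappers</b></td><td>1997</td></tr>
</table></body></html>
"""
objects = wrapper.extract(sample_html)
```

In this sketch the capability is written by hand. The abstract's approach instead generates the wrapper from specifications, which matters when the HTML layout is volatile: only the specification would need to change, not the application that uses the wrapper.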