Bootstrapping for example-based data extraction

Authors:
Paulo B. Golgher;Altigran S. da Silva;Alberto H. F. Laender;Berthier Ribeiro-Neto
Affiliations:
Federal University of Minas Gerais, Belo Horizonte, MG, Brazil (all authors)
Venue:
Proceedings of the tenth international conference on Information and knowledge management
Year:
2001

Abstract

The effortless generation of wrappers for Web data sources is crucial if the huge amount of semi-structured data on the Web is to be made properly accessible. In particular, developing strategies for wrapper generation based on user-given examples is currently one of the most promising research directions in Web data extraction. In this paper we show how to use a pre-existing data repository to automatically generate examples and thereby enable fully automated example-based data extraction. To demonstrate the feasibility of our approach, we report results from experiments we carried out and discuss how our ideas can be used to improve extraction rates and to provide resilience and adaptiveness for wrappers generated from examples.
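
As a rough illustration of the idea of generating examples from a pre-existing repository, the sketch below looks for known attribute values inside a page and records where they occur, so that the matches could stand in for user-given examples. It is only a toy under stated assumptions and not the authors' algorithm: the function name `generate_examples`, the dictionary layout of `repository`, the sample page, and the use of case-insensitive exact string matching are all hypothetical choices made for this illustration.

```python
import re


def generate_examples(page_text, repository):
    """Find occurrences of known repository values in a page.

    `repository` maps an attribute name to a list of known values
    (a hypothetical layout chosen for this sketch). Each match is returned
    as (attribute, matched_text, start, end), ordered by position.
    """
    examples = []
    for attribute, values in repository.items():
        for value in values:
            # Exact, case-insensitive matching is an assumption of this
            # sketch; the abstract does not say how values are matched.
            pattern = re.escape(value)
            for match in re.finditer(pattern, page_text, flags=re.IGNORECASE):
                examples.append(
                    (attribute, match.group(0), match.start(), match.end())
                )
    return sorted(examples, key=lambda example: example[2])


# Hypothetical repository and page, invented for illustration only.
repository = {
    "author": ["Jane Doe", "John Smith"],
    "title": ["Sample Book Title"],
}
page_text = (
    "<li><b>Sample Book Title</b> by Jane Doe</li>"
    "<li><b>Another Title</b> by Someone Else</li>"
)

examples = generate_examples(page_text, repository)
for attribute, text, start, end in examples:
    print(attribute, repr(text), start, end)
```

In this toy setting only the first record is matched, because the repository knows nothing about the second one. The matched positions are the kind of automatically generated examples that an example-based wrapper generator could then generalize from to extract the remaining records.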