Intelligent self-repairable web wrappers

Authors:
Emilio Ferrara; Robert Baumgartner
Affiliations:
Dept. of Mathematics, University of Messina, Italy; Lixto Software GmbH, Vienna, Austria
Venue:
AI*IA'11 Proceedings of the 12th international conference on Artificial intelligence around man and beyond
Year:
2011

Abstract

The amount of information available on the Web grows at an incredibly high rate. Systems and procedures devised to extract these data from Web sources already exist, and different approaches and techniques have been investigated in recent years. On the one hand, reliable solutions should provide robust Web data mining algorithms that can automatically cope with possible malfunctions or failures. On the other hand, the literature lacks solutions for the maintenance of these systems. Procedures that extract Web data may be tightly coupled to the structure of the data source itself; thus, malfunctions or the acquisition of corrupted data could be caused, for example, by structural modifications that the owners make to their data sources. Nowadays, data integrity verification and maintenance, which ensure that these systems work correctly and reliably, are mostly handled manually. In this paper we propose a novel approach to creating procedures that extract data from Web sources, the so-called Web wrappers, which can cope with malfunctions caused by modifications of the structure of the data source and can automatically repair themselves.
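The abstract does not spell out the repair mechanism, so the following is only a minimal sketch of the problem it describes, under stated assumptions. A wrapper that locates its target through a positional path of child indices breaks as soon as the site owner inserts a new element. One possible repair relocates the target by tree similarity, here using Simple Tree Matching, a known tree-similarity technique that is not necessarily the method used in this paper. All names (`Node`, `SelfRepairingWrapper`, labels such as `"span.price"`) are hypothetical, and the page trees are toy data.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Hypothetical DOM node: the label combines tag and class, e.g. "span.price".
    label: str
    text: str = ""
    children: list = field(default_factory=list)


def size(node):
    """Number of nodes in the subtree rooted at node."""
    return 1 + sum(size(c) for c in node.children)


def stm(a, b):
    """Simple Tree Matching: size of the largest common top-down,
    order-preserving subtree of a and b (root labels must match)."""
    if a.label != b.label:
        return 0
    m, n = len(a.children), len(b.children)
    M = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            M[i][j] = max(M[i][j - 1], M[i - 1][j],
                          M[i - 1][j - 1] + stm(a.children[i - 1], b.children[j - 1]))
    return M[m][n] + 1


def follow(root, path):
    """Follow a positional path of child indices; None if the path is broken."""
    node = root
    for idx in path:
        if idx >= len(node.children):
            return None
        node = node.children[idx]
    return node


def walk(node, path=()):
    """Yield (path, node) pairs for every node in document order."""
    yield path, node
    for i, child in enumerate(node.children):
        yield from walk(child, path + (i,))


class SelfRepairingWrapper:
    """Hypothetical wrapper: a positional path plus a stored copy of the target subtree."""

    def __init__(self, path, template):
        self.path = tuple(path)
        self.template = template

    def extract(self, root):
        node = follow(root, self.path)
        if node is not None and stm(node, self.template) == size(self.template):
            return node.text  # the stored path still points at a matching node
        # Repair: relocate the node most similar to the stored template.
        best_path, best = max(walk(root), key=lambda pn: stm(pn[1], self.template))
        if stm(best, self.template) == 0:
            return None  # nothing similar is left; manual maintenance is needed
        self.path = best_path  # keep the repaired path for the next run
        return best.text


old = Node("html", children=[Node("body", children=[
    Node("div.header"),
    Node("div.product", children=[Node("h1", "Phone"), Node("span.price", "199")]),
])])
w = SelfRepairingWrapper((0, 1, 1), follow(old, (0, 1, 1)))
print(w.extract(old))  # prints 199

# The owner inserts a banner, so the old path (0, 1, 1) no longer resolves.
new = Node("html", children=[Node("body", children=[
    Node("div.banner"),
    Node("div.header"),
    Node("div.product", children=[Node("h1", "Phone"), Node("span.price", "179")]),
])])
print(w.extract(new), w.path)  # prints 179 (0, 2, 1)
```

Storing a copy of the target subtree, rather than only its path, is what lets the wrapper recognise the same element after the surrounding structure shifts. A real wrapper would also need a similarity threshold and a way to break ties between equally similar candidates.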