Regression testing for wrapper maintenance
AAAI '99/IAAI '99 Proceedings of the sixteenth national conference on Artificial intelligence and the eleventh Innovative applications of artificial intelligence conference innovative applications of artificial intelligence
A brief survey of web data extraction tools
ACM SIGMOD Record
RoadRunner: Towards Automatic Data Extraction from Large Web Sites
Proceedings of the 27th International Conference on Very Large Data Bases
Extracting structured data from Web pages
Proceedings of the 2003 ACM SIGMOD international conference on Management of data
Schema-guided wrapper maintenance for web-data extraction
WIDM '03 Proceedings of the 5th ACM international workshop on Web information and data management
Wrapper maintenance: a machine learning approach
Journal of Artificial Intelligence Research
Foundations and Trends in Databases
No Code Required: Giving Users Tools to Transform the Web
No Code Required: Giving Users Tools to Transform the Web
ADBIS'06 Proceedings of the 10th East European conference on Advances in Databases and Information Systems
WebSelF: a web scraping framework
ICWE'12 Proceedings of the 12th international conference on Web Engineering
TEX: An efficient and effective unsupervised Web information extractor
Knowledge-Based Systems
Hi-index | 0.00 |
During the last years, significant attention has been paid to the problem of building wrappers for extracting data from semistructured web sources. Nevertheless, since web sources are autonomous, they may experience changes that invalidate the wrappers. In this paper, we present new heuristics and algorithms to address the problem of automatic wrapper maintenance. Our approach is based on collecting query results during wrapper operation and using them later to generate new sets of examples that can be used to induce a new wrapper when the source changes.