Regression testing for wrapper maintenance

Authors:
Nicholas Kushmerick
Affiliations:
-
Venue:
AAAI '99/IAAI '99 Proceedings of the sixteenth national conference on Artificial intelligence and the eleventh Innovative applications of artificial intelligence conference
Year:
1999


Abstract

Recent work on Internet information integration assumes a library of wrappers, specialized information extraction procedures. Maintaining wrappers is difficult, because the formatting regularities on which they rely often change. The wrapper verification problem is to determine whether a wrapper is correct. Standard regression testing approaches are inappropriate, because both the formatting regularities and a site's underlying content may change. We introduce RAPTURE, a fully-implemented, domain-independent verification algorithm. RAPTURE uses well-motivated heuristics to compute the similarity between a wrapper's expected and observed output. Experiments with 27 actual Internet sites show a substantial performance improvement over standard regression testing.
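
The contrast the abstract draws between exact regression testing and similarity-based verification can be illustrated with a minimal sketch. This is not RAPTURE itself: the abstract does not specify its heuristics, so the features (word count, mean word length, digit fraction), the z-score comparison, and every function name and threshold below are assumptions chosen only for illustration.

```python
from statistics import mean, pstdev


def features(text):
    """Hypothetical surface features of one extracted string (assumed, not from the paper)."""
    words = text.split()
    n_chars = max(len(text), 1)
    return {
        "word_count": float(len(words)),
        "mean_word_length": mean([len(w) for w in words]) if words else 0.0,
        "digit_fraction": sum(c.isdigit() for c in text) / n_chars,
    }


def fit(expected_outputs):
    """Summarize known-good wrapper output as (mean, std) per feature."""
    rows = [features(t) for t in expected_outputs]
    return {
        name: (mean([r[name] for r in rows]), pstdev([r[name] for r in rows]))
        for name in rows[0]
    }


def looks_correct(model, observed_outputs, max_z=3.0, min_std=0.05):
    """Accept the wrapper if every observed feature mean stays near the expected one.

    max_z and min_std are illustrative thresholds, not values from the paper.
    """
    rows = [features(t) for t in observed_outputs]
    for name, (mu, sigma) in model.items():
        observed_mean = mean([r[name] for r in rows])
        if abs(observed_mean - mu) / max(sigma, min_std) > max_z:
            return False
    return True


def exact_match(expected_outputs, observed_outputs):
    """Standard regression test: output must be identical to the stored output."""
    return list(expected_outputs) == list(observed_outputs)


if __name__ == "__main__":
    known_good = ["$12.99", "$8.50", "$104.00", "$3.25"]
    model = fit(known_good)
    new_content = ["$15.49", "$7.00", "$99.95"]            # site content changed
    broken = ["<td align=right>", "Add to cart", "<td align=right>"]  # format changed
    print(exact_match(known_good, new_content))   # exact test rejects changed content
    print(looks_correct(model, new_content))
    print(looks_correct(model, broken))
```

In this toy setting the exact comparison fails as soon as the site's prices change, while the feature-based check still accepts new prices and rejects output that looks like stray markup, which is the kind of distinction the wrapper verification problem calls for.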