Regression testing for wrapper maintenance

  • Authors:
  • Nicholas Kushmerick

  • Affiliations:
  • -

  • Venue:
  • AAAI '99/IAAI '99 Proceedings of the sixteenth national conference on Artificial intelligence and the eleventh Innovative applications of artificial intelligence conference innovative applications of artificial intelligence
  • Year:
  • 1999

Quantified Score

Hi-index 0.00

Visualization

Abstract

Recent work on Internet information integration assumes a library of wrappers, specialized information extraction procedures. Maintaining wrappers is difficult, because the formatting regularities on which they rely often change. The wrapper verification problem is to determine whether a wrapper is correct. Standard regression testing approaches are inappropriate, because both the formatting regularities and a site's underlying content may change. Wei ntroduce RAPTURE, a fully-implemented, domain-independenvt erification algorithm. RAPTURE uses well-motivated heuristics to compute the similarity between a wrapper's expected and observed output. Experiments with 27 actual Internet sites show a substantial performance improvement over standard regression testing.