Adapting Information Extraction Knowledge For Unseen Web Sites

  • Authors:
  • Tak-Lam Wong; Wai Lam

  • Venue:
  • ICDM '02 Proceedings of the 2002 IEEE International Conference on Data Mining
  • Year:
  • 2002

Abstract

We propose a wrapper adaptation framework that aims at adapting a learned wrapper to an unseen Web site. It significantly reduces the human effort needed to construct wrappers. Our framework uses the extraction rules previously discovered from a particular site to seek potential training example candidates for an unseen site. Rule generalization and text categorization are employed to find suitable example candidates. Another feature of our approach is that it uses the previously discovered lexicon to automatically classify good training examples for the new site. We conducted extensive experiments to evaluate the quality of the extraction performance and the adaptability of our approach.
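To make the lexicon idea concrete, here is a toy sketch of how a lexicon collected from a previously wrapped site could be used to separate promising training example candidates on a new site from noise. This is not the authors' algorithm. It swaps in plain token-overlap (Jaccard) similarity in place of the paper's text categorization step. The function names (`tokenize`, `lexicon_score`, `select_training_examples`), the `threshold` parameter, and the sample data are all hypothetical.

```python
import re
from typing import Iterable


def tokenize(text: str) -> set[str]:
    """Return the set of lowercase alphanumeric word tokens in a text fragment."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def lexicon_score(candidate: str, lexicon: Iterable[str]) -> float:
    """Best Jaccard similarity between the candidate and any lexicon entry.

    Hypothetical stand-in for the paper's text categorization step.
    """
    cand = tokenize(candidate)
    if not cand:
        return 0.0
    best = 0.0
    for entry in lexicon:
        ent = tokenize(entry)
        if not ent:
            continue
        best = max(best, len(cand & ent) / len(cand | ent))
    return best


def select_training_examples(
    candidates: Iterable[str], lexicon: Iterable[str], threshold: float = 0.5
) -> tuple[list[str], list[str]]:
    """Split candidates into likely-good training examples and rejected ones."""
    lexicon = list(lexicon)  # allow repeated iteration over the lexicon
    good: list[str] = []
    rejected: list[str] = []
    for c in candidates:
        if lexicon_score(c, lexicon) >= threshold:
            good.append(c)
        else:
            rejected.append(c)
    return good, rejected


# Hypothetical lexicon learned on the source site, and candidate text
# fragments found on an unseen site.
lexicon = ["Data Mining Concepts and Techniques", "Machine Learning"]
candidates = ["Machine Learning", "Add to cart", "Data Mining: Concepts and Techniques"]
good, rejected = select_training_examples(candidates, lexicon)
print(good)
print(rejected)
```

In this toy run, the two book titles match lexicon entries exactly after tokenization, so they are kept. The navigation text "Add to cart" shares no tokens with the lexicon and is rejected. A real system would also need to handle values that never appeared on the source site, which is where a learned categorizer has an advantage over a direct lexicon lookup.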