WDEE: web data extraction by example

  • Authors:
  • Zhao Li;Wee Keong Ng

  • Affiliations:
  • Centre for Advanced Information Systems, Nanyang Technological University;Centre for Advanced Information Systems, Nanyang Technological University

  • Venue:
  • DASFAA'05 Proceedings of the 10th international conference on Database Systems for Advanced Applications
  • Year:
  • 2005

Abstract

Web data extraction systems in use today transform semi-structured Web documents and deliver structured documents to end users. Some systems provide a visual interface with which users generate the extraction rules. However, for end users, the visual effect of Web documents is lost during the transformation process. In this paper, we propose an approach that allows a user to query extracted documents without knowledge of a formal query language. We bridge the gap between the visual effect of Web documents and the structured documents extracted from them by providing a QBE-like (Query by Example) interface named Wdee. The principal component of our method is the notion of a document schema. Document schemata are patterns of structures embedded in documents. Wdee generates tree skeletons based on schema information, and a user may execute queries by entering conditions in the skeletons. Because Wdee maintains the mapping between the schemata of Web documents and those of extracted documents, it can present a visual example to end users. With the example, Wdee allows a user to construct tree skeletons in a manner that resembles the browsing of Web pages.
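
The idea of querying through a tree skeleton can be illustrated with a minimal sketch. This is not the Wdee implementation, whose details the abstract does not give. It assumes a schema can be written as a nested dictionary, that extracted documents are flat records keyed by schema label, and that a condition is a predicate on a field's text. All names (SkeletonNode, build_skeleton, set_condition, run_query) and the sample records are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class SkeletonNode:
    """One node of a tree skeleton: a schema label plus an optional condition."""
    label: str
    condition: Optional[Callable[[str], bool]] = None  # left empty until the user fills it in
    children: list[SkeletonNode] = field(default_factory=list)


def build_skeleton(schema: dict) -> SkeletonNode:
    """Build an empty skeleton from a (hypothetical) nested schema description."""
    return SkeletonNode(
        label=schema["label"],
        children=[build_skeleton(c) for c in schema.get("children", [])],
    )


def set_condition(skeleton: SkeletonNode, label: str,
                  condition: Callable[[str], bool]) -> None:
    """Attach a user-supplied condition to the direct child with the given label."""
    for child in skeleton.children:
        if child.label == label:
            child.condition = condition
            return
    raise KeyError(label)


def run_query(skeleton: SkeletonNode, records: list[dict]) -> list[dict]:
    """Return the records that satisfy every condition set on the skeleton's children."""
    def satisfies(record: dict) -> bool:
        return all(
            child.condition is None or child.condition(record.get(child.label, ""))
            for child in skeleton.children
        )
    return [r for r in records if satisfies(r)]


# Assumed schema for illustration: a "book" with "title" and "price" fields.
schema = {"label": "book", "children": [{"label": "title"}, {"label": "price"}]}
skeleton = build_skeleton(schema)

# The user "fills in" the price slot of the skeleton, QBE-style.
set_condition(skeleton, "price", lambda text: float(text) < 30.0)

# Made-up extracted records, keyed by schema label.
records = [
    {"title": "Example Book A", "price": "45.00"},
    {"title": "Example Book B", "price": "19.50"},
]
result = run_query(skeleton, records)
```

Here the skeleton plays the role of the empty example table in QBE: the user never writes a query expression, only a condition in the slot of interest. The sketch omits the mapping back to the original Web page that Wdee uses to present the visual example.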