Querying websites using compact skeletons

  • Authors:
  • Anand Rajaraman; Jeffrey D. Ullman

  • Affiliations:
  • Cambrian Ventures, 201 San Antonio Circle, Mountain View, CA; Department of Computer Science, Stanford University, Stanford, CA

  • Venue:
  • Journal of Computer and System Sciences - Special issue on PODS 2001
  • Year:
  • 2003

Abstract

Several commercial applications, such as online comparison shopping and process automation, require integrating information that is scattered across multiple websites or XML documents. Much research has been devoted to this problem, resulting in several research prototypes and commercial implementations. Such systems rely on wrappers that provide relational or other structured interfaces to websites. Traditionally, wrappers have been constructed by hand on a per-website basis, which limits the scalability of the system. We introduce compact skeletons, a website structure inference mechanism that is a step toward automated wrapper generation. Compact skeletons provide a transformation from websites or other hierarchical data, such as XML documents, to relational tables. We study several classes of compact skeletons and provide polynomial-time algorithms and heuristics for automated construction of compact skeletons from websites. Experimental results show that our heuristics work well in practice. We also argue that compact skeletons are a natural extension of commercially deployed techniques for wrapper construction.
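The abstract does not spell out how a compact skeleton turns a tree into rows. As a rough, hypothetical illustration of the general idea of mapping hierarchical data to a relational table (not the authors' definition of compact skeletons, nor their construction algorithm), the Python sketch below uses a small pattern tree (`Pattern`, a name introduced here) and emits one row per label-preserving embedding of that pattern into the document tree, taking the pattern's leaves as the table's columns. The classes `Node` and `Pattern` and the functions `columns`, `embed`, and `to_table` are all assumptions made for this example.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from itertools import product


@dataclass
class Node:
    """A node of a hierarchical document (e.g. a web page region or an XML element)."""
    label: str
    value: str | None = None
    children: list[Node] = field(default_factory=list)


@dataclass
class Pattern:
    """Hypothetical skeleton-like pattern node: a label plus child pattern nodes.
    The leaves of the pattern become the columns of the output table."""
    label: str
    children: list[Pattern] = field(default_factory=list)


def columns(p: Pattern) -> list[str]:
    """Column names: the labels of the pattern's leaves, left to right."""
    if not p.children:
        return [p.label]
    return [c for child in p.children for c in columns(child)]


def embed(p: Pattern, n: Node) -> list[tuple]:
    """All rows obtained by mapping pattern node p onto data node n (labels must
    match) and mapping each pattern child onto some child of n."""
    if p.label != n.label:
        return []
    if not p.children:
        # A pattern leaf contributes the data node's value as one column.
        return [(n.value,)]
    # For each pattern child, gather the rows from every data child it matches.
    per_child = [
        [row for c in n.children for row in embed(pc, c)]
        for pc in p.children
    ]
    # Choose one row per pattern child (cross product) and concatenate them.
    return [sum(choice, ()) for choice in product(*per_child)]


def to_table(p: Pattern, root: Node) -> tuple[list[str], list[tuple]]:
    """Flatten the document tree into (column names, rows) using the pattern."""
    return columns(p), embed(p, root)


if __name__ == "__main__":
    site = Node("store", children=[
        Node("product", children=[Node("name", "lamp"), Node("price", "20")]),
        Node("product", children=[Node("name", "desk"), Node("price", "150")]),
    ])
    pattern = Pattern("store", [Pattern("product", [Pattern("name"), Pattern("price")])])
    cols, rows = to_table(pattern, site)
    print(cols, rows)  # -> ['name', 'price'] [('lamp', '20'), ('desk', '150')]
```

In this sketch a product lacking a `price` child contributes no row, and sibling pattern nodes combine by cross product; the classes of compact skeletons studied in the paper, and its polynomial-time construction algorithms, may treat such cases differently.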