Pictor: an interactive system for importing data from a website

  • Authors:
  • Shuyi Zheng;Matthew R. Scott;Ruihua Song;Ji-Rong Wen

  • Affiliations:
  • Pennsylvania State University, University Park, PA, USA;Microsoft Research Asia, Beijing, China;Microsoft Research Asia, Beijing, China;Microsoft Research Asia, Beijing, China

  • Venue:
  • Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining
  • Year:
  • 2008

Quantified Score

Hi-index 0.00

Visualization

Abstract

We present a demonstration of an interactive wrapper induction system, called Pictor, which is able to minimize labeling cost, yet extract data with high accuracy from a website. Our demonstration will introduce two proposed technologies: record-level wrappers and a wrapper-assisted labeling strategy. These approaches allow Pictor to exploit previously generated wrappers, in order to predict similar labels in a partially labeled webpage or a completely new webpage. Our experiment results show the effectiveness of the Pictor system.