A relational approach to incrementally extracting and querying structure in unstructured data

  • Authors:
  • Eric Chu;Akanksha Baid;Ting Chen;AnHai Doan;Jeffrey Naughton

  • Affiliations:
  • University of Wisconsin-Madison;University of Wisconsin-Madison;University of Wisconsin-Madison;University of Wisconsin-Madison;University of Wisconsin-Madison

  • Venue:
  • VLDB '07 Proceedings of the 33rd international conference on Very large data bases
  • Year:
  • 2007

Quantified Score

Hi-index 0.00

Visualization

Abstract

There is a growing consensus that it is desirable to query over the structure implicit in unstructured documents, and that ideally this capability should be provided incrementally. However, there is no consensus about what kind of system should be used to support this kind of incremental capability. We explore using a relational system as the basis for a workbench for extracting and querying structure from unstructured data. As a proof of concept, we applied our relational approach to support structured queries over Wikipedia. We show that the data set is always available for some form of querying, and that as it is processed, users can pose a richer set of structured queries. We also provide examples of how we can incrementally evolve our understanding of the data in the context of the relational workbench.