Multi-level Schema Extraction for Heterogeneous Semi-structured Data

  • Authors: Jong P. Yoon; Vijay Raghavan

  • Venue: WAIM '00 Proceedings of the First International Conference on Web-Age Information Management
  • Year: 2000

Abstract

Heterogeneous information sources range in their degree of organization from well-structured data to semi-structured and unstructured data. Such sources either have no rigid schema available in advance or, even when each source has its own schema, share no enforced modeling constraints or data formats across sources. In this paper, we propose a novel method for abstracting schemas for heterogeneous information sources. At the most detailed level, information sources are represented as a labeled directed graph. We develop several abstraction operations for label generalization and aggregation. One or more of these operations can be applied to a labeled directed graph to "levelize" schemas. Each resulting schema level is a potentially useful basis for query formulation and optimization.
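
The abstract does not define the abstraction operations, so the following is only a rough Python sketch of how label generalization and aggregation over a labeled directed graph might look. The sample edges, the label hierarchy `GENERALIZE`, and the functions `generalize_labels`, `aggregate`, and `summarize` are all hypothetical names introduced for illustration. The aggregation shown is a simple grouping of nodes by label path, swapped in as a stand-in and not necessarily the operation used in the paper.

```python
from collections import defaultdict

# Labeled directed graph as (source node, label, target node) triples.
# This data is invented purely for illustration.
EDGES = [
    ("root", "professor", "p1"),
    ("root", "lecturer", "p2"),
    ("p1", "name", "n1"),
    ("p1", "email", "e1"),
    ("p2", "name", "n2"),
    ("p2", "phone", "t1"),
]

# Hypothetical label hierarchy: specific label -> more general label.
GENERALIZE = {
    "professor": "faculty",
    "lecturer": "faculty",
    "email": "contact",
    "phone": "contact",
}


def generalize_labels(edges, hierarchy):
    """Replace each edge label with its more general label, if one is defined."""
    return [(src, hierarchy.get(label, label), dst) for src, label, dst in edges]


def aggregate(edges, root):
    """Group nodes reached by the same label path into one schema node.

    Assumed stand-in for an aggregation operation: each schema node is a
    frozenset of data nodes, and each schema edge carries a label.
    """
    out = defaultdict(lambda: defaultdict(set))
    for src, label, dst in edges:
        out[src][label].add(dst)

    start = frozenset([root])
    seen = {start}
    stack = [start]
    schema_edges = set()
    while stack:
        group = stack.pop()
        # Collect, per label, every node reachable from any member of the group.
        by_label = defaultdict(set)
        for node in group:
            for label, targets in out[node].items():
                by_label[label] |= targets
        for label, targets in by_label.items():
            target_group = frozenset(targets)
            schema_edges.add((group, label, target_group))
            if target_group not in seen:
                seen.add(target_group)
                stack.append(target_group)
    return schema_edges


def summarize(schema_edges):
    """Return schema edges in a stable, readable order."""
    return sorted(
        (tuple(sorted(src)), label, tuple(sorted(dst)))
        for src, label, dst in schema_edges
    )


# A finer level: aggregation only. A coarser level: generalize, then aggregate.
fine = summarize(aggregate(EDGES, "root"))
coarse = summarize(aggregate(generalize_labels(EDGES, GENERALIZE), "root"))

if __name__ == "__main__":
    for level_name, level in (("fine", fine), ("coarse", coarse)):
        print(level_name)
        for edge in level:
            print("  ", edge)
```

In this sketch, applying generalization before aggregation yields a smaller schema than aggregation alone, which is one way to read the idea of producing schemas at several levels of detail.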