Schema Discovery of the Semi-structured and Hierarchical Data

  • Authors: Jianwen He

  • Affiliations: -

  • Venue: IDEAL '02: Proceedings of the Third International Conference on Intelligent Data Engineering and Automated Learning
  • Year: 2002

Abstract

Web data are typically semi-structured and lack explicit external schema information, which makes querying and browsing them inefficient. In this paper, we present an approach that quickly and efficiently discovers the inherent schema(s) in semi-structured, hierarchical data sources, based on the Object Exchange Model (OEM) and an efficient pruning strategy. The schema discovered by our algorithm is a form of data path expression and can easily be transformed into a schema tree.
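
The abstract does not spell out the algorithm, so the following is only an illustrative sketch of the general idea, not the paper's method. It assumes that each data object is a nested Python dict standing in for an OEM-style labeled graph, that a "data path expression" is a tuple of edge labels from the root, and that pruning means dropping any path found in fewer than `min_support` objects (so its extensions are never explored). The names `children`, `discover_paths`, `to_schema_tree`, and `min_support` are hypothetical.

```python
def children(node):
    """Yield (label, child) pairs of a node; a list value counts as repeated children."""
    if isinstance(node, dict):
        for label, value in node.items():
            if isinstance(value, list):
                for item in value:
                    yield label, item
            else:
                yield label, value


def discover_paths(objects, min_support):
    """Level-wise search for label paths that occur in at least min_support objects.

    Assumption: a path can only be frequent if its prefix is frequent, so
    infrequent paths are pruned and never extended.
    """
    frequent = set()
    # Each frontier entry: (path, one list of reached nodes per supporting object).
    frontier = [((), [[obj] for obj in objects])]
    while frontier:
        next_frontier = []
        for path, per_object_nodes in frontier:
            extensions = {}  # label -> list of node groups, one group per object
            for nodes in per_object_nodes:
                reached = {}
                for node in nodes:
                    for label, child in children(node):
                        reached.setdefault(label, []).append(child)
                for label, kids in reached.items():
                    extensions.setdefault(label, []).append(kids)
            for label, groups in extensions.items():
                if len(groups) >= min_support:  # pruning step
                    new_path = path + (label,)
                    frequent.add(new_path)
                    next_frontier.append((new_path, groups))
        frontier = next_frontier
    return frequent


def to_schema_tree(paths):
    """Merge label paths into a nested dict that serves as a schema tree."""
    tree = {}
    for path in sorted(paths):
        node = tree
        for label in path:
            node = node.setdefault(label, {})
    return tree


# Made-up sample data, for illustration only.
sample = [
    {"name": "A", "address": {"city": "X", "zip": "1"}},
    {"name": "B", "address": {"city": "Y"}},
    {"name": "C", "phone": "123"},
]
paths = discover_paths(sample, min_support=2)
print(to_schema_tree(paths))
```

With `min_support=2`, the labels `phone` and `zip` each occur in only one object, so they are pruned and do not appear in the resulting tree. A real OEM graph may contain shared nodes and cycles, which this dict-based sketch does not handle.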