Measuring XML structured-ness with entropy

  • Authors:
  • Ruiming Tang; Huayu Wu; Stéphane Bressan

  • Affiliations:
  • School of Computing, National University of Singapore, Singapore; School of Computing, National University of Singapore, Singapore; Center for Maritime Studies, Singapore

  • Venue:
  • WAIM'11 Proceedings of the 2011 international conference on Web-Age Information Management
  • Year:
  • 2011

Abstract

XML is semi-structured. It can be used to annotate unstructured data, to represent structured data, and almost anything in between. Yet it is unclear how to formally characterize, let alone quantify, the structured-ness of XML. In this paper we propose and evaluate entropy-based metrics for XML structured-ness. The metrics measure the structural uniformity of paths and subtrees, respectively. We empirically study the correlation of these metrics on real and synthetic data sets.
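
The abstract does not give the metric definitions, so the following is only an illustrative sketch of the general idea, not the authors' formulation. It assumes one simple reading of "structural uniformity of paths": take the tag path from the root to each leaf element and compute the Shannon entropy of how often each distinct path occurs. The function names `root_to_leaf_paths` and `path_entropy` are hypothetical.

```python
import math
from collections import Counter
import xml.etree.ElementTree as ET


def root_to_leaf_paths(xml_text):
    """Return the tag path (e.g. 'r/a/x') of every leaf element."""
    root = ET.fromstring(xml_text)
    paths = []

    def walk(node, prefix):
        path = prefix + "/" + node.tag if prefix else node.tag
        children = list(node)
        if not children:
            # A leaf element: record the full tag path that leads to it.
            paths.append(path)
        for child in children:
            walk(child, path)

    walk(root, "")
    return paths


def path_entropy(xml_text):
    """Shannon entropy, in bits, of the distribution of leaf tag paths.

    Assumption for illustration only: lower entropy is read as
    "more uniform structure". The paper's own metrics may differ.
    """
    counts = Counter(root_to_leaf_paths(xml_text))
    total = sum(counts.values())
    return sum((c / total) * math.log2(total / c) for c in counts.values())


uniform = "<r><a><x/></a><a><x/></a></r>"   # every leaf has path r/a/x
mixed = "<r><a><x/></a><b><y/></b></r>"     # two distinct, equally frequent paths
print(path_entropy(uniform), path_entropy(mixed))
```

Under this reading, a document whose leaves all share one path scores 0 bits, and a document with two equally frequent paths scores 1 bit. An analogous measure over subtrees would need a notion of subtree equivalence, which the abstract does not specify.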