XML is semi-structured. It can be used to annotate unstructured data, to represent structured data, and for almost anything in between. Yet it is unclear how to formally characterize, let alone quantify, the structuredness of XML. In this paper we propose and evaluate entropy-based metrics for XML structuredness. The metrics measure the structural uniformity of paths and subtrees, respectively. We empirically study these metrics and their correlation on real and synthetic data sets.
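The abstract does not give the formulas, so the following is only a minimal sketch of how entropy can capture path and subtree uniformity. It is not the paper's definition. It assumes, hypothetically, that the path metric is the conditional entropy of an element's set of child tags given its root-to-node label path, and that the subtree metric is the conditional entropy of an element's subtree shape given its tag. Both are 0 for perfectly uniform data and grow as the structure varies. The names `path_entropy` and `subtree_entropy` are illustrative only.

```python
import math
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict


def entropy(counts):
    """Shannon entropy in bits of a frequency distribution (a Counter)."""
    total = sum(counts.values())
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def weighted_entropy(groups):
    """Per-group entropy averaged by group size, i.e. a conditional entropy."""
    n = sum(sum(c.values()) for c in groups.values())
    return sum(sum(c.values()) / n * entropy(c) for c in groups.values())


def path_entropy(root):
    """Hypothetical path metric: how much the set of child tags varies among
    elements that share the same root-to-node label path."""
    groups = defaultdict(Counter)

    def walk(elem, prefix):
        path = f"{prefix}/{elem.tag}"
        # Child tags as a set: repeated children of one tag count once.
        groups[path][frozenset(c.tag for c in elem)] += 1
        for child in elem:
            walk(child, path)

    walk(root, "")
    return weighted_entropy(groups)


def subtree_entropy(root):
    """Hypothetical subtree metric: how much subtree shapes vary among
    elements with the same tag."""
    groups = defaultdict(Counter)

    def shape(elem):
        # Canonical shape: tag plus sorted child shapes (text is ignored).
        sig = f"{elem.tag}({','.join(sorted(shape(c) for c in elem))})"
        groups[elem.tag][sig] += 1
        return sig

    shape(root)
    return weighted_entropy(groups)


regular = ET.fromstring(
    "<db><book><title/><year/></book><book><title/><year/></book></db>")
irregular = ET.fromstring(
    "<db><book><title/><year/></book><book><title/></book></db>")
print(path_entropy(regular), subtree_entropy(regular))  # 0.0 0.0
print(round(path_entropy(irregular), 3),
      round(subtree_entropy(irregular), 3))  # 0.333 0.333
```

Conditioning on the path (or tag) keeps a perfectly regular document at 0 even though it contains many distinct paths. A plain entropy over all paths would not do this, because it would also reward documents that simply have few element types.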