The use of XML documents on the Internet continues to grow. This creates a need to analyze XML documents from heterogeneous sources, where documents may conform to different DTDs. In this paper, we propose a measure of the structural similarity between XML documents and DTDs that is natural to understand and fast to calculate. The measure is defined as a weighted sum of the local measures of document elements, with a weighting scheme based on their subtree sizes. The local measure of an element is defined as its edit distance from its declaration in the DTD, viewed as a regular expression. Based on this definition, we propose an algorithm for calculating the edit distance between a string and a regular expression, modified from an algorithm used for the regular expression matching problem. The advantages of the measure are its natural definition and linear complexity.
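To make the notion of edit distance between a string and a regular expression concrete, the following is a minimal sketch and not the algorithm proposed in the paper. It assumes that the "string" is the sequence of an element's child element names, that the content model is given as a small hand-built syntax tree (the tuple encoding and the names `build_nfa` and `regex_edit_distance` are hypothetical), and that insertions, deletions, and substitutions each cost 1. It runs a generic shortest-path search over pairs of (input position, automaton state) and makes no claim to the linear complexity stated above.

```python
import heapq
from itertools import count

def build_nfa(node, trans, eps, ids):
    """Thompson-style construction; returns (start, accept) for one AST node."""
    s, t = next(ids), next(ids)
    kind = node[0]
    if kind == "sym":                      # ("sym", "title")
        trans.setdefault(s, []).append((node[1], t))
    elif kind == "cat":                    # ("cat", r1, r2, ...)
        prev = s
        for child in node[1:]:
            cs, ct = build_nfa(child, trans, eps, ids)
            eps.setdefault(prev, []).append(cs)
            prev = ct
        eps.setdefault(prev, []).append(t)
    elif kind == "alt":                    # ("alt", r1, r2, ...)
        for child in node[1:]:
            cs, ct = build_nfa(child, trans, eps, ids)
            eps.setdefault(s, []).append(cs)
            eps.setdefault(ct, []).append(t)
    elif kind in ("star", "opt", "plus"):  # ("star", r), ("opt", r), ("plus", r)
        cs, ct = build_nfa(node[1], trans, eps, ids)
        eps.setdefault(s, []).append(cs)
        eps.setdefault(ct, []).append(t)
        if kind != "plus":
            eps.setdefault(s, []).append(t)    # the body may be skipped
        if kind != "opt":
            eps.setdefault(ct, []).append(cs)  # the body may repeat
    else:
        raise ValueError(f"unknown node kind: {kind}")
    return s, t

def regex_edit_distance(children, regex):
    """Minimum number of unit-cost edits turning `children` into a word of `regex`."""
    trans, eps, ids = {}, {}, count()
    start, accept = build_nfa(regex, trans, eps, ids)
    n = len(children)
    dist = {(0, start): 0}
    heap = [(0, 0, start)]                 # (cost so far, input position, NFA state)
    while heap:
        d, i, q = heapq.heappop(heap)
        if d > dist.get((i, q), float("inf")):
            continue                       # stale heap entry
        if i == n and q == accept:
            return d
        moves = [(0, i, r) for r in eps.get(q, [])]       # epsilon moves are free
        for sym, r in trans.get(q, []):
            moves.append((1, i, r))                       # insert `sym` into the input
            if i < n:                                     # match (0) or substitute (1)
                moves.append((0 if sym == children[i] else 1, i + 1, r))
        if i < n:
            moves.append((1, i + 1, q))                   # delete children[i]
        for cost, j, r in moves:
            nd = d + cost
            if nd < dist.get((j, r), float("inf")):
                dist[(j, r)] = nd
                heapq.heappush(heap, (nd, j, r))
    raise ValueError("the expression accepts no string")

# Hypothetical declaration <!ELEMENT paper (title, author+, abstract?)>
PAPER = ("cat", ("sym", "title"), ("plus", ("sym", "author")), ("opt", ("sym", "abstract")))
```

For instance, `regex_edit_distance(["title"], PAPER)` evaluates to 1, because one `author` must be inserted, while a child sequence that already conforms to the declaration has distance 0. How such a distance is normalized into a local measure and combined by subtree-size weights is defined in the paper itself and is not reproduced here.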