XML document similarity measure in terms of the structure and contents

Authors:
Woosaeng Kim
Affiliations:
Department of Computer Science, Kwangwoon University, Nowon-Gu, Seoul, Korea
Venue:
CEA'08 Proceedings of the 2nd WSEAS International Conference on Computer Engineering and Applications
Year:
2008

Abstract

XML has become the standard for data representation and exchange on the Internet. With a large number of XML documents on the Web, there is an increasing need to process these structurally rich documents automatically for information retrieval, similarity clustering, and search applications. In this paper, we propose a new method to measure the similarity between XML documents by considering both their structures and their contents. Structural similarity is computed with a simple string matching technique, and content similarity is computed with weights that take into account the names and positions of elements.
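The abstract does not spell out the exact formulas, so the following is only a minimal Python sketch of the general idea under stated assumptions, not the author's method. Assumptions: structure is encoded as the pre-order sequence of element tag names and compared with a longest-common-subsequence ratio (one possible "simple string matching" technique); each content word is keyed by its enclosing element name and weighted by 1/depth as a stand-in for "position"; the two scores are blended with a hypothetical parameter `alpha`. All function names are illustrative.

```python
import math
import xml.etree.ElementTree as ET
from collections import Counter


def structure_sequence(root):
    # Assumed structural encoding: tag names in pre-order (document) order.
    return [elem.tag for elem in root.iter()]


def lcs_length(a, b):
    # Classic dynamic-programming longest common subsequence, O(len(a) * len(b)).
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def structural_similarity(root1, root2):
    # Hypothetical string-matching score: 2 * |LCS| / (|s1| + |s2|), in [0, 1].
    s1, s2 = structure_sequence(root1), structure_sequence(root2)
    return 2 * lcs_length(s1, s2) / (len(s1) + len(s2))


def weighted_terms(root):
    # Hypothetical weighting: each word is keyed by (element name, word)
    # and weighted by 1 / depth, so shallower elements count more.
    vec = Counter()

    def visit(elem, depth):
        for word in (elem.text or "").lower().split():
            vec[(elem.tag, word)] += 1.0 / depth
        for child in elem:
            visit(child, depth + 1)

    visit(root, 1)
    return vec


def cosine(v1, v2):
    # Cosine similarity of two sparse weight vectors (missing keys count as 0).
    dot = sum(w * v2[k] for k, w in v1.items())
    n1 = math.sqrt(sum(w * w for w in v1.values()))
    n2 = math.sqrt(sum(w * w for w in v2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0


def xml_similarity(xml1, xml2, alpha=0.5):
    # Blend structural and content similarity; alpha is an assumed parameter.
    r1, r2 = ET.fromstring(xml1), ET.fromstring(xml2)
    content = cosine(weighted_terms(r1), weighted_terms(r2))
    return alpha * structural_similarity(r1, r2) + (1 - alpha) * content


a = "<book><title>XML basics</title><author>Kim</author></book>"
b = "<book><title>XML basics</title><year>2008</year></book>"
print(round(xml_similarity(a, b), 4))  # prints 0.6667
```

In this toy example both components happen to equal 2/3: the tag sequences `book, title, author` and `book, title, year` share a subsequence of length 2, and the only shared weighted terms are the two words under `title`. Changing `alpha` shifts the balance between structure and content without changing either component.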