Fourier transform based techniques in efficient retrieval of similar time sequences
A Path-sequence Based Discrimination for Subtree Matching in Approximate XML Joins
ICDEW '06 Proceedings of the 22nd International Conference on Data Engineering Workshops
LAX: an efficient approximate XML join based on clustered leaf nodes for XML data integration
BNCOD'05 Proceedings of the 22nd British National conference on Databases: Enterprise, Skills and Innovation
XML-SIM: Structure and Content Semantic Similarity Detection Using Keys
OTM '09 Proceedings of the Confederated International Conferences, CoopIS, DOA, IS, and ODBASE 2009 on On the Move to Meaningful Internet Systems: Part II
GRAMS3: an efficient framework for XML structural similarity search
DASFAA'10 Proceedings of the 15th international conference on Database systems for advanced applications
XML-SIM-CHANGE: structure and content semantic similarity detection among XML document versions
OTM'10 Proceedings of the 2010 international conference on On the move to meaningful internet systems: Part II
XML data clustering: An overview
ACM Computing Surveys (CSUR)
Style-based similarity search for office XML documents
Proceedings of the 14th International Conference on Information Integration and Web-based Applications & Services
This paper proposes a technique for approximately matching XML data on both content and structure by detecting the similarity of subtrees that are clustered semantically using leaf-node parents. Each leaf-node parent is treated as the root of a subtree, which is then traversed recursively bottom-up for matching. First, we take advantage of "keys" for matching subtrees, which dramatically reduces the number of comparisons. Second, we measure the degree of similarity based on the data and structure of the two XML documents. The results show that our approach finds much more accurate matches, whether or not keys are present in the XML subtrees. Other approaches have difficulty with similarity-matching thresholds because they either ignore available semantic information or cannot handle complex XML data.
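The abstract does not spell out the matching procedure or the similarity formula, so the following is only a simplified Python sketch under our own assumptions. Subtrees rooted at leaf-node parents (elements whose children are all leaves) are paired first by equal values of a caller-chosen key child (the hypothetical `key_tag` parameter, e.g. `isbn`), which replaces pairwise comparison with a dictionary lookup. Subtrees without a usable key fall back to pairwise scoring. The `similarity` function (average of tag overlap and equal-value ratio) and the 0.5 `threshold` are hypothetical stand-ins, not the paper's measure, and the recursive bottom-up traversal is not implemented.

```python
import xml.etree.ElementTree as ET


def leaf_parents(root):
    """Return elements that have children, all of which are leaves."""
    return [e for e in root.iter()
            if len(e) > 0 and all(len(child) == 0 for child in e)]


def leaf_values(subtree):
    """Map each leaf tag to its stripped text (repeated tags keep the last value)."""
    return {child.tag: (child.text or "").strip() for child in subtree}


def similarity(a, b):
    """Hypothetical score: mean of structural overlap and content agreement."""
    va, vb = leaf_values(a), leaf_values(b)
    union = set(va) | set(vb)
    if not union:
        return 0.0
    common = set(va) & set(vb)
    structure = len(common) / len(union)                          # shared leaf tags
    content = sum(va[t] == vb[t] for t in common) / len(union)    # equal leaf values
    return (structure + content) / 2


def match_subtrees(doc_a, doc_b, key_tag=None, threshold=0.5):
    """Pair leaf-node-parent subtrees of doc_a with those of doc_b.

    key_tag and threshold are illustrative assumptions, not values from the paper.
    """
    subs_a, unmatched_b = leaf_parents(doc_a), leaf_parents(doc_b)
    matches = []
    if key_tag is not None:
        # Index doc_b subtrees by (tag, key value): one lookup per keyed subtree.
        index = {}
        for sb in unmatched_b:
            k = sb.findtext(key_tag)
            if k is not None:
                index.setdefault((sb.tag, k.strip()), sb)
        remaining = []
        for sa in subs_a:
            k = sa.findtext(key_tag)
            sb = index.get((sa.tag, k.strip())) if k is not None else None
            if sb is not None and sb in unmatched_b:
                matches.append((sa, sb, similarity(sa, sb)))
                unmatched_b.remove(sb)
            else:
                remaining.append(sa)
        subs_a = remaining
    # Fallback: pairwise comparison for subtrees without a usable key.
    for sa in subs_a:
        best, best_score = None, threshold
        for sb in unmatched_b:
            score = similarity(sa, sb)
            if score >= best_score:
                best, best_score = sb, score
        if best is not None:
            matches.append((sa, best, best_score))
            unmatched_b.remove(best)
    return matches


if __name__ == "__main__":
    a = ET.fromstring("<lib><book><isbn>1</isbn><title>XML</title></book>"
                      "<book><isbn>2</isbn><title>Joins</title></book></lib>")
    b = ET.fromstring("<lib><book><isbn>2</isbn><title>Joins</title></book>"
                      "<book><isbn>1</isbn><title>XML Data</title></book></lib>")
    for sa, sb, s in match_subtrees(a, b, key_tag="isbn"):
        print(sa.findtext("isbn"), sb.findtext("isbn"), round(s, 2))
    # prints:
    # 1 1 0.75
    # 2 2 1.0
```

In this toy run the keyed lookup pairs each `book` directly with its counterpart, and the score still reflects the differing `title` values (0.75 versus 1.0). Calling `match_subtrees(a, b)` without a key reaches the same pairs here, but only after scoring every candidate pair.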