A Path-sequence Based Discrimination for Subtree Matching in Approximate XML Joins

Authors:
Wenxin Liang;Haruo Yokota
Affiliations:
Tokyo Institute of Technology, Japan;Tokyo Institute of Technology, Japan
Venue:
ICDEW '06 Proceedings of the 22nd International Conference on Data Engineering Workshops
Year:
2006

Citing 0
Cited 6

XML Data Integration Based on Content and Structure Similarity Using Keys
OTM '08 Proceedings of the OTM 2008 Confederated International Conferences, CoopIS, DOA, GADA, IS, and ODBASE 2008. Part I on On the Move to Meaningful Internet Systems:

A system for detecting xml similarity in content and structure using relational database
Proceedings of the 18th ACM conference on Information and knowledge management

XML-SIM: Structure and Content Semantic Similarity Detection Using Keys
OTM '09 Proceedings of the Confederated International Conferences, CoopIS, DOA, IS, and ODBASE 2009 on On the Move to Meaningful Internet Systems: Part II

An approach for XML similarity join using tree serialization
DASFAA'08 Proceedings of the 13th international conference on Database systems for advanced applications

XML-SIM-CHANGE: structure and content semantic similarity detection among XML document versions
OTM'10 Proceedings of the 2010 international conference on On the move to meaningful internet systems: Part II

Proximity search of XML data using ontology and XPath edit similarity
DEXA'07 Proceedings of the 18th international conference on Database and Expert Systems Applications

Abstract

In this paper, we discuss the one-to-multiple matching problem in leaf-clustering based approximate XML join algorithms and propose a path-sequence based discrimination method to solve it. In our method, each path sequence from the top node to a matched leaf is extracted from both the base subtree and the target subtree, and the most similar target subtree for the base subtree is determined by the path-sequence based subtree similarity degree. We evaluate our method in experiments using real bibliography and bioinformatics XML documents. The experimental results show that our method can effectively decrease the occurrence rate of one-to-multiple matching for both bibliography and bioinformatics XML data, and hence improve the precision of leaf-clustering based approximate XML join algorithms.
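
The abstract does not define the subtree similarity degree, so the following Python sketch only illustrates the general idea under stated assumptions. It assumes that leaves are matched by identical text, that a path sequence is the list of tag names from the subtree root to the leaf, and that the similarity of two path sequences can be approximated with `difflib.SequenceMatcher`, averaged over the matched leaves. The function names `leaf_paths`, `subtree_similarity`, and `best_match`, and the sample XML fragments, are hypothetical and are not taken from the paper.

```python
import xml.etree.ElementTree as ET
from difflib import SequenceMatcher


def leaf_paths(subtree_root):
    """Map each leaf's text to the tag sequence from the subtree root to that leaf."""
    paths = {}

    def walk(node, prefix):
        path = prefix + [node.tag]
        children = list(node)
        if not children:
            text = (node.text or "").strip()
            if text:
                # Keep the first path seen for a given leaf text.
                paths.setdefault(text, path)
        for child in children:
            walk(child, path)

    walk(subtree_root, [])
    return paths


def subtree_similarity(base, target):
    """Average path-sequence similarity over leaves whose text occurs in both subtrees.

    Assumption: SequenceMatcher.ratio() stands in for the paper's similarity degree.
    """
    base_paths = leaf_paths(base)
    target_paths = leaf_paths(target)
    shared = base_paths.keys() & target_paths.keys()
    if not shared:
        return 0.0
    scores = [
        SequenceMatcher(None, base_paths[leaf], target_paths[leaf]).ratio()
        for leaf in shared
    ]
    return sum(scores) / len(scores)


def best_match(base, candidates):
    """Pick the single candidate subtree most similar to the base subtree."""
    return max(candidates, key=lambda target: subtree_similarity(base, target))


# Hypothetical data: both candidates share the same leaves with the base subtree,
# so leaf matching alone cannot tell them apart (a one-to-multiple match).
base = ET.fromstring(
    "<article><author><name>Liang</name></author><title>XML Joins</title></article>"
)
cand_a = ET.fromstring(
    "<article><author><name>Liang</name></author><title>XML Joins</title></article>"
)
cand_b = ET.fromstring(
    "<book><editor><name>Liang</name></editor>"
    "<chapter><title>XML Joins</title></chapter></book>"
)
chosen = best_match(base, [cand_b, cand_a])
```

In this toy case both candidates contain the same leaf values as the base subtree, and comparing the path sequences is what separates them: `cand_a` has identical paths, while `cand_b` shares only the final tag of each path.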