Computing structural similarity of source XML schemas against domain XML schema

Authors:
Jianxin Li;Jixue Liu;Chengfei Liu;Guoren Wang;Jeffrey Xu Yu;Chi Yang
Affiliations:
Swinburne University of Technology, Australia;University of South Australia, Australia;Swinburne University of Technology, Australia;Northeastern University, China;Chinese University of Hong Kong, China;Swinburne University of Technology, Australia
Venue:
ADC '08 Proceedings of the nineteenth conference on Australasian database - Volume 75
Year:
2008

Abstract

In this paper, we study the problem of measuring the structural similarity of a large number of source schemas against a single domain schema, which is useful for improving the quality of searching and ranking large volumes of source documents on the Web with the help of structural information. After analyzing why existing edit-distance-based methods are unsuitable for this problem, we propose a new similarity measure model that caters for its requirements. Given the asymmetric nature of comparing source schemas with a domain schema, similarity-preserving rules and an algorithm are designed to filter out uninteresting elements in source schemas in order to optimize the similarity computation. Based on the model, a basic algorithm and an improved algorithm are developed for structural similarity computation. The improved algorithm makes full use of a new coding scheme devised to reduce the number of comparisons. The complexities of both algorithms are analyzed, and extensive experiments show the significant performance gain achieved by the improved algorithm.
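
The abstract does not give the paper's similarity model, filtering rules, or coding scheme, so the following is only a toy sketch of the two ideas it names: an asymmetric comparison normalized by the domain schema alone, and a filter that removes source elements without changing the score. The nested-dict schema representation, the ancestor-descendant pair measure, and the names `ancestor_pairs`, `prune`, and `similarity` are all assumptions made for illustration, not the authors' method.

```python
from typing import Dict, Set, Tuple

# Hypothetical toy representation: element name -> child elements.
Schema = Dict[str, dict]


def labels(schema: Schema) -> Set[str]:
    """Collect every element name that occurs in the schema."""
    out: Set[str] = set()
    for name, children in schema.items():
        out.add(name)
        out |= labels(children)
    return out


def ancestor_pairs(schema: Schema, ancestors: Tuple[str, ...] = ()) -> Set[Tuple[str, str]]:
    """Return all (ancestor, descendant) name pairs in the schema tree."""
    pairs: Set[Tuple[str, str]] = set()
    for name, children in schema.items():
        pairs |= {(a, name) for a in ancestors}
        pairs |= ancestor_pairs(children, ancestors + (name,))
    return pairs


def merge(target: Schema, extra: Schema) -> None:
    """Recursively merge `extra` into `target`, combining same-named elements."""
    for name, children in extra.items():
        merge(target.setdefault(name, {}), children)


def prune(source: Schema, keep: Set[str]) -> Schema:
    """Drop elements whose name is not in `keep`, promoting their children."""
    pruned: Schema = {}
    for name, children in source.items():
        kept_children = prune(children, keep)
        if name in keep:
            merge(pruned.setdefault(name, {}), kept_children)
        else:
            merge(pruned, kept_children)
    return pruned


def similarity(source: Schema, domain: Schema) -> float:
    """Assumed toy measure: share of domain ancestor pairs found in the source."""
    domain_pairs = ancestor_pairs(domain)
    if not domain_pairs:
        return 1.0
    return len(domain_pairs & ancestor_pairs(source)) / len(domain_pairs)


domain = {"book": {"title": {}, "author": {"name": {}}}}
source = {"book": {"info": {"title": {}}, "writer": {"name": {}}, "price": {}}}

filtered = prune(source, labels(domain))
print(similarity(source, domain))    # score on the full source schema
print(similarity(filtered, domain))  # same score on the smaller filtered schema
print(similarity(domain, source))    # swapping the arguments changes the score
```

Under this assumed measure, the filter is similarity preserving because a pair that involves an element absent from the domain schema can never match a domain pair, so removing such elements only shrinks the work. Normalizing by the domain pairs alone is what makes the comparison asymmetric.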