A novel clustering-based approach to schema matching

Authors:
Jin Pei; Jun Hong; David Bell
Affiliations:
School of Electronics, Electrical Engineering and Computer Science, Queen's University Belfast, Belfast, UK; School of Electronics, Electrical Engineering and Computer Science, Queen's University Belfast, Belfast, UK; School of Electronics, Electrical Engineering and Computer Science, Queen's University Belfast, Belfast, UK
Venue:
ADVIS'06 Proceedings of the 4th international conference on Advances in Information Systems
Year:
2006

Abstract

Schema matching is a critical step in integrating data from multiple heterogeneous data sources. This paper presents a new approach to schema matching, based on two observations. First, attribute correspondences are easier to find between schemas that are contextually similar. Second, the attribute correspondences found between these schemas can help find new attribute correspondences between other schemas. Motivated by these observations, we propose a novel clustering-based approach to schema matching. First, we cluster schemas on the basis of their contextual similarity. Second, we cluster the attributes of the schemas in the same schema cluster to find attribute correspondences between these schemas. Third, we cluster attributes across different schema clusters, using statistical information gleaned from the existing attribute clusters, to find attribute correspondences between more schemas. We apply a fast clustering algorithm, K-Means, to all three clustering tasks. We have evaluated our approach in the context of integrating information from multiple web interfaces, and the results show that it is effective.
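
The first two clustering steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how contextual similarity is measured, how K-Means is seeded, or how the number of clusters is chosen. The sketch assumes, purely for illustration, that schemas and attributes are represented as normalized bag-of-words vectors over attribute-name tokens, that K-Means uses farthest-first seeding, and that the number of attribute clusters equals the size of the largest schema in a schema cluster. The toy schemas and all function names (`vectorize`, `kmeans`, `cluster_schemas`, `cluster_attributes`) are hypothetical. The third step is left out because the abstract does not specify which statistics are gleaned from the existing attribute clusters.

```python
import math
from collections import Counter, defaultdict

def vectorize(tokens, vocab):
    """Unit-length term-frequency vector over a fixed vocabulary (assumed representation)."""
    counts = Counter(tokens)
    vec = [float(counts[t]) for t in vocab]
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def dist(p, q):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def kmeans(points, k, max_iters=50):
    """Plain K-Means; farthest-first seeding is an assumption made for determinism."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(dist(p, c) for c in centroids)))
    labels = None
    for _ in range(max_iters):
        new_labels = [min(range(k), key=lambda j: dist(p, centroids[j])) for p in points]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def group_by_label(items, labels):
    """Collect items that share a cluster label."""
    groups = defaultdict(list)
    for item, lab in zip(items, labels):
        groups[lab].append(item)
    return list(groups.values())

def cluster_schemas(schemas, k):
    """Step 1: cluster schemas by the tokens of all their attribute names."""
    names = list(schemas)
    tokens = {n: " ".join(schemas[n]).split() for n in names}
    vocab = sorted({t for toks in tokens.values() for t in toks})
    labels = kmeans([vectorize(tokens[n], vocab) for n in names], k)
    return group_by_label(names, labels)

def cluster_attributes(schemas, group):
    """Step 2: cluster the attributes of the schemas inside one schema cluster."""
    attrs = [(n, a) for n in group for a in schemas[n]]
    vocab = sorted({t for _, a in attrs for t in a.split()})
    k = max(len(schemas[n]) for n in group)  # assumed choice of k
    labels = kmeans([vectorize(a.split(), vocab) for _, a in attrs], k)
    return group_by_label(attrs, labels)

# Hypothetical web query interfaces from two domains.
SCHEMAS = {
    "books_a": ["title", "author", "isbn"],
    "books_b": ["book title", "author name", "isbn"],
    "air_a": ["origin", "destination", "date"],
    "air_b": ["origin city", "destination city", "travel date"],
}

schema_groups = cluster_schemas(SCHEMAS, k=2)
matches = {tuple(g): cluster_attributes(SCHEMAS, g) for g in schema_groups}

for group, attr_clusters in matches.items():
    print(group)
    for cluster in attr_clusters:
        print("  ", cluster)
```

Each attribute cluster printed for a schema cluster can be read as a candidate attribute correspondence, for example `title` in one book interface and `book title` in the other. Token overlap is a deliberately crude stand-in for contextual similarity; a real system would likely need richer features.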