Breaking the deadlock: simultaneously discovering attribute matching and cluster matching with multi-objective simulated annealing

Authors:
Haishan Liu;Dejing Dou
Affiliations:
Computer and Information Science Department, University of Oregon, Eugene;Computer and Information Science Department, University of Oregon, Eugene
Venue:
OTM'11: Proceedings of the 2011 Confederated International Conference on On the Move to Meaningful Internet Systems, Volume Part II
Year:
2011

Abstract

In this paper, we present a data mining approach to the matching and integration of heterogeneous datasets. In particular, we propose solutions to two problems that arise when combining results from different scientific studies. The first problem, attribute matching, is the discovery of correspondences among distinct numeric-typed summary features ("attributes") that characterize datasets collected and analyzed in different research labs. The second problem, cluster matching, is the discovery of matchings between patterns across datasets. We treat the two problems jointly as a single multi-objective optimization problem and describe a multi-objective simulated annealing algorithm that searches for the optimal solution. We demonstrate the utility of this approach in a series of experiments on synthetic and realistic datasets designed to simulate heterogeneous data from different sources.
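
To make the joint formulation concrete, the following is a minimal sketch of one way a multi-objective simulated annealing search over attribute and cluster matchings could look. It is an illustration under stated assumptions, not the algorithm from the paper: the abstract does not specify the objective functions, move set, or acceptance rule, so the two objectives (`f_attr`, `f_clus`), the swap-based `neighbor` move, the summed-worsening acceptance rule, the Pareto archive, and every function and parameter name here are hypothetical. A state is a pair of permutations: `p` maps attributes of dataset A to attributes of dataset B, and `q` maps clusters of A to clusters of B. The cluster objective depends on `p`, which is what couples the two problems in this toy version.

```python
import math
import random

def dominates(f, g):
    """True if f is no worse than g in every objective and better in at least one."""
    return all(a <= b for a, b in zip(f, g)) and any(a < b for a, b in zip(f, g))

def objectives(state, attr_a, attr_b, cent_a, cent_b):
    """Toy objectives (assumptions, not the paper's), both to be minimized."""
    p, q = state
    # Attribute objective: mismatch between per-attribute summary values.
    f_attr = sum(abs(attr_a[i] - attr_b[p[i]]) for i in range(len(p)))
    # Cluster objective: centroid mismatch, measured through the attribute map p.
    f_clus = sum(abs(cent_a[c][i] - cent_b[q[c]][p[i]])
                 for c in range(len(q)) for i in range(len(p)))
    return (f_attr, f_clus)

def neighbor(state, rng):
    """Swap two entries in either the attribute map or the cluster map."""
    p, q = list(state[0]), list(state[1])
    target = p if rng.random() < 0.5 else q
    i, j = rng.sample(range(len(target)), 2)
    target[i], target[j] = target[j], target[i]
    return (tuple(p), tuple(q))

def update_archive(archive, state, f):
    """Keep only mutually nondominated (state, objectives) pairs."""
    if any(dominates(g, f) or g == f for _, g in archive):
        return archive
    return [(s, g) for s, g in archive if not dominates(f, g)] + [(state, f)]

def mosa(attr_a, attr_b, cent_a, cent_b, steps=5000, t0=1.0, cooling=0.999, seed=0):
    """Simulated annealing with a Pareto archive; needs >= 2 attributes and clusters."""
    rng = random.Random(seed)
    n, k = len(attr_a), len(cent_a)
    cur = (tuple(rng.sample(range(n), n)), tuple(rng.sample(range(k), k)))
    f_cur = objectives(cur, attr_a, attr_b, cent_a, cent_b)
    archive = [(cur, f_cur)]
    t = t0
    for _ in range(steps):
        cand = neighbor(cur, rng)
        f_cand = objectives(cand, attr_a, attr_b, cent_a, cent_b)
        # One simple acceptance rule: penalize the total worsening across objectives.
        worsening = sum(max(0.0, a - b) for a, b in zip(f_cand, f_cur))
        if rng.random() < math.exp(-worsening / t):
            cur, f_cur = cand, f_cand
            archive = update_archive(archive, cur, f_cur)
        t *= cooling  # geometric cooling schedule
    return archive

# Toy data: dataset B is dataset A with attributes and clusters shuffled.
hidden_p, hidden_q = [2, 0, 3, 1], [1, 2, 0]
attr_a = [0.1, 0.5, 0.9, 1.3]
cent_a = [[0.2, 1.0, 0.4, 0.7], [0.9, 0.1, 0.8, 0.3], [0.5, 0.6, 0.0, 1.0]]
attr_b = [0.0] * 4
cent_b = [[0.0] * 4 for _ in range(3)]
for i, j in enumerate(hidden_p):
    attr_b[j] = attr_a[i]
for c, row in enumerate(cent_a):
    for i, v in enumerate(row):
        cent_b[hidden_q[c]][hidden_p[i]] = v

archive = mosa(attr_a, attr_b, cent_a, cent_b)
for (p, q), f in archive:
    print("attribute map", p, "cluster map", q, "objectives", f)
```

The search returns an archive of nondominated trade-offs rather than a single answer, which is the usual way to report results when two objectives are optimized together. In this toy setup the hidden shuffle scores zero on both objectives, so if the search reaches it the archive reduces to that single state.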