Instance-based attribute identification in database integration

Authors:
Cecil Eng H. Chua;Roger H. L. Chiang;Ee-Peng Lim
Affiliations:
J. Mack Robinson College of Business, Georgia State University;College of Business Administration, University of Cincinnati;School of Computer Engineering, Nanyang Technological University
Venue:
The VLDB Journal — The International Journal on Very Large Data Bases
Year:
2003

Abstract

Most research on attribute identification in database integration has focused on integrating attributes using schema and summary information derived from the attribute values. No research has attempted to fully explore the use of attribute values to perform attribute identification. We propose an attribute identification method that employs schema and summary instance information as well as properties of attributes derived from their instances. Unlike other attribute identification methods that match only single attributes, our method matches attribute groups for integration. Because our attribute identification method fully explores data instances, it can identify corresponding attributes to be integrated even when schema information is misleading. Three experiments were performed to validate our attribute identification method. In the first experiment, the heuristic rules derived for attribute classification were evaluated on 119 attributes from nine public domain data sets. The second was a controlled experiment validating the robustness of the proposed attribute identification method by introducing erroneous data. The third experiment evaluated the proposed attribute identification method on five data sets extracted from online music stores. The results demonstrated the viability of the proposed method.