Determining the correspondences among heterogeneous data sources, which is critical to integrating those sources, is a complex and resource-consuming task that demands automated support. We propose an iterative procedure for detecting both schema-level and instance-level correspondences across heterogeneous data sources. Cluster analysis techniques are first used to identify similar schema elements (i.e., relations and attributes). Based on the identified schema-level correspondences, classification techniques are then used to identify matching tuples. Statistical analysis techniques are subsequently applied to the resulting preliminary integrated data set to evaluate the relationships among schema elements more accurately. Any improvement in the schema-level correspondences triggers another iteration of the procedure. We have performed an empirical evaluation using real-world heterogeneous data sources and report promising results (i.e., incremental improvement in the identified correspondences) that demonstrate the utility of the proposed procedure.
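The feedback loop described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which clustering, classification, or statistical methods are used, so this sketch swaps in deliberately simple stand-ins. Attribute-name similarity with a threshold replaces cluster analysis, an agreement-count rule replaces a trained tuple classifier, and the per-attribute value-agreement rate over matched tuples replaces the statistical analysis. All function names, thresholds, and the toy data are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import product


def name_sim(a, b):
    """Similarity of two attribute names in [0, 1] (case-insensitive)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_attributes(attrs_a, attrs_b, threshold=0.6):
    """Schema level (stand-in for cluster analysis): pair similarly named attributes."""
    return {(a, b) for a, b in product(attrs_a, attrs_b) if name_sim(a, b) >= threshold}


def match_tuples(rows_a, rows_b, attr_pairs, min_agree=0.5):
    """Instance level (stand-in for a classifier): a tuple pair matches when
    at least `min_agree` of the corresponding attributes hold equal values."""
    if not attr_pairs:
        return []
    matches = []
    for i, ra in enumerate(rows_a):
        for j, rb in enumerate(rows_b):
            agree = sum(ra[a] == rb[b] for a, b in attr_pairs)
            if agree / len(attr_pairs) >= min_agree:
                matches.append((i, j))
    return matches


def refine_attributes(rows_a, rows_b, matches, attrs_a, attrs_b, min_rate=0.6):
    """Statistical step: on the preliminary integrated data (the matched tuples),
    keep every attribute pair whose values agree on at least `min_rate` of them."""
    if not matches:
        return set()
    pairs = set()
    for a, b in product(attrs_a, attrs_b):
        rate = sum(rows_a[i][a] == rows_b[j][b] for i, j in matches) / len(matches)
        if rate >= min_rate:
            pairs.add((a, b))
    return pairs


def iterative_matching(rows_a, rows_b, max_iter=5):
    """Alternate schema-level and instance-level matching until the
    attribute correspondences stop changing."""
    attrs_a, attrs_b = list(rows_a[0]), list(rows_b[0])
    pairs = match_attributes(attrs_a, attrs_b)
    matches = []
    for _ in range(max_iter):
        matches = match_tuples(rows_a, rows_b, pairs)
        new_pairs = refine_attributes(rows_a, rows_b, matches, attrs_a, attrs_b)
        if new_pairs == pairs:  # no improvement: stop iterating
            break
        pairs = new_pairs
    return pairs, matches


# Hypothetical toy sources with differently named attributes.
ROWS_A = [
    {"name": "Ann Lee", "phone": "555-0101", "city": "Austin"},
    {"name": "Bo Chen", "phone": "555-0102", "city": "Boston"},
    {"name": "Cy Diaz", "phone": "555-0103", "city": "Denver"},
]
ROWS_B = [
    {"full_name": "Ann Lee", "tel": "555-0101", "town": "Austin"},
    {"full_name": "Bo Chen", "tel": "555-0102", "town": "Boston"},
    {"full_name": "C. Diaz", "tel": "555-0103", "town": "Denver"},
]

if __name__ == "__main__":
    pairs, matches = iterative_matching(ROWS_A, ROWS_B)
    print(sorted(pairs))
    print(matches)
```

On this toy data, name similarity alone finds only `name`/`full_name`, so the first pass matches just two tuples ("Cy Diaz" and "C. Diaz" differ). The statistical step over those two matches then discovers `phone`/`tel` and `city`/`town`. With three correspondences, the second pass also matches the third tuple pair, which mirrors the incremental improvement the abstract reports.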