Data Exchange: Semantics and Query Answering
ICDT '03 Proceedings of the 9th International Conference on Database Theory
Schema Mapping as Query Discovery
VLDB '00 Proceedings of the 26th International Conference on Very Large Data Bases
A survey of approaches to automatic schema matching
The VLDB Journal — The International Journal on Very Large Data Bases
Putting context into schema matching
VLDB '06 Proceedings of the 32nd international conference on Very large data bases
Multi-column substring matching for database schema translation
VLDB '06 Proceedings of the 32nd international conference on Very large data bases
Data integration with uncertainty
VLDB '07 Proceedings of the 33rd international conference on Very large data bases
Bootstrapping pay-as-you-go data integration systems
Proceedings of the 2008 ACM SIGMOD international conference on Management of data
ICDE '08 Proceedings of the 2008 IEEE 24th International Conference on Data Engineering
Top-k generation of integrated schemas based on directed and weighted correspondences
Proceedings of the 2009 ACM SIGMOD International Conference on Management of data
Proceedings of the 2009 ACM SIGMOD International Conference on Management of data
Hi-index | 0.00 |
Attribute-level schema matching is a critical step in numerous database applications, such as DataSpaces, Ontology Merging and Schema Integration. There exist many researches on this topic, however, they ignore the implicit categorical information which is crucial to find high-quality matches between schema attributes. In this paper, we discover the categorical semantics implicit in source instances, and associate them with the matches in order to improve overall quality of schema matching. Our method works in three phases. The first phase is a pre-detecting step that detects the possible categories of source instances by using clustering techniques. In the second phase, we employ information entropy to find the attributes whose instances imply the categorical semantics. In the third phase, we introduce a new concept c-mapping to represent the associations between the matches and the categorical semantics. Then, we employ an adaptive scoring function to evaluate the c-mappings to achieve the task of associating the matches with the semantics. Moreover, we show how to translate the matches with semantics into schema mapping expressions, and use the chase procedure to transform source data into target schemas. An experimental study shows that our approach is effective and has good performance.