Existing schema matching techniques are classified as schema-based, instance-based, or a combination of both. In this paper, we define a new class of techniques, called usage-based schema matching. The idea is to exploit information extracted from query logs to find correspondences between attributes in the schemas to be matched. We propose methods to identify co-occurrence patterns between attributes, along with other features such as their use in joins and with aggregate functions. We consider several scoring functions to measure the similarity of the extracted features, and employ a genetic algorithm to find the highest-scoring mappings between the two schemas. Our technique is suitable for matching schemas even when their attribute names are opaque, and it can be combined with existing techniques to obtain more accurate results. Our experimental study demonstrates the effectiveness of the proposed approach and the benefit of combining it with existing approaches.
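To make the usage-based idea concrete, the following is a minimal sketch, not the paper's implementation. It assumes each logged query has already been reduced to the set of attributes it references, uses pairwise co-occurrence frequency as the only feature, and scores a candidate mapping by the negated sum of absolute differences between corresponding co-occurrence entries. An exhaustive search over permutations stands in for the genetic algorithm, so it is only practical for very small schemas. The function names (cooccurrence, mapping_score, best_mapping) and the sample logs are hypothetical.

```python
from itertools import combinations, permutations


def cooccurrence(queries, attributes):
    """Return a matrix whose (i, j) entry is the fraction of queries
    in which attributes i and j appear together."""
    index = {name: i for i, name in enumerate(attributes)}
    n = len(attributes)
    matrix = [[0.0] * n for _ in range(n)]
    for query_attrs in queries:
        # Ignore attributes that do not belong to this schema.
        present = sorted(index[a] for a in set(query_attrs) if a in index)
        for i, j in combinations(present, 2):
            matrix[i][j] += 1.0
            matrix[j][i] += 1.0
    total = len(queries) or 1
    return [[count / total for count in row] for row in matrix]


def mapping_score(co_a, co_b, mapping):
    """Assumed scoring function: negated sum of absolute differences between
    co-occurrence entries of schema A and their images in schema B.
    mapping[i] is the index in B assigned to attribute i of A."""
    n = len(mapping)
    return -sum(
        abs(co_a[i][j] - co_b[mapping[i]][mapping[j]])
        for i in range(n)
        for j in range(i + 1, n)
    )


def best_mapping(co_a, co_b):
    """Exhaustive search over one-to-one mappings (a stand-in for the
    genetic algorithm; the cost grows factorially with schema size)."""
    candidates = permutations(range(len(co_b)), len(co_a))
    return max(candidates, key=lambda m: mapping_score(co_a, co_b, m))


# Hypothetical query logs with opaque attribute names.
schema_a = ["a1", "a2", "a3"]
schema_b = ["b1", "b2", "b3"]
log_a = [{"a1", "a2"}, {"a1", "a2"}, {"a1", "a3"}, {"a2"}]
log_b = [{"b3", "b1"}, {"b3", "b1"}, {"b3", "b2"}, {"b1"}]

co_a = cooccurrence(log_a, schema_a)
co_b = cooccurrence(log_b, schema_b)
mapping = best_mapping(co_a, co_b)
for i, j in enumerate(mapping):
    print(schema_a[i], "->", schema_b[j])
```

In this toy log the two schemas are used in the same way, so the search recovers a1 -> b3, a2 -> b1, a3 -> b2 without looking at the attribute names. A fuller treatment would add the join and aggregate-function features mentioned above and replace the exhaustive search with a heuristic search such as a genetic algorithm.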