Schema matching is a critical step in information integration: it discovers semantic correspondences between attributes across heterogeneous sources. This paper studies schema matching as a data mining problem. Although complex matchings are common, most existing techniques focus on simple 1:1 matchings because the search space for complex matchings is far larger. To tackle this challenge, this paper takes a conceptually novel approach and views schema matching as correlation mining, for our task of matching Web query interfaces to integrate the myriad databases on the Internet. On this "deep Web," query interfaces generally form complex matchings between attribute groups (e.g., {author} corresponds to {first name, last name} in the Books domain). We observe that co-occurrence patterns across query interfaces often reveal such complex semantic relationships: grouping attributes (e.g., {first name, last name}) tend to be co-present in query interfaces and are thus positively correlated. In contrast, synonym attributes are negatively correlated because they rarely co-occur. This insight enables us to discover complex matchings through correlation mining, which consists of dual mining of positive and negative correlations. We evaluate our approach on deep Web sources in several object domains (e.g., Books and Airfares), and the results show that the correlation mining approach does discover semantically meaningful matchings among attributes.
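The co-occurrence intuition above can be illustrated with a small sketch. It is not the paper's algorithm: the toy interfaces are invented, the phi coefficient is used here only as a stand-in correlation measure, and the thresholds and the names `interfaces`, `phi`, `correlated_pairs`, `groupings`, and `synonyms` are all assumptions made for illustration. It covers only pairwise correlations, not the step that matches a whole attribute group against another.

```python
from itertools import combinations
from math import sqrt

# Toy data, invented for illustration: each set holds the attributes
# that appear on one query interface in a Books-like domain.
interfaces = [
    {"author", "title"},
    {"author", "title", "isbn"},
    {"first name", "last name", "title"},
    {"first name", "last name", "isbn"},
    {"author", "isbn"},
    {"first name", "last name", "title", "isbn"},
]

def phi(a, b, schemas):
    """Phi coefficient between the presence of attributes a and b.

    Assumption: phi is a stand-in measure chosen for this sketch.
    Values near +1 mean the attributes co-occur; values near -1 mean
    they almost never appear on the same interface.
    """
    n = len(schemas)
    n_ab = sum(1 for s in schemas if a in s and b in s)
    n_a = sum(1 for s in schemas if a in s)
    n_b = sum(1 for s in schemas if b in s)
    denom = sqrt(n_a * (n - n_a) * n_b * (n - n_b))
    if denom == 0:
        return 0.0  # an attribute present everywhere or nowhere carries no signal
    return (n * n_ab - n_a * n_b) / denom

def correlated_pairs(schemas):
    """Score every unordered attribute pair, keyed in sorted order."""
    attrs = sorted(set().union(*schemas))
    return {(a, b): phi(a, b, schemas) for a, b in combinations(attrs, 2)}

scores = correlated_pairs(interfaces)

# Hypothetical thresholds: strongly positive pairs are grouping candidates,
# strongly negative pairs are synonym candidates.
groupings = [pair for pair, score in scores.items() if score >= 0.9]
synonyms = [pair for pair, score in scores.items() if score <= -0.9]

print("grouping candidates:", groupings)
print("synonym candidates:", synonyms)
```

On this toy data, `first name` and `last name` always appear together and so score as a grouping candidate, while `author` never shares an interface with either of them and so scores as a synonym candidate for both. Real interface collections are noisier, so any fixed threshold like the ones above would need tuning.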