Mining complex matchings across Web query interfaces

Authors: Bin He; Kevin Chen-Chuan Chang; Jiawei Han
Affiliations: University of Illinois at Urbana-Champaign (all authors)
Venue: Proceedings of the 9th ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery
Year: 2004

Abstract

To enable information integration, schema matching is a critical step for discovering semantic correspondences of attributes across heterogeneous sources. As a new attempt, this paper studies such matching as a data mining problem. Although complex matchings are common, most existing techniques focus on simple 1:1 matchings because the search space of complex matchings is far larger. To tackle this challenge, this paper takes a conceptually novel approach by viewing schema matching as correlation mining, applied to our task of matching Web query interfaces in order to integrate the myriad databases on the Internet. On this "deep Web," query interfaces generally form complex matchings between attribute groups (e.g., {author} corresponds to {first name, last name} in the Books domain). We observe that co-occurrence patterns across query interfaces often reveal such complex semantic relationships: grouping attributes (e.g., {first name, last name}) tend to be co-present in query interfaces and are thus positively correlated. In contrast, synonym attributes are negatively correlated because they rarely co-occur. This insight enables us to discover complex matchings with a correlation mining approach that consists of dual mining of positive and negative correlations. We evaluate our approach on deep Web sources in several object domains (e.g., Books and Airfares), and the results show that correlation mining does discover semantically meaningful matchings among attributes.
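The dual-mining idea can be illustrated with a small sketch. It is not the authors' algorithm. It assumes each query interface is reduced to a set of attribute names and substitutes the phi coefficient as the correlation measure (the paper may use a different measure). The thresholds, the toy Books-style interfaces, and the helper names `phi`, `dual_mine`, and `complex_matchings` are all hypothetical.

```python
from itertools import combinations
from math import sqrt

def phi(interfaces, a, b):
    """Phi coefficient between the presence of attributes a and b across interfaces.

    Assumption: phi is a stand-in correlation measure, not necessarily the one
    used in the paper.
    """
    n = len(interfaces)
    f11 = sum(1 for s in interfaces if a in s and b in s)  # both present
    fa = sum(1 for s in interfaces if a in s)              # a present
    fb = sum(1 for s in interfaces if b in s)              # b present
    f10, f01 = fa - f11, fb - f11                          # only a / only b
    f00 = n - f11 - f10 - f01                              # neither present
    denom = sqrt(fa * (n - fa) * fb * (n - fb))
    # An attribute present in every (or no) interface carries no signal.
    return 0.0 if denom == 0 else (f11 * f00 - f10 * f01) / denom

def dual_mine(interfaces, pos_t=0.5, neg_t=-0.5):
    """Split attribute pairs into grouping candidates (positive correlation)
    and synonym candidates (negative correlation). Thresholds are hypothetical."""
    attrs = sorted(set().union(*interfaces))
    groups, synonyms = [], []
    for a, b in combinations(attrs, 2):
        r = phi(interfaces, a, b)
        if r >= pos_t:
            groups.append((a, b))
        elif r <= neg_t:
            synonyms.append((a, b))
    return groups, synonyms

def complex_matchings(groups, synonyms):
    """Pair each grouping with attributes that are negatively correlated with
    every member of the group, e.g. {author} = {first name, last name}."""
    neg = {frozenset(p) for p in synonyms}
    result = []
    for group in groups:
        candidates = {x for p in synonyms for x in p if x not in group}
        for x in sorted(candidates):
            if all(frozenset((x, g)) in neg for g in group):
                result.append(({x}, set(group)))
    return result

# Hypothetical toy interfaces for illustration only (not the paper's data).
interfaces = [
    {"author", "title", "isbn"},
    {"author", "title"},
    {"first name", "last name", "title"},
    {"first name", "last name", "title", "isbn"},
    {"author", "title", "isbn"},
    {"first name", "last name", "title"},
]

groups, synonyms = dual_mine(interfaces)
print(groups)    # [('first name', 'last name')]
print(synonyms)  # [('author', 'first name'), ('author', 'last name')]
print(complex_matchings(groups, synonyms))
```

On this toy data, `first name` and `last name` always co-occur (phi = 1), while `author` never appears with either of them (phi = -1). The sketch therefore recovers {author} = {first name, last name}. Because `title` is present in every interface, its phi is undefined, and the sketch treats it as uncorrelated. Real interfaces are sparser and noisier, so the choice of measure and thresholds matters more than this example suggests.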