A Prioritized Collective Selection Strategy for Schema Matching across Query Interfaces

Authors:
Zhongtian He; Jun Hong; David A. Bell
Affiliations:
School of Electronics, Electrical Engineering and Computer Science, Queen's University Belfast, Belfast, UK BT7 1NN (all three authors)
Venue:
BNCOD 26 Proceedings of the 26th British National Conference on Databases: Dataspace: The Final Frontier
Year:
2009

Abstract

Schema matching is a crucial step in data integration. Many approaches to schema matching have been proposed. These approaches use different types of information about schemas, including structures, linguistic features and data types, to measure different types of similarity between the attributes of two schemas. They then combine these similarities and use the combined similarity to select a collection of attribute correspondences for every source attribute. Thresholds are usually used to filter out likely incorrect attribute correspondences; they have to be set manually and are matcher- and domain-dependent. A selection strategy is also used to resolve any conflicts between the attribute correspondences of different source attributes. In this paper, we propose a new prioritized collective selection strategy that has two distinct characteristics. First, it groups a set of attribute correspondences into a number of clusters and collectively selects attribute correspondences from each of these clusters in a prioritized order. Second, it introduces the use of a null correspondence for each source attribute, which represents the option that the source attribute has no attribute correspondence. By considering this option, our strategy does not need a threshold to filter out likely incorrect attribute correspondences. Our experimental results show that our approach is highly effective.
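
To make the role of the null correspondence concrete, here is a minimal Python sketch of collective selection within a single cluster. It is an illustration under stated assumptions, not the paper's algorithm: the abstract does not say how the null correspondence is scored, how clusters are formed, or how they are prioritized, so the function `select_collectively`, the `null_scores` input, the sum-of-scores objective and the attribute names are all hypothetical. The sketch only shows how giving each source attribute an explicit "no match" option lets conflicts be resolved without a threshold.

```python
from itertools import product

NULL = None  # stands for the null correspondence ("no matching target attribute")


def select_collectively(scores, null_scores):
    """Pick one option per source attribute, maximizing the total score.

    scores:      source attribute -> {target attribute -> combined similarity}
    null_scores: source attribute -> score of its null correspondence
                 (hypothetical input; the abstract does not define it)

    A real target attribute may be claimed by at most one source attribute;
    the null correspondence may be chosen by any number of them.
    """
    sources = sorted(scores)
    options = [list(scores[s]) + [NULL] for s in sources]
    best, best_total = None, float("-inf")
    # Exhaustive search: only sensible for the small clusters of a toy example.
    for choice in product(*options):
        real = [t for t in choice if t is not NULL]
        if len(real) != len(set(real)):
            continue  # two source attributes claim the same target: conflict
        total = sum(
            null_scores[s] if t is NULL else scores[s][t]
            for s, t in zip(sources, choice)
        )
        if total > best_total:
            best, best_total = dict(zip(sources, choice)), total
    return best, best_total


# Made-up similarity values for three source and two target attributes.
scores = {
    "author": {"writer": 0.9, "title": 0.2},
    "isbn": {"writer": 0.3, "title": 0.25},
    "name": {"writer": 0.6, "title": 0.7},
}
null_scores = {"author": 0.1, "isbn": 0.6, "name": 0.2}

best, total = select_collectively(scores, null_scores)
print(best)  # {'author': 'writer', 'isbn': None, 'name': 'title'}
```

In this toy input, `isbn` ends up with the null correspondence because its "no match" score beats any conflict-free alternative, so no cut-off value is needed to discard its weak candidates.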