A Prioritized Collective Selection Strategy for Schema Matching across Query Interfaces

  • Authors:
  • Zhongtian He; Jun Hong; David A. Bell

  • Affiliations:
  • School of Electronics, Electrical Engineering and Computer Science, Queen's University Belfast, Belfast, UK BT7 1NN; School of Electronics, Electrical Engineering and Computer Science, Queen's University Belfast, Belfast, UK BT7 1NN; School of Electronics, Electrical Engineering and Computer Science, Queen's University Belfast, Belfast, UK BT7 1NN

  • Venue:
  • BNCOD 26: Proceedings of the 26th British National Conference on Databases: Dataspace: The Final Frontier
  • Year:
  • 2009

Abstract

Schema matching is a crucial step in data integration. Many approaches to schema matching have been proposed. These approaches use different types of information about schemas, including structures, linguistic features and data types, to measure different types of similarity between the attributes of two schemas. They then combine these similarities and use the combined similarity to select a collection of attribute correspondences for every source attribute. Thresholds are usually used to filter out likely incorrect attribute correspondences; these thresholds have to be set manually and are matcher- and domain-dependent. A selection strategy is also used to resolve any conflicts between attribute correspondences of different source attributes. In this paper, we propose a new prioritized collective selection strategy that has two distinct characteristics. First, it groups a set of attribute correspondences into a number of clusters and collectively selects attribute correspondences from each of these clusters in a prioritized order. Second, it introduces the use of a null correspondence for each source attribute, which represents the option that the source attribute has no attribute correspondence. By considering this option, our strategy does not need a threshold to filter out likely incorrect attribute correspondences. Our experimental results show that our approach is highly effective.
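To make the two characteristics concrete, here is a minimal Python sketch of a prioritized collective selection with null correspondences. It is an illustration under stated assumptions, not the authors' algorithm: the abstract does not say how clusters are formed, how priority is defined, or how a null correspondence is scored. The sketch assumes (1) clusters are formed by sorting candidate scores in descending order and starting a new cluster when consecutive scores differ by more than a hypothetical `gap` parameter, (2) clusters are processed from highest to lowest score, (3) "collective" selection means choosing the conflict-free subset of a cluster with the largest total score, found by exhaustive search (exponential in cluster size, so only suitable for small clusters), and (4) each source attribute's null correspondence carries a caller-supplied score in the hypothetical `null_scores` map. The names `select`, `cluster_by_gap`, `conflict_free`, `gap` and `null_scores` are all illustrative.

```python
from itertools import combinations
from typing import Dict, List, Optional, Set, Tuple

# A candidate pairs a source attribute with a target attribute,
# or with None for the (hypothetically scored) null correspondence.
Cand = Tuple[str, Optional[str]]
Scored = Tuple[Cand, float]


def cluster_by_gap(scored: List[Scored], gap: float) -> List[List[Scored]]:
    """Assumed clustering: sort by score (descending) and open a new cluster
    whenever the drop from the previous score exceeds `gap`.
    Clusters are returned in priority order (highest scores first)."""
    ordered = sorted(scored, key=lambda cs: cs[1], reverse=True)
    clusters: List[List[Scored]] = []
    for cand, score in ordered:
        if clusters and clusters[-1][-1][1] - score <= gap:
            clusters[-1].append((cand, score))
        else:
            clusters.append([(cand, score)])
    return clusters


def conflict_free(chosen) -> bool:
    """No source appears twice and no real target is used twice."""
    sources = [c[0] for c, _ in chosen]
    targets = [c[1] for c, _ in chosen if c[1] is not None]
    return len(set(sources)) == len(sources) and len(set(targets)) == len(targets)


def select(sims: Dict[Tuple[str, str], float],
           null_scores: Dict[str, float],
           gap: float = 0.1) -> Dict[str, Optional[str]]:
    # Null correspondences compete with real ones instead of a threshold.
    scored: List[Scored] = [((s, t), v) for (s, t), v in sims.items()]
    scored += [((s, None), v) for s, v in null_scores.items()]
    result: Dict[str, Optional[str]] = {}
    used_targets: Set[str] = set()
    for cluster in cluster_by_gap(scored, gap):
        # Drop candidates already ruled out by higher-priority clusters.
        live = [(c, v) for c, v in cluster
                if c[0] not in result and c[1] not in used_targets]
        # Collective selection: best conflict-free subset by total score.
        best: List[Scored] = []
        best_total = 0.0
        for k in range(1, len(live) + 1):
            for subset in combinations(live, k):
                total = sum(v for _, v in subset)
                if total > best_total and conflict_free(subset):
                    best, best_total = list(subset), total
        for (s, t), _ in best:
            result[s] = t
            if t is not None:
                used_targets.add(t)
    return result


sims = {("title", "book_title"): 0.85, ("author", "book_title"): 0.9,
        ("author", "writer"): 0.8, ("isbn", "price"): 0.3}
null_scores = {"title": 0.2, "author": 0.2, "isbn": 0.4}
print(select(sims, null_scores, gap=0.15))
```

With `gap=0.15`, the three high-scoring candidates form the first cluster. A greedy strategy would take author→book_title (0.9) first and leave title with no target, whereas selecting collectively within the cluster picks title→book_title and author→writer (total 1.65). In the next cluster, isbn's null correspondence (0.4) outscores its only candidate isbn→price (0.3), so isbn is left unmatched without any filtering threshold. Note that `gap` only groups candidates rather than discarding them, but it is still a tuning parameter of this sketch.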