Clustering-based schema matching of web data for constructing digital library

  • Authors:
  • Hui Song;Fanyuan Ma;Chen Wang

  • Affiliations:
  • Department of Computer Information Technology, Donghua University, Shanghai, China;Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China;Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China

  • Venue:
  • ICCSA'05 Proceedings of the 2005 international conference on Computational Science and Its Applications - Volume Part II
  • Year:
  • 2005

Quantified Score

Hi-index 0.00

Visualization

Abstract

The abundant information on the web attracts many researches on reusing the valuable web data in other information applications, for example, digital libraries. Web information published by various contributors in different ways, schema matching is a basic problem for the heterogeneous data sources integration. Web information integration arises new challenges from the following ways: web data are short of intact schema definition; and the schema matching between web data can not be simplified as 1-1 mapping problem. In this paper we propose an algorithm, COSM, to automatic the web data schema matching process. The matching process is transformed into a clustering problem: the data elements clustered into one cluster are viewed as mapping ones. COSM is mainly instance-level matching approach, also combined with a partial name matcher in calculating the elements distance metrics. A pretreatment for data is carried out to give rational distance metrics between elements before clustering step. The experiment of algorithm testing and application (applied in the Chinese folk music digital library construction) proves the algorithm’s efficiency.