Clustering-based schema matching of web data for constructing digital library

Authors:
Hui Song;Fanyuan Ma;Chen Wang
Affiliations:
Department of Computer Information Technology, Donghua University, Shanghai, China;Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China;Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China
Venue:
ICCSA'05 Proceedings of the 2005 international conference on Computational Science and Its Applications - Volume Part II
Year:
2005

Abstract

The abundance of information on the web has motivated much research on reusing valuable web data in other information applications, such as digital libraries. Because web information is published by various contributors in different ways, schema matching is a basic problem in integrating heterogeneous data sources. Web information integration poses new challenges: web data lack a complete schema definition, and schema matching between web data cannot be reduced to a 1-1 mapping problem. In this paper we propose an algorithm, COSM, to automate the schema matching process for web data. The matching process is transformed into a clustering problem: data elements grouped into the same cluster are regarded as matching elements. COSM is mainly an instance-level matching approach, combined with a partial name matcher when calculating the distance metrics between elements. Before the clustering step, the data are preprocessed so that the distance metrics between elements are reasonable. Experiments testing the algorithm, together with its application in the construction of a Chinese folk music digital library, demonstrate the algorithm's efficiency.
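
The general idea of treating schema matching as clustering can be illustrated with a small sketch. This is not the COSM algorithm itself, whose distance metrics and pretreatment step are not specified in the abstract. Everything below is an assumption made for illustration: the element records, the crude instance features in `instance_profile`, the `name_weight` and `threshold` values, and the choice of single-link agglomerative clustering.

```python
from difflib import SequenceMatcher
from itertools import combinations


def instance_profile(values):
    """Crude instance-level features (assumed): mean length, digit ratio, letter ratio."""
    text = "".join(values)
    total = max(len(text), 1)
    mean_len = sum(len(v) for v in values) / max(len(values), 1)
    digits = sum(ch.isdigit() for ch in text) / total
    alphas = sum(ch.isalpha() for ch in text) / total
    return (min(mean_len / 50.0, 1.0), digits, alphas)


def distance(a, b, name_weight=0.3):
    """Weighted mix of instance distance and name distance, both in [0, 1]."""
    pa, pb = instance_profile(a["values"]), instance_profile(b["values"])
    inst = sum(abs(x - y) for x, y in zip(pa, pb)) / len(pa)
    ratio = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
    return name_weight * (1.0 - ratio) + (1.0 - name_weight) * inst


def cluster(elements, threshold=0.35):
    """Single-link agglomerative clustering; stops when the closest pair is too far apart."""
    clusters = [[i] for i in range(len(elements))]

    def link(c1, c2):
        return min(distance(elements[i], elements[j]) for i in c1 for j in c2)

    while len(clusters) > 1:
        # Pick the pair of clusters with the smallest single-link distance (x < y).
        x, y = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: link(clusters[p[0]], clusters[p[1]]),
        )
        if link(clusters[x], clusters[y]) >= threshold:
            break
        merged = clusters.pop(y)
        clusters[x].extend(merged)
    return [sorted(elements[i]["id"] for i in c) for c in clusters]


# Hypothetical elements extracted from two web sources, A and B.
elements = [
    {"id": "A.title", "name": "title",
     "values": ["Jasmine Flower", "Blue Flower", "Little Cabbage"]},
    {"id": "A.year", "name": "year", "values": ["1957", "1962", "1980"]},
    {"id": "B.song_title", "name": "song_title",
     "values": ["Purple Bamboo Melody", "Jasmine Flower"]},
    {"id": "B.pub_year", "name": "pub_year", "values": ["1999", "2001"]},
]

matches = cluster(elements)
for group in matches:
    print(group)
```

Each resulting cluster is read as a set of mutually matching elements. Because a cluster may hold more than two elements, or several elements from the same source, this formulation is not restricted to 1-1 mappings.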