Learning approach for domain-independent linked data instance matching

Authors:
Khai Nguyen;Ryutaro Ichise;Hoai-Bac Le
Affiliations:
University of Science, Ho Chi Minh, Vietnam;National Institute of Informatics, Tokyo, Japan;University of Science, Ho Chi Minh, Vietnam
Venue:
Proceedings of the ACM SIGKDD Workshop on Mining Data Semantics
Year:
2012

Abstract

Because almost all linked data sources are published by different providers, interlinking homogeneous instances across these sources is an important problem in data integration. Recently, instance matching has been used to identify owl:sameAs links between linked datasets. Previous approaches rely primarily on predefined maps of corresponding attributes, and most are limited to matching within specific domains. In this paper, we propose LFM, a learning-based instance matching system designed to be a reliable domain-independent matcher. First, we compute similarity vectors between labeled pairs of instances without specifying the meaning of the RDF predicates. Then a learning process builds a tree classifier that predicts whether new pairs of instances are identical. Experiments demonstrate that our method achieves a 4% improvement in precision and recall over recent top-ranked matchers when a small amount of labeled data is used for learning.
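The two steps in the abstract (similarity vectors over predicate pairs, then a tree classifier) can be illustrated with a minimal sketch. It is not the LFM implementation: the predicate names and instance data are made up for illustration, `difflib.SequenceMatcher` stands in for whatever similarity measures the paper uses, and a depth-1 decision tree (a stump) stands in for the learned tree classifier. What it is meant to show is that no predicate alignment is supplied by hand: every source predicate is compared with every target predicate, and the learner decides which comparison is informative.

```python
from difflib import SequenceMatcher
from itertools import product

# Hypothetical predicates of two datasets; no mapping between them is given.
SRC_PREDS = ["rdfs:label", "dbo:country"]
TGT_PREDS = ["gn:name", "gn:countryCode"]

def similarity_vector(a, b, src_preds=SRC_PREDS, tgt_preds=TGT_PREDS):
    """One string-similarity score per (source predicate, target predicate) pair."""
    vec = []
    for p, q in product(src_preds, tgt_preds):
        va, vb = a.get(p), b.get(q)
        if va is None or vb is None:
            vec.append(0.0)  # a missing value contributes no evidence
        else:
            vec.append(SequenceMatcher(None, va.lower(), vb.lower()).ratio())
    return vec

def learn_stump(vectors, labels):
    """Depth-1 decision tree: the (feature, threshold) with the fewest training errors."""
    best = None
    for f in range(len(vectors[0])):
        for t in sorted({v[f] for v in vectors}):
            errors = sum((v[f] >= t) != y for v, y in zip(vectors, labels))
            if best is None or errors < best[0]:
                best = (errors, f, t)
    _, feature, threshold = best
    return feature, threshold

def predict(stump, vec):
    """Classify a pair as identical if the chosen similarity reaches the threshold."""
    feature, threshold = stump
    return vec[feature] >= threshold

# Illustrative labeled pairs: (source instance, target instance, is_same).
LABELED = [
    ({"rdfs:label": "Tokyo", "dbo:country": "Japan"},
     {"gn:name": "Tokyo", "gn:countryCode": "JP"}, True),
    ({"rdfs:label": "Hanoi", "dbo:country": "Vietnam"},
     {"gn:name": "Ha Noi", "gn:countryCode": "VN"}, True),
    ({"rdfs:label": "Tokyo", "dbo:country": "Japan"},
     {"gn:name": "Kyoto", "gn:countryCode": "JP"}, False),
    ({"rdfs:label": "Osaka", "dbo:country": "Japan"},
     {"gn:name": "Hanoi", "gn:countryCode": "VN"}, False),
]

vectors = [similarity_vector(a, b) for a, b, _ in LABELED]
labels = [y for _, _, y in LABELED]
stump = learn_stump(vectors, labels)

new_pair = ({"rdfs:label": "Osaka", "dbo:country": "Japan"},
            {"gn:name": "Osaka", "gn:countryCode": "JP"})
print(predict(stump, similarity_vector(*new_pair)))
```

On this toy data the stump ends up splitting on the label/name comparison, even though nothing in the code says those two predicates correspond. A real matcher would need a deeper tree, more similarity measures, and a candidate selection step to avoid comparing every pair of instances.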