An effective rule miner for instance matching in a web of data

Authors:
Xing Niu;Shu Rong;Haofen Wang;Yong Yu
Affiliations:
Shanghai Jiao Tong University, Shanghai, China;Shanghai Jiao Tong University, Shanghai, China;Shanghai Jiao Tong University, Shanghai, China;Shanghai Jiao Tong University, Shanghai, China
Venue:
Proceedings of the 21st ACM international conference on Information and knowledge management
Year:
2012

Citing 21
Cited 0

Mining frequent patterns without candidate generation

SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
Fast Algorithms for Mining Association Rules in Large Databases

VLDB '94 Proceedings of the 20th International Conference on Very Large Data Bases
Duplicate Record Detection: A Survey

IEEE Transactions on Knowledge and Data Engineering
Ontology Matching

Ontology Matching
MapReduce: simplified data processing on large clusters

Communications of the ACM - 50th anniversary issue: 1958 - 2008
RiMOM: A Dynamic Multistrategy Ontology Alignment Framework

IEEE Transactions on Knowledge and Data Engineering
DBpedia - A crystallization point for the Web of Data

Web Semantics: Science, Services and Agents on the World Wide Web
Ontology matching with semantic verification

Web Semantics: Science, Services and Agents on the World Wide Web
A framework for semantic link discovery over relational data

Proceedings of the 18th ACM conference on Information and knowledge management
Discovering and Maintaining Links on the Web of Data

ISWC '09 Proceedings of the 8th International Semantic Web Conference
LinkedGeoData: Adding a Spatial Dimension to the Web of Data

ISWC '09 Proceedings of the 8th International Semantic Web Conference
Parallel FP-growth on PC cluster

PAKDD'03 Proceedings of the 7th Pacific-Asia conference on Advances in knowledge discovery and data mining
An empirical study of instance-based ontology matching

ISWC'07/ASWC'07 Proceedings of the 6th international The semantic web and 2nd Asian conference on Asian semantic web conference
Asymmetric and context-dependent semantic similarity among ontology instances

Journal on data semantics X
An Introduction to Duplicate Detection

An Introduction to Duplicate Detection
SameAs networks and beyond: analyzing deployment status and implications of owl:sameAs in linked data

ISWC'10 Proceedings of the 9th international semantic web conference on The semantic web - Volume Part I
Linking and building ontologies of linked data

ISWC'10 Proceedings of the 9th international semantic web conference on The semantic web - Volume Part I
A self-training approach for resolving object coreference on the semantic web

Proceedings of the 20th international conference on World wide web
evaluating the stability and credibility of ontology matching methods

ESWC'11 Proceedings of the 8th extended semantic web conference on The semantic web: research and applications - Volume Part I
Leveraging terminological structure for object reconciliation

ESWC'10 Proceedings of the 7th international conference on The Semantic Web: research and Applications - Volume Part II
LIMES: a time-efficient approach for large-scale link discovery on the web of data

IJCAI'11 Proceedings of the Twenty-Second international joint conference on Artificial Intelligence - Volume Volume Three

Quantified Score

Hi-index	0.00

Visualization

Abstract

Publishing structured data and linking them to Linking Open Data (LOD) is an ongoing effort to create a Web of data. Each newly involved data source may contain duplicated instances (entities) whose descriptions or schemata differ from those of the existing sources in LOD. To tackle this heterogeneity issue, several matching methods have been developed to link equivalent entities together. Many general-purpose matching methods which focus on similarity metrics suffer from very diverse matching results for different data source pairs. On the other hand, the dataset-specific ones leverage heuristic rules or even manual efforts to ensure the quality, which makes it impossible to apply them to other sources or domains. In this paper, we offer a third choice, a general method of automatically discovering dataset-specific matching rules. In particular, we propose a semi-supervised learning algorithm to iteratively refine matching rules and find new matches of high confidence based on these rules. This dramatically relieves the burden on users of defining rules but still gives high-quality matching results. We carry out experiments on real-world large scale data sources in LOD; the results show the effectiveness of our approach in terms of the precision of discovered matches and the number of missing matches found. Furthermore, we discuss several extensions (like similarity embedded rules, class restriction and SPARQL rewriting) to fit various applications with different requirements.