Publishing structured data and linking it to Linking Open Data (LOD) is an ongoing effort to create a Web of Data. Each newly added data source may contain duplicated instances (entities) whose descriptions or schemata differ from those of the sources already in LOD. To tackle this heterogeneity, several matching methods have been developed to link equivalent entities. Many general-purpose matching methods, which focus on similarity metrics, produce results whose quality varies widely from one pair of data sources to another. Dataset-specific methods, on the other hand, rely on heuristic rules or even manual effort to ensure quality, which prevents them from being applied to other sources or domains. In this paper, we offer a third choice: a general method for automatically discovering dataset-specific matching rules. In particular, we propose a semi-supervised learning algorithm that iteratively refines matching rules and uses them to find new high-confidence matches. This greatly relieves users of the burden of defining rules while still giving high-quality matching results. We carry out experiments on large-scale real-world data sources in LOD; the results show the effectiveness of our approach in terms of the precision of the discovered matches and the number of missing matches found. Furthermore, we discuss several extensions (such as similarity-embedded rules, class restriction, and SPARQL rewriting) to fit applications with different requirements.
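The iterative loop described above can be illustrated with a minimal sketch. It is not the paper's algorithm: the abstract does not specify the rule form, the confidence measure, or the stopping criterion, so everything below is an assumption made for illustration. Here a rule is a hypothetical pair of properties whose values must be exactly equal, its confidence is estimated from the known matches only, and matches are assumed to be one-to-one. All names (`rule_fires`, `rule_confidence`, `discover_matches`, `min_conf`) and the toy data are invented.

```python
def rule_fires(rule, s_props, t_props):
    """A rule (p, q) fires when source property p equals target property q."""
    p, q = rule
    return p in s_props and q in t_props and s_props[p] == t_props[q]


def rule_confidence(rule, source, target, matches):
    """Estimate precision using only source entities with a known match:
    of all target entities the rule pairs them with, how many are correct?"""
    known = dict(matches)  # assumption: matches are one-to-one
    fired = correct = 0
    for s, t_true in known.items():
        for t, t_props in target.items():
            if rule_fires(rule, source[s], t_props):
                fired += 1
                correct += (t == t_true)
    return correct / fired if fired else 0.0


def discover_matches(source, target, seeds, min_conf=0.9, max_iter=10):
    """Alternate between refining rules and adding unambiguous new matches."""
    matches = set(seeds)
    rules = set()
    for _ in range(max_iter):
        # Candidate rules: every property pair seen on a known match.
        candidates = {(p, q) for s, t in matches
                      for p in source[s] for q in target[t]}
        rules = {r for r in candidates
                 if rule_confidence(r, source, target, matches) >= min_conf}
        matched_s = {s for s, _ in matches}
        matched_t = {t for _, t in matches}
        fired = [(s, t) for s in source if s not in matched_s
                 for t in target if t not in matched_t
                 if any(rule_fires(r, source[s], target[t]) for r in rules)]
        # Keep only pairs that are unambiguous on both sides.
        s_count, t_count = {}, {}
        for s, t in fired:
            s_count[s] = s_count.get(s, 0) + 1
            t_count[t] = t_count.get(t, 0) + 1
        new = {(s, t) for s, t in fired
               if s_count[s] == 1 and t_count[t] == 1}
        if not new:
            break
        matches |= new
    return rules, matches


# Toy data (invented): two small sources with different schemata.
source = {
    "a1": {"name": "Berlin", "code": "DE-BE", "type": "City"},
    "a2": {"name": "Paris", "code": "FR-75", "type": "City"},
    "a3": {"name": "Rome", "code": "IT-RM", "type": "City"},
}
target = {
    "b1": {"label": "Berlin", "iso": "DE-BE", "kind": "City"},
    "b2": {"label": "Paris", "iso": "FR-75", "kind": "City"},
    "b3": {"label": "Rome", "iso": "IT-RM", "kind": "City"},
}
seeds = {("a1", "b1"), ("a2", "b2")}

rules, matches = discover_matches(source, target, seeds)
```

On this toy data the pair `("type", "kind")` agrees on every seed match but also fires on every non-match, so its estimated confidence falls below the threshold and it is discarded, while the name and code rules survive and link the remaining entity. Exact equality is a deliberate simplification; the similarity-embedded rules mentioned above would presumably relax it.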