Multiple relationship based deduplication

Authors:
Pei Li
Affiliations:
University of Milan, Milan, Italy
Venue:
Proceedings of the Fourth SIGMOD PhD Workshop on Innovative Database Research
Year:
2010

Abstract

Deduplication is the task of finding instances that refer to the same entity in a given table. Several techniques based on pairwise comparison have been proposed; they typically partition record pairs into three sets: i) pairs that definitely match, ii) pairs that definitely do not match, and iii) pairs that possibly match. In this paper we present a general approach to domain-independent deduplication problems that exploits the knowledge stored in the schema containing the analyzed table. For each kind of relationship, we propose strategies to build knowledge networks and compare them using graph-based similarity. The final similarity decision across the different relationship categories is made by exploiting two probabilistic logic models.
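
The three-way outcome of pairwise comparison mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the method proposed in the paper: the string similarity (`difflib.SequenceMatcher`), the function names `similarity` and `classify_pairs`, and the thresholds `upper` and `lower` are all assumptions chosen for illustration only.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1] (illustrative choice)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def classify_pairs(records, upper=0.9, lower=0.5):
    """Split all record pairs into match / non-match / possible-match sets.

    `upper` and `lower` are hypothetical thresholds, not values from the paper.
    Pairs are reported as (index_a, index_b) tuples.
    """
    match, non_match, possible = [], [], []
    for (i, a), (j, b) in combinations(enumerate(records), 2):
        score = similarity(a, b)
        if score >= upper:
            match.append((i, j))        # i) definitely match
        elif score < lower:
            non_match.append((i, j))    # ii) definitely do not match
        else:
            possible.append((i, j))     # iii) possibly match
    return match, non_match, possible
```

Under these assumptions, the pairs left in the third set are the ones for which additional evidence, such as the relationship-based knowledge described in the abstract, would be needed to reach a decision.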