Deduplication is the task of finding instances in a given table that refer to the same entity. Several techniques based on pairwise comparison have been proposed; they typically partition record pairs into three sets: (i) pairs that definitely match, (ii) pairs that definitely do not match, and (iii) pairs that possibly match. In this paper we present a general approach to domain-independent duplicate detection that exploits the knowledge stored in the schema containing the analyzed table. Depending on the kind of relationship involved, we propose strategies to build knowledge networks and compare them using graph-based similarity. The final similarity decision across the different relationship categories is made by two probabilistic logic models.
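The three-way outcome of pairwise comparison described above can be illustrated with a minimal sketch. It is not the approach proposed in the paper: the token-set Jaccard similarity, the function names `jaccard` and `classify_pairs`, and the thresholds `upper` and `lower` are all assumptions chosen for illustration.

```python
from itertools import combinations


def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity (an illustrative choice of measure)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def classify_pairs(records, upper=0.8, lower=0.3):
    """Partition all record pairs into match / non-match / possible match.

    `upper` and `lower` are hypothetical thresholds, not values from the paper.
    """
    match, non_match, possible = [], [], []
    for (i, r), (j, s) in combinations(enumerate(records), 2):
        score = jaccard(r, s)
        if score >= upper:
            match.append((i, j))      # (i) pairs that definitely match
        elif score <= lower:
            non_match.append((i, j))  # (ii) pairs that definitely do not match
        else:
            possible.append((i, j))   # (iii) pairs that possibly match
    return match, non_match, possible


records = [
    "John Smith 12 Main Street",
    "john smith 12 main street",
    "John Smith 12 Main St",
    "Alice Brown 7 Oak Avenue",
]
match, non_match, possible = classify_pairs(records)
```

In this toy table, pairs that land in the `possible` set are the ones for which attribute values alone are inconclusive; those are the cases where additional evidence, such as the schema relationships discussed in the abstract, would be needed to reach a decision.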