Term-weighting approaches in automatic text retrieval
Information Processing and Management: an International Journal
Mining association rules between sets of items in large databases
SIGMOD '93 Proceedings of the 1993 ACM SIGMOD international conference on Management of data
The nature of statistical learning theory
The nature of statistical learning theory
Efficient clustering of high-dimensional data sets with application to reference matching
Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining
Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data
ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning
Learning domain-independent string transformation weights for high accuracy object identification
Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
Learning to match and cluster large high-dimensional data sets for data integration
Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
Adaptive duplicate detection using learnable string similarity measures
Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
Joint deduplication of multiple record types in relational data
Proceedings of the 14th ACM international conference on Information and knowledge management
Collective object identification
IJCAI'05 Proceedings of the 19th international joint conference on Artificial intelligence
A Graph Partitioning Approach to Entity Disambiguation Using Uncertain Information
GoTAL '08 Proceedings of the 6th international conference on Advances in Natural Language Processing
Collaborative management of web ontology data with flexible access control
Expert Systems with Applications: An International Journal
Creating and managing ontology data on the web: a semantic wiki approach
WISE'07 Proceedings of the 8th international conference on Web information systems engineering
Hi-index | 0.00 |
This paper proposes an approach to detect duplicates among relational data. Traditional methods for record linkage or duplicate detection work on a set of records which have no explicit relations with each other. These records can be formatted into a single database table for processing. However, there are situations that records from different sources can not be flattened into one table and records within one source have certain (semantic) relations between them. The duplicate detection issue of these relational data records/instances can be dealt with by formatting them into several tables and applying traditional methods to each table. However, as the relations among the original data records are ignored, this approach generates poor or inconsistent results. This paper analyzes the characteristics of relational data and proposes a particular clustering approach to perform duplicate detection. This approach incorporates constraint rules derived from the characteristics of relational data and therefore yields better and more consistent results, which are revealed by our experiments.