
Cleaning the Spurious Links in Data

Abstract

Data quality problems can arise from abbreviations, data entry mistakes, duplicate records, missing fields, and many other sources. Data-cleaning research has focused on duplicate elimination, or the merge/purge problem. A different kind of erroneous data is the spurious link, where a real-world entity has multiple record links that might not be properly associated with it. One approach to this problem is to use context information to clean up the spurious links. This approach identifies and retrieves the data containing potential spurious links, then performs a context similarity comparison to determine which records have highly overlapping context. The degree of overlapping context indicates the likelihood of spurious links. Experiments on three real-world data sets demonstrate that this approach can correctly identify spurious links and thus assist data cleaning.
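
The context similarity comparison described in the abstract can be illustrated with a minimal sketch. The abstract does not say how context is represented or which similarity measure is used, so everything here is an assumption: context is modeled as a set of strings (for example, co-author names), overlap is measured with the Jaccard coefficient, and the names `context_overlap`, `rank_by_context_overlap`, and the sample records are hypothetical. The sketch only ranks record pairs by overlap; how a given score maps to the likelihood of a spurious link is left to the method itself.

```python
from itertools import combinations


def context_overlap(a: set[str], b: set[str]) -> float:
    """Jaccard overlap between two context sets (0.0 when both are empty).

    Assumption: Jaccard stands in for whatever measure the paper uses.
    """
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_by_context_overlap(
    records: dict[str, set[str]],
) -> list[tuple[str, str, float]]:
    """Score every pair of candidate records, highest overlap first."""
    scored = [
        (r1, r2, context_overlap(records[r1], records[r2]))
        for r1, r2 in combinations(sorted(records), 2)
    ]
    return sorted(scored, key=lambda t: t[2], reverse=True)


# Hypothetical candidate records linked to one entity, each with a
# made-up context set; not data from the paper.
candidates = {
    "rec1": {"alice", "bob", "carol"},
    "rec2": {"alice", "bob", "dave"},
    "rec3": {"erin", "frank"},
}

if __name__ == "__main__":
    for r1, r2, score in rank_by_context_overlap(candidates):
        print(f"{r1} ~ {r2}: {score:.2f}")
```

A set-based measure keeps the sketch independent of any particular schema; a real data set would first need a step that retrieves the candidate records and extracts their context.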