Evaluating indeterministic duplicate detection results

Authors:
Fabian Panse; Norbert Ritter
Affiliations:
University of Hamburg, Hamburg, Germany; University of Hamburg, Hamburg, Germany
Venue:
SUM'12 Proceedings of the 6th international conference on Scalable Uncertainty Management
Year:
2012

Abstract

Duplicate detection is an important process for cleaning or integrating data. Because real-life data is often polluted, detecting duplicates usually involves uncertainty. To handle this uncertainty appropriately, indeterministic duplicate detection approaches have been developed, i.e. approaches in which ambiguous duplicate decisions are modeled probabilistically in the resulting data. To rate a duplicate detection approach, the quality of its detection results needs to be evaluated. In this paper, we propose several semantics for applying traditional quality evaluation measures to indeterministic duplicate detection results and, as an example, present an efficient evaluation for one of these semantics. Finally, we present some experimental results.