Approximate joins: concepts and techniques

Authors:
Nick Koudas;Divesh Srivastava
Affiliations:
University of Toronto;AT&T Labs-Research
Venue:
VLDB '05 Proceedings of the 31st international conference on Very large data bases
Year:
2005

Citing 1

Data quality and data cleaning: an overview
Proceedings of the 2003 ACM SIGMOD international conference on Management of data

Cited 8

Using SPIDER: an experience report
Proceedings of the 2006 ACM SIGMOD international conference on Management of data

Record linkage: similarity measures and algorithms
Proceedings of the 2006 ACM SIGMOD international conference on Management of data

Randomized algorithms for data reconciliation in wide area aggregate query processing
VLDB '07 Proceedings of the 33rd international conference on Very large data bases

Brokering infrastructure for minimum cost data procurement based on quality-quantity models
Decision Support Systems

Time-completeness trade-offs in record linkage using adaptive query processing
Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology

idMesh: graph-based disambiguation of linked data
Proceedings of the 18th international conference on World wide web

Fast locality-sensitive hashing
Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining

Topological operators: a relaxed query processing approach
Geoinformatica

Abstract

The quality of data residing in information repositories and databases degrades for many reasons, including typing mistakes during insertion (e.g., character transpositions), a lack of standards for recording database fields (e.g., addresses), and errors introduced by poor database design (e.g., missing integrity constraints). Poor-quality data can seriously impede common business practices, for example by sending products or bills to incorrect addresses, failing to locate customer records during service calls, or failing to correlate customers across multiple services.