A strategy for allowing meaningful and comparable scores in approximate matching

Authors:
Carina F. Dorneles;Carlos A. Heuser;Viviane Moreira Orengo;Altigran S. da Silva;Edleno S. de Moura
Affiliations:
UFRGS, Porto Alegre, Brazil;UFRGS, Porto Alegre, Brazil;UFRGS, Porto Alegre, Brazil;UFAM, Manaus, Brazil;UFAM, Manaus, Brazil
Venue:
Proceedings of the Sixteenth ACM Conference on Information and Knowledge Management (CIKM '07)
Year:
2007

Abstract

The goal of approximate data matching is to assess whether two distinct data instances represent the same real-world object. This is usually achieved with a similarity function, which returns a score indicating how similar two data instances are. If this score exceeds a given threshold, the two instances are considered to represent the same real-world object. The scores returned by a similarity function depend on the algorithm that implements it and carry no intrinsic meaning for the user, apart from the fact that a higher score means that two data instances are more similar. In this paper, we propose that the user specify the precision expected from the matching process instead of defining the threshold in terms of similarity scores. Precision is a well-known quality measure with a clear interpretation from the user's point of view. Our approach relies on a mapping between similarity scores and precision values, built from a training data set. Experimental results show that this training can be performed on a representative data set and then reused for other databases from the same domain.
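The abstract does not say how the score-to-precision mapping is built. The following is a minimal sketch under the simplest reading, and it is an assumption rather than the authors' method. It takes labeled training pairs of the form (similarity score, true match or not). For each candidate threshold t, it measures the precision of the rule "match iff score >= t". It then picks the lowest threshold whose training precision reaches the user's target. The function names `precision_curve` and `threshold_for_precision` and the toy training data are hypothetical. The similarity function itself is abstracted away, so only its scores appear.

```python
from typing import List, Optional, Tuple

# Hypothetical training pair: (similarity score, is the pair a true match?)
TrainingPair = Tuple[float, bool]


def precision_curve(training: List[TrainingPair]) -> List[Tuple[float, float]]:
    """Return (threshold, precision) for every distinct score in `training`,
    where precision is that of the rule "match iff score >= threshold"."""
    ordered = sorted(training, key=lambda pair: pair[0], reverse=True)
    curve: List[Tuple[float, float]] = []
    true_positives = 0
    for i, (score, is_match) in enumerate(ordered):
        if is_match:
            true_positives += 1
        # Emit a point only after the last pair with this score, so that
        # tied scores are all counted as accepted at that threshold.
        if i + 1 == len(ordered) or ordered[i + 1][0] != score:
            curve.append((score, true_positives / (i + 1)))
    return curve


def threshold_for_precision(
    curve: List[Tuple[float, float]], target: float
) -> Optional[float]:
    """Return the lowest (most permissive) threshold whose training precision
    is >= target, or None if no threshold reaches the target."""
    eligible = [t for t, p in curve if p >= target]
    return min(eligible) if eligible else None


if __name__ == "__main__":
    # Toy, made-up training data: scores from some similarity function.
    training = [
        (0.95, True), (0.90, True), (0.85, False),
        (0.80, True), (0.70, False), (0.60, False),
    ]
    curve = precision_curve(training)
    threshold = threshold_for_precision(curve, target=0.75)
    print(threshold)  # 0.8
    # A new pair scoring 0.82 is accepted as a match at that threshold.
    print(0.82 >= threshold)  # True
```

Choosing the lowest qualifying threshold favours recall while still meeting the requested precision on the training data. Precision need not fall monotonically as the threshold drops, so a stricter reading could instead require every lower threshold to stay at or above the target. Reusing the resulting threshold on another database from the same domain mirrors the abstract's claim, but the sketch says nothing about how well such a transfer holds.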