A strategy for allowing meaningful and comparable scores in approximate matching

  • Authors:
  • Carina F. Dorneles;Carlos A. Heuser;Viviane Moreira Orengo;Altigran S. da Silva;Edleno S. de Moura

  • Affiliations:
  • UFRGS, Porto Alegre, Brazil;UFRGS, Porto Alegre, Brazil;UFRGS, Porto Alegre, Brazil;UFAM, Manaus, Brazil;UFAM, Manaus, Brazil

  • Venue:
  • Proceedings of the sixteenth ACM conference on Conference on information and knowledge management
  • Year:
  • 2007

Quantified Score

Hi-index 0.00

Visualization

Abstract

The goal of approximate data matching is to assess whether two distinct data instances represent the same real world object. This is usually achieved through the use of a similarity function, which returns a score that defines how similar two data instances are. If this score surpasses a given threshold, both data instances are considered as representing the same real world object. The score values returned by a similarity function depend on the algorithm that implements the function and have no meaning to the user (apart from the fact that a higher similarity value means that two data instances are more similar). In this paper, we propose that instead of defining the threshold in terms of the scores returned by a similarity function, the user specifies the precision that is expected from the matching process. Precision is a well known quality measure and has a clear interpretation from the user's point of view. Our approach relies on mapping between similarity scores and precision values based on a training data set. Experimental results show the training may be executed against a representative data set, and reused for other databases from the same domain.