Estimating recall and precision for vague queries in databases

  • Authors:
  • Raquel Kolitski Stasiu;Carlos A. Heuser;Roberto da Silva

  • Affiliations:
  • Instituto de Informática, Universidade Federal do Rio Grande do Sul, Porto Alegre, RS, Brazil;Instituto de Informática, Universidade Federal do Rio Grande do Sul, Porto Alegre, RS, Brazil;Instituto de Informática, Universidade Federal do Rio Grande do Sul, Porto Alegre, RS, Brazil

  • Venue:
  • CAiSE'05 Proceedings of the 17th international conference on Advanced Information Systems Engineering
  • Year:
  • 2005

Abstract

In a vague query, a user enters a value that represents some real-world object and expects as the result the set of database values that represent the same object, even when they do not match the query value exactly. The problem arises in databases that collect data from different sources and in databases where different users enter data directly. Query engines usually rely on some type of similarity metric to support inexact matching. The problem of building query engines that execute vague queries has already been studied, but an important problem remains open: defining the threshold to be used when a similarity scan is performed over a database column. It is known from the literature that the threshold depends on the similarity metric and also on the set of values being queried. Thus, it is unrealistic to expect the user to supply a threshold at query time. In this paper we propose a process for estimating recall/precision values at several thresholds for a database column. The idea is that a database administrator starts this process in a pre-processing phase, using samples extracted from the database. The metadata collected by this process may then be used during the optimization phase of query processing. The paper describes this process as well as the experiments performed to evaluate it.
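
The general idea of estimating recall and precision at several thresholds from a sample can be illustrated with a minimal sketch. This is not the authors' procedure, which the abstract does not detail. It assumes a sample in which each value is already labeled with the real-world object it represents, it uses Python's `difflib.SequenceMatcher` as a stand-in similarity metric, and it treats precision as 1.0 when a threshold retrieves nothing. The names `similarity` and `estimate_recall_precision` are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Stand-in similarity metric in [0, 1]; an assumption, not the paper's metric."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def estimate_recall_precision(sample, thresholds):
    """Estimate average (recall, precision) per threshold.

    sample: list of (value, object_id) pairs, where object_id is an assumed
            ground-truth label naming the real-world object the value represents.
    Each sample value is used in turn as a query against the remaining values.
    """
    estimates = {}
    for t in thresholds:
        recalls, precisions = [], []
        for i, (query, query_obj) in enumerate(sample):
            others = [pair for j, pair in enumerate(sample) if j != i]
            relevant = sum(1 for _, obj in others if obj == query_obj)
            if relevant == 0:
                continue  # recall is undefined when the query has no other match
            retrieved = [(v, obj) for v, obj in others if similarity(query, v) >= t]
            hits = sum(1 for _, obj in retrieved if obj == query_obj)
            recalls.append(hits / relevant)
            # Convention (assumption): an empty result set counts as precision 1.0.
            precisions.append(hits / len(retrieved) if retrieved else 1.0)
        n = len(recalls)
        estimates[t] = (sum(recalls) / n, sum(precisions) / n) if n else (None, None)
    return estimates


if __name__ == "__main__":
    # Illustrative sample only; the values and labels are made up.
    sample = [
        ("Porto Alegre", 1), ("P. Alegre", 1), ("Porto Alegre RS", 1),
        ("Sao Paulo", 2), ("S. Paulo", 2), ("São Paulo", 2),
    ]
    for t, (recall, precision) in estimate_recall_precision(
        sample, [0.0, 0.5, 0.7, 0.9, 1.0]
    ).items():
        print(f"threshold={t:.1f} recall={recall:.2f} precision={precision:.2f}")
```

A table of this kind, stored per column as metadata, would let a query processor pick a threshold that meets a target recall or precision instead of asking the user for one.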