Estimating recall and precision for vague queries in databases

Authors:
Raquel Kolitski Stasiu;Carlos A. Heuser;Roberto da Silva
Affiliations:
Instituto de Informática, Universidade Federal do Rio Grande do Sul, Porto Alegre, RS, Brazil;Instituto de Informática, Universidade Federal do Rio Grande do Sul, Porto Alegre, RS, Brazil;Instituto de Informática, Universidade Federal do Rio Grande do Sul, Porto Alegre, RS, Brazil
Venue:
CAiSE'05 Proceedings of the 17th international conference on Advanced Information Systems Engineering
Year:
2005

Citing 28
Cited 10

VAGUE: a user interface to relational databases that permits vague queries

ACM Transactions on Information Systems (TOIS)
A probabilistic relational data model

EDBT '90 Proceedings of the 2nd international conference on extending database technology: Advances in Database Technology
A probabilistic relational model for the integration of IR and databases

SIGIR '93 Proceedings of the 16th annual international ACM SIGIR conference on Research and development in information retrieval
A probabilistic relational model and algebra

ACM Transactions on Database Systems (TODS)
A probabilistic relational algebra for the integration of information retrieval and database systems

ACM Transactions on Information Systems (TOIS)
ProbView: a flexible probabilistic database system

ACM Transactions on Database Systems (TODS)
An overview of query optimization in relational systems

PODS '98 Proceedings of the seventeenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems
Integration of heterogeneous databases without common domains using queries based on textual similarity

SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
Algebras for querying text regions: expressive power and optimization

Journal of Computer and System Sciences - Fourteenth ACM SIGACT-SIGMOD-SIGART symposium on principles of database systems
Similarity Measures

IEEE Transactions on Pattern Analysis and Machine Intelligence
Data integration using similarity joins and a word-based information representation language

ACM Transactions on Information Systems (TOIS)
A guided tour to approximate string matching

ACM Computing Surveys (CSUR)
XIRQL: a query language for information retrieval in XML documents

Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
Clustering Algorithms

Clustering Algorithms
Modern Information Retrieval

Modern Information Retrieval
Database Systems: The Complete Book

Database Systems: The Complete Book
Access path selection in a relational database management system

SIGMOD '79 Proceedings of the 1979 ACM SIGMOD international conference on Management of data
Approximate String Joins in a Database (Almost) for Free

Proceedings of the 27th International Conference on Very Large Data Bases
Text joins in an RDBMS for web data integration

WWW '03 Proceedings of the 12th international conference on World Wide Web
Efficient similarity search and classification via rank aggregation

Proceedings of the 2003 ACM SIGMOD international conference on Management of data
Integrating similarity based retrieval and query refinement in databases

Integrating similarity based retrieval and query refinement in databases
Efficient evaluation of relevance feedback for multidimensional all-pairs retrieval

Proceedings of the 2003 ACM symposium on Applied computing
Answering imprecise database queries: a novel approach

WIDM '03 Proceedings of the 5th ACM international workshop on Web information and data management
Evaluating Refined Queries in Top-k Retrieval Systems

IEEE Transactions on Knowledge and Data Engineering
Efficient similarity-based operations for data integration

Data & Knowledge Engineering
A Possible World Approach to Uncertain Relational Data

DEXA '04 Proceedings of the Database and Expert Systems Applications, 15th International Workshop
Measuring similarity between collection of values

Proceedings of the 6th annual ACM international workshop on Web information and data management
Using similarity-based operations for resolving data-level conflicts

BNCOD'03 Proceedings of the 20th British national conference on Databases

XML version detection

Proceedings of the 2007 ACM symposium on Document engineering
A strategy for allowing meaningful and comparable scores in approximate matching

Proceedings of the sixteenth ACM conference on Conference on information and knowledge management
Uma abordagem efetiva e eficiente para deduplicação de metadados bibliográficos de objetos digitais

SBBD '08 Proceedings of the 23rd Brazilian symposium on Databases
Automatic threshold estimation for data matching applications

SBBD '08 Proceedings of the 23rd Brazilian symposium on Databases
A strategy for allowing meaningful and comparable scores in approximate matching

Information Systems
Automatic threshold estimation for data matching applications

Information Sciences: an International Journal
An unsupervised heuristic-based approach for bibliographic metadata deduplication

Information Processing and Management: an International Journal
XML fuzzy ranking

FQAS'06 Proceedings of the 7th international conference on Flexible Query Answering Systems
On modeling query refinement by capturing user intent through feedback

ADC '12 Proceedings of the Twenty-Third Australasian Database Conference - Volume 124

Abstract

In a vague query, a user enters a value that represents some real-world object and expects as the result the set of database values that represent this object, even when they do not match the entered value exactly. The problem appears in databases that collect data from different sources or in databases where different users enter data directly. Query engines usually rely on some type of similarity metric to support data with inexact matching. The problem of building query engines that execute vague queries has already been studied, but an important problem remains open, namely that of defining the threshold to be used when a similarity scan is performed over a database column. It is known from the literature that the threshold depends on the similarity metric and also on the set of values being queried. Thus, it is unrealistic to expect the user to supply a threshold at query time. In this paper we propose a process for estimating recall/precision values for several thresholds for a database column. The idea is that this process is started by a database administrator in a pre-processing phase, using samples extracted from the database. The metadata collected by this process may be used during the optimization phase of query processing. The paper describes this process as well as the experiments that were performed to evaluate it.
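
The pre-processing idea in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: it assumes a small sample in which each value is already labeled with the real-world object it represents, it uses Python's `difflib.SequenceMatcher` as a stand-in similarity metric, and the names `similarity` and `estimate_recall_precision` are hypothetical. Each sample value is used in turn as a query against the rest of the sample, and recall and precision are averaged per threshold.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Stand-in similarity metric in [0, 1]; the paper's metric may differ."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def estimate_recall_precision(sample, thresholds):
    """Estimate mean recall and precision per threshold over a labeled sample.

    `sample` is a list of (value, object_id) pairs. Two values count as
    relevant to each other when they share an object_id (an assumption of
    this sketch; the abstract does not say how relevance is determined).
    """
    table = {}
    for t in thresholds:
        recalls, precisions = [], []
        for i, (query, query_id) in enumerate(sample):
            # Use every other sample value as the column being scanned.
            others = [pair for j, pair in enumerate(sample) if j != i]
            relevant = sum(1 for _, oid in others if oid == query_id)
            if relevant == 0:
                continue  # recall is undefined without relevant values
            retrieved = [oid for v, oid in others if similarity(query, v) >= t]
            hits = sum(1 for oid in retrieved if oid == query_id)
            recalls.append(hits / relevant)
            # Convention of this sketch: an empty result has precision 1.0.
            precisions.append(hits / len(retrieved) if retrieved else 1.0)
        if recalls:
            table[t] = (sum(recalls) / len(recalls),
                        sum(precisions) / len(precisions))
        else:
            table[t] = (0.0, 1.0)
    return table


if __name__ == "__main__":
    # Hypothetical labeled sample: the same city written in different ways.
    sample = [
        ("Porto Alegre", 1), ("Porto Alegre, RS", 1), ("P. Alegre", 1),
        ("Sao Paulo", 2), ("São Paulo", 2), ("S. Paulo", 2),
    ]
    for t, (r, p) in estimate_recall_precision(sample, [0.5, 0.7, 0.9]).items():
        print(f"threshold={t:.1f} recall={r:.2f} precision={p:.2f}")
```

A table of this kind, stored as column metadata, is what would let an optimizer pick a threshold for a target recall or precision instead of asking the user for one at query time.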