In a vague query, a user enters a value that represents some real-world object and expects as the result the set of database values that represent the same object, even when they do not match the entered value exactly. The problem arises in databases that collect data from different sources or in which different users enter data directly. Query engines usually rely on some type of similarity metric to handle inexact matches. The problem of building query engines that execute vague queries has already been studied, but an important problem remains open: how to define the threshold to be used when a similarity scan is performed over a database column. It is known from the literature that the threshold depends on the similarity metric and also on the set of values being queried. Thus, it is unrealistic to expect the user to supply a threshold at query time. In this paper we propose a process for estimating recall/precision values at several thresholds for a database column. The idea is that a database administrator starts this process in a pre-processing phase, using samples extracted from the database. The metadata collected by this process may then be used during the optimization phase of query processing. The paper describes this process as well as the experiments performed to evaluate it.
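To make the idea of per-threshold recall/precision estimates concrete, the following is a minimal sketch and not the process proposed in the paper. It assumes, purely for illustration, that the sample comes with known object identifiers for each value, and it uses Python's `difflib.SequenceMatcher` as a stand-in similarity metric. The names `similarity`, `estimate_precision_recall`, and the sample data are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """Stand-in similarity metric in [0, 1]; any string metric could be used."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def estimate_precision_recall(sample, thresholds):
    """Estimate (precision, recall) per threshold over all pairs in a sample.

    sample: list of (value, object_id) pairs. The object_id labels are an
    assumption of this sketch; the abstract does not say how ground truth
    is obtained.
    """
    # Score every unordered pair once and record whether it is a true match.
    scored = [
        (similarity(v1, v2), o1 == o2)
        for (v1, o1), (v2, o2) in combinations(sample, 2)
    ]
    total_true = sum(1 for _, same in scored if same)

    table = {}
    for t in thresholds:
        retrieved = [same for score, same in scored if score >= t]
        hits = sum(retrieved)
        # Convention: precision is 1.0 when nothing is retrieved.
        precision = hits / len(retrieved) if retrieved else 1.0
        recall = hits / total_true if total_true else 1.0
        table[t] = (precision, recall)
    return table


# Hypothetical sample drawn from a name column, labeled by real-world object.
sample = [
    ("John Smith", 1),
    ("Jon Smith", 1),
    ("J. Smith", 1),
    ("Mary Jones", 2),
    ("Mary Jonse", 2),
    ("Peter Brown", 3),
]
stats = estimate_precision_recall(sample, [0.0, 0.5, 0.7, 0.9, 1.01])
for threshold, (precision, recall) in stats.items():
    print(f"threshold={threshold:.2f} precision={precision:.2f} recall={recall:.2f}")
```

A table like `stats` is the kind of per-column metadata that could be stored ahead of time and consulted at query time, for example to pick the lowest threshold whose estimated precision stays above a target value.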