Top-k selection queries over relational databases: Mapping strategies and performance evaluation

Authors:
Nicolas Bruno;Surajit Chaudhuri;Luis Gravano
Affiliations:
Columbia University, New York, NY;Microsoft Research, Redmond, WA;Columbia University, New York, NY
Venue:
ACM Transactions on Database Systems (TODS)
Year:
2002

Abstract

In many applications, users specify target values for certain attributes without requiring exact matches to these values in return. Instead, the result of such queries is typically a ranking of the "top k" tuples that best match the given attribute values. In this paper, we study the advantages and limitations of processing a top-k query by translating it into a single range query that a traditional relational database management system (RDBMS) can process efficiently. In particular, we study how to determine a range query to evaluate a top-k query by exploiting the statistics available to an RDBMS, and how the quality of these statistics affects the retrieval efficiency of the resulting scheme. We also report the first experimental evaluation of the mapping strategies over a real RDBMS, namely Microsoft's SQL Server 7.0. The experiments show that our new techniques are robust and significantly more efficient than previously known strategies, which require at least one sequential scan of the data sets.
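
The mapping idea in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation, and several things in it are assumptions made for illustration: the table is an in-memory list of numeric tuples, closeness is measured with the maximum (L-infinity) distance so that a search distance d corresponds exactly to the box q ± d, a small random sample stands in for the histogram statistics an RDBMS would keep, and the query is restarted with a doubled distance when the range returns fewer than k tuples. All function names are hypothetical.

```python
import math
import random


def max_distance(t, q):
    """L-infinity distance between tuple t and target values q."""
    return max(abs(a - b) for a, b in zip(t, q))


def range_query(table, q, d):
    """Stand-in for an RDBMS range query: tuples inside the box q +/- d."""
    return [t for t in table
            if all(qi - d <= ti <= qi + d for ti, qi in zip(t, q))]


def estimate_distance(sample, n, q, k):
    """Guess a distance d expected to enclose about k of the n tuples.

    Assumption: a non-empty random sample replaces histogram statistics.
    """
    dists = sorted(max_distance(t, q) for t in sample)
    # Scale k down to the sample size; require at least one sample tuple.
    rank = max(1, math.ceil(k * len(sample) / n))
    return dists[min(rank, len(dists)) - 1]


def top_k(table, sample, q, k):
    """Answer a top-k query through a range query, restarting if needed."""
    k = min(k, len(table))
    d = estimate_distance(sample, len(table), q, k)
    restarts = 0
    candidates = range_query(table, q, d)
    while len(candidates) < k:
        # The estimate was too small: enlarge the range and try again.
        d = 2 * d if d > 0 else 1.0
        restarts += 1
        candidates = range_query(table, q, d)
    # Every tuple within distance d lies in the box, so the k best
    # candidates are the true top-k under this distance function.
    candidates.sort(key=lambda t: max_distance(t, q))
    return candidates[:k], restarts
```

In this sketch, the quality of the statistics shows up in two ways: an estimate that is too small forces restarts, while one that is too large makes the range query return many tuples that are later discarded.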