A practical approach for efficiently answering top-k relational queries

Authors:
Anteneh Ayanso;Paulo B. Goes;Kumar Mehta
Affiliations:
Department of Finance, Operations and Information Systems, Brock University, 500 Glenridge Avenue, St. Catharines, ON, Canada L2S 3A1;Department of Operations and Information Management, University of Connecticut, 2100 Hillside Road, U-1041IM, Storrs, CT 06269, USA;Decision Science and Management Information Systems, School of Management, George Mason University, 4400 University Drive MSN 5F4, Fairfax, VA 22030, USA
Venue:
Decision Support Systems
Year:
2007


Abstract

An increasing number of application areas now rely on obtaining the "best matches" to a given query, as opposed to the exact matches sought by traditional transactions. This type of exploratory querying (also called top-k querying) can significantly improve the performance of web-based applications such as consumer reviews, price comparisons, and recommendations for products and services. Because relational database management systems (RDBMSs) lack support for specialized indexes and data structures, recent research has focused on using the summary statistics (histograms) maintained by RDBMSs to translate a top-k request into a traditional range query. Because RDBMS query engines are already optimized for executing range queries, such an approach has both practical and efficiency advantages. In this paper, we review the strengths and weaknesses of common histogram construction techniques with regard to their structural characteristics, their accuracy in approximating the true distribution of the underlying data, and their implications for top-k retrieval. We also present our top-k retrieval strategy (Query-Level Optimal Cost Strategy, QLOCS) and demonstrate its "histogram-independent" performance. Based on comparative experimental and statistical analyses against the best-known histogram-based strategy in the literature, we show that QLOCS is not only more efficient but also provides more consistent performance across the histogram types commonly used in RDBMSs.
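To make the idea of translating a top-k request into a range query concrete, the following is a minimal one-dimensional sketch in Python. It is not QLOCS and not any specific strategy from the literature; it only illustrates the general histogram-based mapping the abstract describes. The function names (`estimate_count`, `radius_for_top_k`, `top_k`), the uniform-within-bucket assumption, the absolute-difference distance, and the "double the radius and restart" fallback are all assumptions made for illustration, and the list comprehension stands in for a `BETWEEN` range query issued to the RDBMS.

```python
def estimate_count(edges, counts, lo, hi):
    """Estimate how many tuples fall in [lo, hi] from a histogram.

    Assumes values are spread uniformly inside each bucket (an assumption
    of this sketch, not a claim about any particular RDBMS).
    """
    total = 0.0
    for i, c in enumerate(counts):
        b_lo, b_hi = edges[i], edges[i + 1]
        overlap = min(hi, b_hi) - max(lo, b_lo)
        if overlap > 0:
            total += c * overlap / (b_hi - b_lo)
    return total


def radius_for_top_k(edges, counts, q, k, tol=1e-6):
    """Binary-search the smallest radius r whose range [q-r, q+r]
    is estimated to contain at least k tuples."""
    lo_r, hi_r = 0.0, max(q - edges[0], edges[-1] - q)
    if estimate_count(edges, counts, q - hi_r, q + hi_r) < k:
        return hi_r  # fewer than k tuples estimated; cover the whole domain
    while hi_r - lo_r > tol:
        mid = (lo_r + hi_r) / 2
        if estimate_count(edges, counts, q - mid, q + mid) >= k:
            hi_r = mid
        else:
            lo_r = mid
    return hi_r


def top_k(values, edges, counts, q, k, tol=1e-6):
    """Answer a top-k query by running a range query sized from the histogram,
    widening the range and restarting if it returns fewer than k tuples."""
    r = radius_for_top_k(edges, counts, q, k, tol)
    max_r = max(q - edges[0], edges[-1] - q)
    while True:
        # Stand-in for: SELECT ... WHERE v BETWEEN q - r AND q + r
        candidates = [v for v in values if q - r <= v <= q + r]
        if len(candidates) >= k or r >= max_r:
            break
        r = min(max(2 * r, tol), max_r)  # hypothetical restart policy
    # Rank the retrieved candidates by distance to the query point.
    return sorted(candidates, key=lambda v: abs(v - q))[:k]


if __name__ == "__main__":
    values = [1, 2, 3, 10, 11, 12, 13, 20, 30, 40]
    edges = [0, 10, 20, 30, 40]   # equi-width bucket boundaries
    counts = [3, 4, 1, 2]         # tuples per bucket
    print(top_k(values, edges, counts, q=12, k=3))
```

The trade-off this sketch exposes is the one the abstract alludes to: a range that is too narrow forces a restart, while a range that is too wide retrieves more tuples than needed, and how well the histogram approximates the true data distribution determines which of the two happens.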