A Sampling-Based Estimator for Top-k Query

Authors:
Affiliations:
Venue:
ICDE '02 Proceedings of the 18th International Conference on Data Engineering
Year:
2002

Citing 0
Cited 20

Top-k selection queries over relational databases: Mapping strategies and performance evaluation
ACM Transactions on Database Systems (TODS)

Evaluating top-k queries over web-accessible databases
ACM Transactions on Database Systems (TODS)

Fast Approximate Similarity Search in Extremely High-Dimensional Data Sets
ICDE '05 Proceedings of the 21st International Conference on Data Engineering

Finding global icebergs over distributed data sets
Proceedings of the twenty-fifth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems

Continuous monitoring of top-k queries over sliding windows
Proceedings of the 2006 ACM SIGMOD international conference on Management of data

Genetic algorithms for approximate similarity queries
Data & Knowledge Engineering

The Threshold Algorithm: From Middleware Systems to the Relational Engine
IEEE Transactions on Knowledge and Data Engineering

A practical approach for efficiently answering top-k relational queries
Decision Support Systems

Region clustering based evaluation of multiple top-N selection queries
Data & Knowledge Engineering

Computing Relaxed Answers on RDF Databases
WISE '08 Proceedings of the 9th international conference on Web Information Systems Engineering

Quality and efficiency in high dimensional nearest neighbor search
Proceedings of the 2009 ACM SIGMOD International Conference on Management of data

Processing top-N relational queries by learning
Journal of Intelligent Information Systems

Adaptive relaxation for querying heterogeneous XML data sources
Information Systems

Efficient top-k search across heterogeneous XML data sources
DASFAA'08 Proceedings of the 13th international conference on Database systems for advanced applications

Efficient and accurate nearest neighbor and closest pair search in high-dimensional space
ACM Transactions on Database Systems (TODS)

MTopS: scalable processing of continuous top-k multi-query workloads
Proceedings of the 20th ACM international conference on Information and knowledge management

Approximating query answering on RDF databases
World Wide Web

Evaluating mid-(k, n) queries using b+-tree
DEXA'05 Proceedings of the 16th international conference on Database and Expert Systems Applications

Distributed top-k query processing by exploiting skyline summaries
Distributed and Parallel Databases

Range query estimation with data skewness for top-k retrieval
Decision Support Systems

Quantified Score

Hi-index	0.01

Abstract

Top-k queries arise naturally in many database applications that require searching for records whose attribute values are close to those specified in a query. In this paper, we study the problem of processing a top-k query by translating it into an approximate range query that can be efficiently processed by traditional relational DBMSs. We propose a sampling-based approach, along with various query mapping strategies, to determine a range query that yields high recall with low access cost. Our experiments on real-world datasets show that, given the same memory budget, our sampling-based estimator outperforms a previous histogram-based method in terms of access cost, while achieving the same level of recall. Furthermore, unlike the histogram-based approach, our sampling-based query mapping scheme scales well for high-dimensional data and is easy to implement with low maintenance cost.
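
The abstract does not spell out the estimator or the mapping strategies, so the following is only a minimal illustrative sketch of the general idea of mapping a top-k query to a range query with the help of a sample. It is not the authors' method. Everything in it is an assumption made for illustration: the function names (`estimate_radius`, `range_query`, `top_k`), the use of Euclidean distance, the `slack` factor, the uniform random sample, and the fallback to a full scan when the estimated range turns out to be too small.

```python
import math
import random


def estimate_radius(sample, n_total, query, k, slack=2.0):
    """Guess a radius around `query` expected to cover about k of n_total rows.

    Assumption: `sample` is a uniform random sample of the table, so a ball
    that holds k rows should hold roughly k * len(sample) / n_total sample points.
    """
    dists = sorted(math.dist(query, p) for p in sample)
    expected = k * len(sample) / n_total
    # `slack` inflates the rank to trade some access cost for higher recall.
    rank = min(len(dists), max(1, math.ceil(slack * expected)))
    return dists[rank - 1]


def range_query(rows, query, radius):
    """Stand-in for a DBMS range query: rows inside the box [q - r, q + r]."""
    return [p for p in rows
            if all(abs(a - b) <= radius for a, b in zip(p, query))]


def top_k(rows, sample, query, k, slack=2.0):
    """Answer a top-k query via an estimated range query, then rank candidates."""
    radius = estimate_radius(sample, len(rows), query, k, slack)
    candidates = range_query(rows, query, radius)
    # The box contains every row within `radius` of the query, so the answer
    # is exact once at least k candidates lie inside that ball.
    inside = [p for p in candidates if math.dist(query, p) <= radius]
    if len(inside) < k:
        # Simplification: fall back to a full scan instead of retrying
        # with a larger range.
        inside = rows
    return sorted(inside, key=lambda p: math.dist(query, p))[:k]


if __name__ == "__main__":
    random.seed(0)
    table = [(random.random(), random.random()) for _ in range(2000)]
    sample = random.sample(table, 200)
    for row in top_k(table, sample, (0.5, 0.5), 5):
        print(row)
```

In this sketch a larger `slack` widens the range, which raises the chance that the first range query already contains the k nearest rows at the price of fetching more candidates. This is the recall versus access cost trade-off the abstract refers to.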