The Threshold Algorithm: From Middleware Systems to the Relational Engine

Authors:
Nicolas Bruno; Hui (Wendy) Wang
Venue:
IEEE Transactions on Knowledge and Data Engineering
Year:
2007



Abstract

The answer to a top-k query is an ordered set of tuples, where the ordering is based on how closely each tuple matches the query. In the context of middleware systems, new algorithms to answer top-k queries have recently been proposed. Among these, the Threshold Algorithm (TA) is the best known, owing to its simplicity and low memory requirements. TA relies on an early-termination condition and can therefore evaluate top-k queries without examining all the tuples. The top-k query model is prevalent not only over middleware systems but also over plain relational data. In this work, we analyze the challenges that must be addressed to adapt TA to a relational database system. We show that, depending on the available indices, many alternative TA strategies can be used to answer a given query. Choosing the best alternative requires a cost model that can be seamlessly integrated with that of current optimizers. We address these challenges and conduct an extensive experimental evaluation of the resulting techniques, characterizing the scenarios in which TA-like algorithms pay off for answering top-k queries in relational database systems.
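
The early-termination condition mentioned in the abstract can be illustrated with a minimal sketch of the middleware-style TA as it is commonly described. This is not the relational adaptation studied in the paper. The function name `threshold_algorithm`, the sample lists `by_price` and `by_rating`, and the use of in-memory dictionaries in place of index-based random access are all assumptions made for illustration.

```python
import heapq


def threshold_algorithm(sorted_lists, k, aggregate=sum):
    """Sketch of TA over in-memory lists (illustrative, not the paper's code).

    sorted_lists: one list per attribute, each holding (object_id, score)
        pairs in descending score order; every object is assumed to appear
        in every list, so all lists have the same length.
    aggregate: a monotone scoring function over per-attribute scores.
    Returns (top_k, depth): the best k (score, object_id) pairs, best first,
    and the number of sorted-access rounds performed.
    """
    # Assumption: a dict per attribute stands in for random access via an index.
    lookup = [dict(lst) for lst in sorted_lists]
    top = []        # min-heap of (score, object_id) with at most k entries
    seen = set()
    depth_reached = 0

    for depth in range(len(sorted_lists[0])):
        depth_reached = depth + 1
        last_seen = []
        for lst in sorted_lists:
            obj, score = lst[depth]          # sorted access
            last_seen.append(score)
            if obj in seen:
                continue
            seen.add(obj)
            # Random access: fetch the object's score in every list.
            total = aggregate([table[obj] for table in lookup])
            if len(top) < k:
                heapq.heappush(top, (total, obj))
            elif total > top[0][0]:
                heapq.heapreplace(top, (total, obj))

        # No unseen object can score above the aggregate of the last-seen scores.
        threshold = aggregate(last_seen)
        if len(top) == k and top[0][0] >= threshold:
            break   # early termination

    return sorted(top, reverse=True), depth_reached


# Hypothetical data: two attribute lists over five objects, integer scores.
by_price = [("a", 90), ("b", 80), ("c", 50), ("d", 30), ("e", 10)]
by_rating = [("b", 95), ("a", 70), ("d", 60), ("c", 20), ("e", 10)]

result, depth = threshold_algorithm([by_price, by_rating], k=2)
print(result, depth)   # [(175, 'b'), (160, 'a')] 2
```

With this sample data the sketch stops after two rounds of sorted access: the threshold has dropped to 150, below the current second-best score of 160, so objects `c`, `d`, and `e` are never scored. The heap holds only k entries, which is the small memory footprint the abstract refers to.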