Optimizing top-k queries for middleware access: A unified cost-based approach

Authors:
Seung-won Hwang;Kevin Chen-chuan Chang
Affiliations:
Pohang University of Science and Technology, Gyungbuk, Korea;University of Illinois at Urbana-Champaign, Urbana, IL
Venue:
ACM Transactions on Database Systems (TODS)
Year:
2007

Abstract

This article studies the optimization of top-k queries in middleware. Although many algorithms have been proposed, none is generally applicable across the wide range of possible access scenarios. Existing algorithms lack both the “generality” to support a wide range of access scenarios and the systematic “adaptivity” to account for runtime specifics. To fill this critical gap, we take a cost-based optimization approach: by searching over a space of algorithms at runtime, cost-based optimization is general across a wide range of access scenarios, yet adaptive to the specific access costs at runtime. While such optimization has been taken for granted for relational queries from early on, it has been clearly lacking for ranked queries. In this article, we thus identify and address the barriers to realizing such a unified framework. As the first barrier, we need to define a “comprehensive” space, encompassing all possibly optimal algorithms, to search over. As the second barrier, and a conflicting goal, such a space should also be “focused” enough to enable efficient search. For SQL queries, which are explicitly composed of relational operators, such a space, by definition, consists of schedules of relational operators (or “query plans”). In contrast, top-k queries do not have logical tasks, such as relational operators. We thus define the logical tasks of top-k queries as building blocks to identify a comprehensive and focused space for top-k queries. We then develop efficient search schemes over this space for identifying the optimal algorithm. Our study indicates that our framework not only unifies, but also outperforms, existing algorithms specifically designed for their scenarios.
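
To make the idea of cost-based selection concrete, the following is a minimal sketch and not the framework developed in the article. It assumes a toy space of only two plans: a threshold-style algorithm that mixes sorted and random accesses, and a full scan that uses sorted accesses only. It also assumes a summation scoring function and unit costs per access type. All names (`ta_access_counts`, `choose_plan`, `cost_sorted`, `cost_random`) are hypothetical, and a real optimizer would estimate access counts from samples or statistics rather than by running on the full data as done here.

```python
def ta_access_counts(scores, k):
    """Run a threshold-style top-k and count the accesses it performs.

    scores: dict mapping object -> tuple of per-predicate scores.
    The overall score is the sum of the per-predicate scores (an assumption).
    Returns (top-k objects, number of sorted accesses, number of random accesses).
    """
    m = len(next(iter(scores.values())))
    # One list per predicate, ordered by descending score (the sorted-access order).
    lists = [sorted(scores, key=lambda o, i=i: -scores[o][i]) for i in range(m)]
    seen, n_sorted, n_random = {}, 0, 0
    for depth in range(len(scores)):
        last = []
        for i in range(m):
            obj = lists[i][depth]
            n_sorted += 1
            last.append(scores[obj][i])
            if obj not in seen:
                n_random += m - 1          # probe the remaining predicates
                seen[obj] = sum(scores[obj])
        threshold = sum(last)              # upper bound on any unseen object
        top = sorted(seen.values(), reverse=True)[:k]
        if len(top) == k and top[-1] >= threshold:
            break
    topk = sorted(seen, key=lambda o: -seen[o])[:k]
    return topk, n_sorted, n_random


def choose_plan(scores, k, cost_sorted, cost_random):
    """Pick the cheaper of two candidate plans under the given unit costs."""
    n, m = len(scores), len(next(iter(scores.values())))
    _, ns, nr = ta_access_counts(scores, k)
    plans = {
        "threshold": ns * cost_sorted + nr * cost_random,
        "full_scan": n * m * cost_sorted,  # read every list fully, no probes
    }
    return min(plans, key=plans.get), plans


# Hypothetical data: five objects scored on two predicates.
scores = {
    "a": (0.9, 0.8), "b": (0.7, 0.9), "c": (0.3, 0.2),
    "d": (0.2, 0.3), "e": (0.1, 0.1),
}
cheap_probe, _ = choose_plan(scores, k=1, cost_sorted=1.0, cost_random=1.0)
costly_probe, _ = choose_plan(scores, k=1, cost_sorted=1.0, cost_random=10.0)
```

With cheap random accesses the sketch selects the threshold-style plan, and with expensive random accesses it switches to the full scan, which illustrates in miniature how the same query can call for different algorithms depending on the access costs observed at runtime.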