Efficient top-k processing over query-dependent functions

Authors:
Lin Guo;Sihem Amer Yahia;Raghu Ramakrishnan;Jayavel Shanmugasundaram;Utkarsh Srivastava;Erik Vee
Affiliations:
Yahoo! Research;Yahoo! Research;Yahoo! Research;Yahoo! Research;Yahoo! Research;Yahoo! Research
Venue:
Proceedings of the VLDB Endowment
Year:
2008

Abstract

We study the efficient evaluation of top-k queries over data items, where the score of each item is computed dynamically by applying an item-specific function whose parameter value is specified in the query. For example, online retail stores rank items by price, which may be a function of the quantity being queried: "Stay 3 nights, get a 15% discount on double-bed rooms." Similarly, when online maps rank possible routes by predicted congestion level, the score (congestion) is a function of the time being queried, e.g., "At 5PM on a Friday in Palo Alto, the congestion level on 101 North is high." The parameter (the number of nights or the time at which the online map is queried, in the examples above) is known only at query time, and online applications have stringent response-time requirements, so it is infeasible to evaluate every item-specific function to determine the item scores, especially when the number of items is large. Further, space considerations make it infeasible to pre-compute and store the score of each item for each value of the input parameter. In this paper, we develop a novel technique that compresses the (large) set of item scores for all parameter values by dividing the parameter range into intervals, taking into account the expected query workload. This compressed representation is then used to do top-k pruning of query results. Our experiments show that the proposed techniques are scalable and efficient.
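
The abstract only outlines the interval idea, so the following is a minimal illustrative sketch rather than the authors' algorithm. It assumes an integer parameter, hand-picked intervals (not workload-aware ones), and that a higher score is better. For each interval it stores one upper bound per item, then at query time it evaluates exact scores in descending bound order and stops once the k-th best exact score is at least the next bound. All names (`build_index`, `top_k`, the toy items) are hypothetical.

```python
import heapq


def build_index(item_funcs, intervals):
    """Compress scores: keep one upper bound per (interval, item).

    Assumes integer parameters, so the maximum over an interval
    can be computed exactly by enumeration.
    """
    index = []
    for lo, hi in intervals:
        bounds = [
            (max(f(x) for x in range(lo, hi + 1)), item)
            for item, f in item_funcs.items()
        ]
        # Highest upper bound first, so pruning can stop early.
        bounds.sort(key=lambda b: b[0], reverse=True)
        index.append(((lo, hi), bounds))
    return index


def top_k(index, item_funcs, x, k):
    """Return (top-k list of (score, item), number of functions evaluated)."""
    for (lo, hi), bounds in index:
        if lo <= x <= hi:
            break
    else:
        raise ValueError("parameter outside all intervals")

    heap = []  # min-heap holding the best k exact scores seen so far
    evaluated = 0
    for upper, item in bounds:
        # No remaining item can beat the current k-th best score.
        if len(heap) == k and heap[0][0] >= upper:
            break
        score = item_funcs[item](x)
        evaluated += 1
        if len(heap) < k:
            heapq.heappush(heap, (score, item))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, item))
    return sorted(heap, reverse=True), evaluated


# Toy item-specific score functions of an integer parameter n (hypothetical).
item_funcs = {
    "a": lambda n: 100 - 5 * n,
    "b": lambda n: 60 + 10 * n,
    "c": lambda n: 20 + n,
    "d": lambda n: 10,
}
index = build_index(item_funcs, [(1, 4), (5, 8)])
result, evaluated = top_k(index, item_funcs, x=3, k=2)
print(result, evaluated)
```

In this toy run only two of the four functions are evaluated for the query `x=3, k=2`. Narrower intervals give tighter bounds and more pruning at the cost of more stored bounds, which is the space/time trade-off the abstract alludes to.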