Efficient top-k processing over query-dependent functions

  • Authors:
  • Lin Guo;Sihem Amer-Yahia;Raghu Ramakrishnan;Jayavel Shanmugasundaram;Utkarsh Srivastava;Erik Vee

  • Affiliations:
  • Yahoo! Research;Yahoo! Research;Yahoo! Research;Yahoo! Research;Yahoo! Research;Yahoo! Research

  • Venue:
  • Proceedings of the VLDB Endowment
  • Year:
  • 2008

Abstract

We study the efficient evaluation of top-k queries over data items where the score of each item is computed dynamically by applying an item-specific function to a parameter value specified in the query. For example, online retail stores rank items by price, which may be a function of the quantity being queried: "Stay 3 nights, get a 15% discount on double-bed rooms." Similarly, when online maps rank possible routes by predicted congestion level, the score (congestion) is a function of the time being queried, e.g., "At 5PM on a Friday in Palo Alto, the congestion level on 101 North is high." The parameter (the number of nights, or the time at which the online map is queried, in the examples above) is known only at query time, and online applications have stringent response-time requirements, so it is infeasible to evaluate every item-specific function to determine the item scores, especially when the number of items is large. Further, space considerations make it infeasible to pre-compute and store the score of each item for each value of the input parameter. In this paper, we develop a novel technique that compresses the (large) set of item scores for all parameter values by dividing the parameter range into intervals, taking into account the expected query workload. This compressed representation is then used to do top-k pruning of query results. Our experiments show that the proposed techniques are scalable and efficient.
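
The general idea of interval-based compression followed by top-k pruning can be illustrated with a small sketch. This is not the paper's algorithm: it assumes an integer parameter domain, fixed interval boundaries chosen by hand (the workload-aware choice of intervals is omitted), a per-interval upper bound as the compressed summary, and "higher score is better". The names `build_index`, `top_k`, `funcs`, and `boundaries` are hypothetical.

```python
import heapq
from bisect import bisect_right


def build_index(funcs, boundaries):
    """Compress scores into one upper bound per (item, interval).

    funcs: dict mapping item id -> callable(int) -> float (assumed shape).
    boundaries: sorted ints b0 < b1 < ... < bm; interval j covers
    the parameter values b_j, ..., b_{j+1} - 1.
    Returns, per interval, (upper_bound, item) pairs in descending order.
    """
    index = []
    for lo, hi in zip(boundaries, boundaries[1:]):
        bounds = [(max(f(p) for p in range(lo, hi)), item)
                  for item, f in funcs.items()]
        bounds.sort(key=lambda pair: pair[0], reverse=True)
        index.append(bounds)
    return index


def top_k(funcs, boundaries, index, p, k):
    """Return the k best (score, item) pairs for parameter p, plus the
    number of item functions that were actually evaluated."""
    j = bisect_right(boundaries, p) - 1
    if not 0 <= j < len(index):
        raise ValueError("parameter outside the indexed range")
    heap = []  # min-heap holding the best k exact scores seen so far
    evaluated = 0
    for upper_bound, item in index[j]:
        # No remaining item can beat the current k-th best score: stop.
        if len(heap) == k and upper_bound <= heap[0][0]:
            break
        score = funcs[item](p)
        evaluated += 1
        if len(heap) < k:
            heapq.heappush(heap, (score, item))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, item))
    return sorted(heap, reverse=True), evaluated


# Toy data: four items whose scores depend on an integer parameter n.
funcs = {
    "a": lambda n: 10.0,
    "b": lambda n: 2.0 * n,
    "c": lambda n: 30.0 - n,
    "d": lambda n: 1.0,
}
boundaries = [1, 4, 8]  # two intervals: 1..3 and 4..7
index = build_index(funcs, boundaries)
result, evaluated = top_k(funcs, boundaries, index, p=5, k=2)
print(result, evaluated)  # [(25.0, 'c'), (10.0, 'b')] 2
```

In this toy run only two of the four item functions are evaluated at query time, because the stored upper bounds of the remaining items cannot exceed the current second-best score. Wider intervals shrink the stored summary but loosen the bounds, which is the trade-off that a workload-aware choice of intervals would aim to balance.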