Processing ranked queries with the minimum space

  • Authors:
  • Yufei Tao;Marios Hadjieleftheriou

  • Affiliations:
  • Department of Computer Science, City University of Hong Kong, Hong Kong;Department of Computer Science, Boston University, Boston

  • Venue:
  • FoIKS'06 Proceedings of the 4th international conference on Foundations of Information and Knowledge Systems
  • Year:
  • 2006

Abstract

Practical applications often need to rank multi-variate records by assigning various priorities to different attributes. Consider a relation that stores students' grades on two courses: database and algorithm. Student performance is evaluated by an “overall score” calculated as w_1 · g_db + w_2 · g_alg, where w_1, w_2 are two input “weights”, and g_db (g_alg) is the student grade on database (algorithm). A “top-k ranked query” retrieves the k students with the best scores according to specific w_1 and w_2. We focus on top-k queries whose k is bounded by a constant c, and present solutions that guarantee low worst-case query cost by using provably the minimum space. The core of our methods is a novel concept, “minimum covering subset”, which contains only the necessary data for ensuring correct answers for all queries. Any 2D ranked search, for example, can be processed in O(log_B(m/B) + c/B) I/Os using O(m/B) space, where m is the size of the minimum covering subset, and B the disk page capacity. Similar results are also derived for higher dimensionalities and approximate ranked retrieval.
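
To make the query semantics concrete, here is a minimal sketch of a top-k ranked query over the two-course example, answered by a plain linear scan. It only illustrates what such a query returns; it is not the indexed method the paper proposes and does not compute a minimum covering subset. The function name `top_k`, the tuple layout, and the sample grades are all assumptions made up for illustration.

```python
import heapq

def top_k(records, w1, w2, k):
    """Return the k records with the highest score w1 * g_db + w2 * g_alg.

    Each record is a hypothetical (name, g_db, g_alg) tuple.
    This is a brute-force scan over all records, not an index lookup.
    """
    return heapq.nlargest(k, records, key=lambda r: w1 * r[1] + w2 * r[2])

# Illustrative data only: (name, database grade, algorithm grade).
students = [
    ("alice", 90, 60),
    ("bob", 70, 85),
    ("carol", 80, 80),
    ("dave", 50, 95),
]

# Different weights can produce different answers for the same k.
by_database = top_k(students, 1.0, 0.0, 2)
by_algorithm = top_k(students, 0.0, 1.0, 2)
balanced = top_k(students, 0.5, 0.5, 1)
```

The scan touches every record on every query. The abstract's point is that, when k is bounded by a constant c, only the records in the minimum covering subset need to be stored to answer all such queries correctly.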