The partitioned-layer index: Answering monotone top-k queries using the convex skyline and partitioning-merging technique

  • Authors:
  • Jun-Seok Heo;Kyu-Young Whang;Min-Soo Kim;Yi-Reun Kim;Il-Yeol Song

  • Affiliations:
  • Department of Computer Science, Korea Advanced Institute of Science and Technology (KAIST), 373-1 Guseong-dong, Yuseong-gu, Daejeon 305-701, South Korea;Department of Computer Science, Korea Advanced Institute of Science and Technology (KAIST), 373-1 Guseong-dong, Yuseong-gu, Daejeon 305-701, South Korea;Department of Computer Science, Korea Advanced Institute of Science and Technology (KAIST), 373-1 Guseong-dong, Yuseong-gu, Daejeon 305-701, South Korea;Department of Computer Science, Korea Advanced Institute of Science and Technology (KAIST), 373-1 Guseong-dong, Yuseong-gu, Daejeon 305-701, South Korea;College of Information Science and Technology, Drexel University, Philadelphia, PA 19104, USA

  • Venue:
  • Information Sciences: an International Journal
  • Year:
  • 2009

Quantified Score

Hi-index 0.07

Visualization

Abstract

A top-k query returns k tuples with the highest (or the lowest) scores from a relation. The score is computed by combining the values of one or more attributes. We focus on top-k queries having monotone linear score functions. Layer-based methods are well-known techniques for top-k query processing. These methods construct a database as a single list of layers. Here, the ith layer has the tuples that can be the top-i tuple. Thus, these methods answer top-k queries by reading at most k layers. Query performance, however, is poor when the number of tuples in each layer (simply, the layer size) is large. In this paper, we propose a new layer-ordering method, called the Partitioned-Layer Index (simply, the PL Index), that significantly improves query performance by reducing the layer size. The PL Index uses the notion of partitioning, which constructs a database as multiple sublayer lists instead of a single layer list subsequently reducing the layer size. The PL Index also uses the convex skyline, which is a subset of the skyline, to construct a sublayer to further reduce the layer size. The PL Index has the following desired properties. The query performance of the PL Index is quite insensitive to the weights of attributes (called the preference vector) of the score function and is approximately linear in the value of k. The PL Index is capable of tuning query performance for the most frequently used value of k by controlling the number of sublayer lists. Experimental results using synthetic and real data sets show that the query performance of the PL Index significantly outperforms existing methods except for small values of k (say, k=