Supporting early pruning in top-k query processing on massive data

  • Authors:
  • Xixian Han; Jianzhong Li; Donghua Yang

  • Affiliations:
  • School of Computer Science and Technology, Harbin Institute of Technology, 92 XiDaZhi Street, Harbin, China; School of Computer Science and Technology, Harbin Institute of Technology, 92 XiDaZhi Street, Harbin, China; The Academy of Fundamental and Interdisciplinary Sciences, Harbin Institute of Technology, 92 XiDaZhi Street, Harbin, China

  • Venue:
  • Information Processing Letters
  • Year:
  • 2011

Abstract

This paper analyzes the execution behavior of the "No Random Accesses" (NRA) algorithm and determines the depth to which each sorted file is scanned in the growing phase and in the shrinking phase of NRA. The analysis shows that, on massive data, NRA must maintain a large number of candidate tuples during the growing phase. Based on this analysis, the paper proposes a novel top-k algorithm, Top-K with Early Pruning (TKEP), which performs early pruning in the growing phase. A general rule and a mathematical analysis for early pruning are presented. The theoretical analysis shows that early pruning can discard most of the candidate tuples. Although TKEP is an approximate method for obtaining the top-k result, the probability that its result is correct is extremely high. Extensive experiments show that TKEP has a significant advantage over NRA.
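
To make the candidate-maintenance cost concrete, the following is a minimal sketch of the baseline NRA scheme in its commonly described form, not the authors' code. It assumes a sum aggregation function, non-negative scores, equal-length lists in which every tuple appears once, and round-robin sorted access; the function name `nra_top_k` and the sample data are hypothetical. The abstract does not state the TKEP pruning rule, so no pruning is attempted here; the sketch only reports how many candidates NRA had to keep.

```python
import heapq


def nra_top_k(sorted_lists, k):
    """Baseline NRA sketch (assumptions: sum aggregation, scores >= 0).

    sorted_lists: m lists of (tuple_id, score), each sorted by score
    descending and containing every tuple exactly once.
    Returns (top_k_ids, scan_depth, peak_candidate_count).
    """
    m = len(sorted_lists)
    n = len(sorted_lists[0])
    seen = {}  # tuple_id -> {list_index: score seen so far}
    peak_candidates = 0

    for depth in range(n):
        # One round of sorted access: read the next entry of every list.
        for i, lst in enumerate(sorted_lists):
            tuple_id, score = lst[depth]
            seen.setdefault(tuple_id, {})[i] = score
        last = [lst[depth][1] for lst in sorted_lists]
        peak_candidates = max(peak_candidates, len(seen))
        if len(seen) < k:
            continue

        # Lower bound: unseen attributes count as 0.
        lower = {t: sum(s.values()) for t, s in seen.items()}
        # Upper bound: unseen attributes are at most the last score read.
        upper = {
            t: sum(s.get(i, last[i]) for i in range(m))
            for t, s in seen.items()
        }

        top = heapq.nlargest(k, lower, key=lower.get)
        kth_lower = lower[top[-1]]
        top_set = set(top)

        # Stop when no other candidate and no unseen tuple can still win.
        rivals_beaten = all(
            upper[t] <= kth_lower for t in seen if t not in top_set
        )
        if rivals_beaten and sum(last) <= kth_lower:
            return top, depth + 1, peak_candidates

    # Fallback (e.g. k exceeds the number of tuples): rank by full scores.
    totals = {t: sum(s.values()) for t, s in seen.items()}
    return heapq.nlargest(k, totals, key=totals.get), n, peak_candidates


# Hypothetical sample: two sorted files over four tuples.
sample = [
    [("a", 9), ("b", 8), ("c", 3), ("d", 1)],
    [("b", 9), ("a", 7), ("d", 4), ("c", 2)],
]
top_ids, scan_depth, peak = nra_top_k(sample, 2)
```

The third return value is the peak size of the candidate set, which is the quantity the analysis above identifies as large on massive data and which early pruning is intended to reduce.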