Supporting early pruning in top-k query processing on massive data

Authors:
Xixian Han; Jianzhong Li; Donghua Yang
Affiliations:
School of Computer Science and Technology, Harbin Institute of Technology, 92 XiDaZhi Street, Harbin, China; School of Computer Science and Technology, Harbin Institute of Technology, 92 XiDaZhi Street, Harbin, China; The Academy of Fundamental and Interdisciplinary Sciences, Harbin Institute of Technology, 92 XiDaZhi Street, Harbin, China
Venue:
Information Processing Letters
Year:
2011

Abstract

This paper analyzes the execution behavior of the "No Random Accesses" (NRA) algorithm and determines the depth to which each sorted file is scanned in the growing phase and in the shrinking phase of NRA. The analysis shows that, on massive data, NRA must maintain a large number of candidate tuples during the growing phase. Based on this analysis, the paper proposes a novel top-k algorithm, Top-K with Early Pruning (TKEP), which performs early pruning in the growing phase. A general rule and a mathematical analysis for early pruning are presented. The theoretical analysis shows that early pruning can discard most of the candidate tuples. Although TKEP is an approximate method for obtaining the top-k result, the probability that its result is correct is extremely high. Extensive experiments show that TKEP has a significant advantage over NRA.
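To make the candidate-maintenance cost concrete, the following is a minimal sketch of the textbook NRA procedure that the abstract takes as its baseline. It is not TKEP: the abstract does not state the paper's pruning rule, so none is reproduced here. The function name `nra_top_k`, the sum aggregation, the round-robin access order, and the assumptions that scores are non-negative and that every object appears in every list are all choices made for this illustration.

```python
import heapq

def nra_top_k(lists, k):
    """Textbook NRA sketch (assumed setup: sum aggregation, non-negative scores).

    lists: m lists of (obj_id, score), each sorted by score descending,
           all of equal length and all containing every object.
    Returns (top_k_ids, scan_depth, peak_candidates).
    """
    m, n = len(lists), len(lists[0])
    seen = {}   # obj_id -> {list_index: score} for scores read so far
    peak = 0    # largest number of candidates that had to be kept
    for depth in range(n):
        # One round of sorted access: read the next entry of every list.
        for i in range(m):
            obj, score = lists[i][depth]
            seen.setdefault(obj, {})[i] = score
        last = [lists[i][depth][1] for i in range(m)]
        # Lower bound: unknown scores count as 0.
        lower = {o: sum(s.values()) for o, s in seen.items()}
        # Upper bound: unknown scores are at most the last score read.
        upper = {o: lower[o] + sum(last[i] for i in range(m) if i not in s)
                 for o, s in seen.items()}
        if len(seen) < k:
            peak = max(peak, len(seen))
            continue
        top = heapq.nlargest(k, lower, key=lower.get)
        kth_lower = lower[top[-1]]
        top_set = set(top)
        # Seen objects outside the current top-k that cannot be ruled out yet.
        rivals = [o for o in seen
                  if o not in top_set and upper[o] > kth_lower]
        peak = max(peak, k + len(rivals))
        # Stop once neither an unseen object (bounded by sum(last))
        # nor any rival can still beat the current k-th lower bound.
        if sum(last) <= kth_lower and not rivals:
            return top, depth + 1, peak
    totals = {o: sum(s.values()) for o, s in seen.items()}
    return heapq.nlargest(k, totals, key=totals.get), n, peak

# Tiny illustrative input (hypothetical data, integer scores).
example = [
    [("a", 9), ("b", 8), ("c", 5), ("d", 3), ("e", 1)],
    [("b", 9), ("a", 7), ("d", 6), ("c", 2), ("e", 1)],
]
print(nra_top_k(example, 1))  # (['b'], 2, 2)
```

While `sum(last)` still exceeds the k-th lower bound, any newly read object may become a candidate, which corresponds to the growing phase described in the abstract. This sketch only counts candidates rather than discarding them, so `peak_candidates` gives a rough sense of the buffering that early pruning is meant to reduce.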