TJJE: An efficient algorithm for top-k join on massive data

Authors:
Xixian Han;Jianzhong Li;Jinbao Wang;Donghua Yang
Affiliations:
School of Computer Science and Technology, Harbin Institute of Technology, China;School of Computer Science and Technology, Harbin Institute of Technology, China;School of Computer Science and Technology, Harbin Institute of Technology, China;The Academy of Fundamental and Interdisciplinary Sciences, Harbin Institute of Technology, China
Venue:
Information Sciences: an International Journal
Year:
2013

Abstract

In many applications, top-k join is an important operation that returns the k highest-ranked join tuples from a potentially huge answer space according to a given ranking function. PBRJ is an algorithm template that generalizes previous top-k join algorithms. Our analysis shows that, on massive data, PBRJ needs to maintain a large number of candidate tuples. Based on this analysis, the paper proposes TJJE, a novel top-k join algorithm suited to massive data. Using pre-computed information, TJJE first estimates an upper bound on the scan depth of each joined table. It then determines the file that contains the join positional index pairs of the top-k join results. A novel algorithm retrieves the required join tuples with a single sequential, selective scan of the joined tables. Finally, the top-k join results are obtained with a single scan of the retrieved join tuples. The paper presents a correctness proof and a cost analysis of TJJE. Extensive experiments show that, compared to PBRJ, TJJE maintains up to three orders of magnitude fewer candidate tuples and achieves a speedup of up to one order of magnitude.
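
To make the problem setting concrete, the following is a minimal sketch of a generic pull/bound style rank join, checked against a brute-force reference. It is not TJJE and not the paper's exact PBRJ template. Everything in it is an assumption made for illustration: the function names `naive_top_k_join` and `rank_join`, the two-table equi-join on a single key, the sum of the two scores as ranking function, the alternating pull strategy, and the simple corner bound used as stopping threshold.

```python
import heapq
from collections import defaultdict
from itertools import product


def naive_top_k_join(r, s, k):
    """Reference answer: materialize every join tuple, then keep the k best."""
    joined = [(rs + ss, key_r, rs, ss)
              for (key_r, rs), (key_s, ss) in product(r, s) if key_r == key_s]
    return heapq.nlargest(k, joined)


def rank_join(r, s, k):
    """Illustrative pull/bound rank join over two lists of (key, score).

    Returns (top-k join tuples, scan depth per input, candidates held).
    Assumed ranking function: r_score + s_score (monotone in both scores).
    """
    if k <= 0 or not r or not s:
        return [], (0, 0), 0
    inputs = (sorted(r, key=lambda t: -t[1]), sorted(s, key=lambda t: -t[1]))
    top = (inputs[0][0][1], inputs[1][0][1])   # best score of each input
    seen = (defaultdict(list), defaultdict(list))
    depth = [0, 0]
    candidates = []                            # every join tuple found so far

    while depth[0] < len(inputs[0]) or depth[1] < len(inputs[1]):
        # Pull: alternate between inputs, skipping an exhausted one.
        side = 0 if depth[0] <= depth[1] else 1
        if depth[side] >= len(inputs[side]):
            side = 1 - side
        key, score = inputs[side][depth[side]]
        depth[side] += 1
        seen[side][key].append(score)
        for other in seen[1 - side][key]:
            rs, ss = (score, other) if side == 0 else (other, score)
            candidates.append((rs + ss, key, rs, ss))

        # Bound: no join tuple that uses an unseen input tuple can beat this.
        last = []
        for i in (0, 1):
            if depth[i] == len(inputs[i]):
                last.append(float("-inf"))     # nothing unseen on this side
            elif depth[i] == 0:
                last.append(top[i])
            else:
                last.append(inputs[i][depth[i] - 1][1])
        threshold = max(last[0] + top[1], top[0] + last[1])

        if len(candidates) >= k:
            kth_best = heapq.nlargest(k, candidates)[-1][0]
            if kth_best >= threshold:
                break                          # top-k can no longer change

    return heapq.nlargest(k, candidates), tuple(depth), len(candidates)


R = [("a", 0.9), ("b", 0.8), ("c", 0.7), ("d", 0.1)]
S = [("b", 0.95), ("a", 0.6), ("d", 0.5), ("c", 0.2)]
result, scan_depth, held = rank_join(R, S, 2)
print(result, scan_depth, held)
```

The third return value counts the join tuples this sketch buffers before it can stop. That is the kind of candidate set the abstract says grows large for PBRJ on massive data, although the sketch makes no attempt to reproduce the paper's measurements.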