Efficient distributed top-k query processing with caching

Authors:
Norvald H. Ryeng;Akrivi Vlachou;Christos Doulkeridis;Kjetil Nørvåg
Affiliations:
Norwegian University of Science and Technology, Department of Computer and Information Science, Trondheim, Norway (all authors)
Venue:
DASFAA'11 Proceedings of the 16th international conference on Database systems for advanced applications: Part II
Year:
2011

Abstract

Recently, there has been increased interest in incorporating rank-aware query operators, such as top-k queries, into database management systems, allowing users to retrieve only the most interesting data objects. In this paper, we propose a cache-based approach for efficiently supporting top-k queries in distributed database management systems. In large distributed systems, query performance depends mainly on network cost, measured as the number of tuples transmitted over the network. Ideally, only the k tuples that belong to the query result set should be transmitted. However, a server cannot decide from its local data alone which tuples belong to the result set. We therefore cache previous results to reduce the number of tuples that must be fetched over the network. Our approach always delivers as many tuples as possible from the cache and constructs a remainder query to fetch the remaining tuples. This differs from existing distributed approaches, which must re-execute the entire top-k query when the cached entries are not sufficient to provide the result set. We demonstrate the feasibility and efficiency of our approach through an implementation in a distributed database management system.
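
The idea of serving tuples from cache and fetching only the rest with a remainder query can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it covers only a deliberately simple case: every query uses the same scoring function, and the cache holds a prefix of the global ranking from an earlier query with a smaller k. The row schema, the scoring function, and the names rank_key, remainder_query and top_k are all assumptions made for illustration. Network cost is counted, as in the abstract, as the number of tuples sent by the servers.

```python
import heapq
from typing import List, Optional, Tuple

Row = Tuple[int, float, float]   # (id, x, y): hypothetical schema
Key = Tuple[float, int]          # (score, id): total order, ties broken by id


def rank_key(row: Row) -> Key:
    # Hypothetical linear scoring function; a lower score ranks higher.
    return (0.5 * row[1] + 0.5 * row[2], row[0])


def remainder_query(server: List[Row], n: int, after: Optional[Key]) -> List[Row]:
    """Server side: the n best local rows ranked strictly after `after`."""
    candidates = (r for r in server if after is None or rank_key(r) > after)
    return heapq.nsmallest(n, candidates, key=rank_key)


def top_k(servers: List[List[Row]], k: int, cache: List[Row]) -> Tuple[List[Row], int]:
    """Answer a top-k query; return (result, number of tuples transmitted)."""
    from_cache = cache[:k]                  # served without network traffic
    missing = k - len(from_cache)
    fetched: List[Row] = []
    if missing > 0:
        # Assumption: the cache is a prefix of the global ranking, so every
        # missing tuple ranks after the last cached one.
        after = rank_key(from_cache[-1]) if from_cache else None
        for server in servers:
            fetched.extend(remainder_query(server, missing, after))
    result = from_cache + heapq.nsmallest(missing, fetched, key=rank_key)
    if len(result) > len(cache):
        cache[:] = result                   # keep the longer ranked prefix
    return result, len(fetched)


if __name__ == "__main__":
    servers = [
        [(1, 1.0, 1.0), (2, 4.0, 4.0), (3, 7.0, 7.0)],
        [(4, 2.0, 2.0), (5, 5.0, 5.0), (6, 8.0, 8.0)],
    ]
    cache: List[Row] = []
    print(top_k(servers, 2, cache))   # cold cache: full query
    print(top_k(servers, 4, cache))   # warm cache: remainder query only
```

In this sketch each server answers a remainder query with at most k minus the number of cached tuples, rather than k, so a partially sufficient cache still reduces traffic instead of forcing a full re-execution. Handling queries with different scoring functions would need additional reasoning about which cached tuples are guaranteed to be in the new result, which this sketch does not attempt.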