The MapReduce framework for parallel processing of massive data sets has recently attracted considerable attention, mainly due to its scalability, simplicity, and fault tolerance. Despite these merits, MapReduce follows a brute-force approach, which often results in redundant work. This is particularly evident for rank-aware queries, such as top-k, where the result set consists of a bounded set of k tuples. To process such queries in MapReduce, the input data must be accessed in its entirety to produce the correct result set. To address this lack of early termination, in this paper we investigate different techniques that allow efficient processing of rank-aware queries without accessing the input data exhaustively. We present various individual approaches that can be combined, and demonstrate their advantages and shortcomings. Thus, we provide the first steps towards integrating efficient rank-aware processing in MapReduce.
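To make the contrast concrete, the following is a minimal sketch in plain Python, not the paper's actual techniques and not Hadoop code. It compares a brute-force top-k, where every mapper scans its whole partition, with one possible form of early termination. The early-terminating variant rests on two assumptions that the abstract does not state: each partition is stored sorted by descending score, and a score threshold known to be no greater than the true k-th best score is available (for example, from a sample or histogram). All function names and the sample data are hypothetical.

```python
import heapq
from typing import Iterable, List, Sequence, Tuple

Record = Tuple[str, float]  # (tuple id, score); a higher score ranks higher


def map_brute_force(partition: Iterable[Record], k: int) -> List[Record]:
    """Brute-force mapper: reads every record, keeps the k best locally."""
    return heapq.nlargest(k, partition, key=lambda r: r[1])


def map_early_stop(partition_sorted_desc: Sequence[Record], k: int,
                   threshold: float) -> Tuple[List[Record], int]:
    """Mapper with early termination (assumes descending score order).

    Stops once k records are collected or a score falls below the
    threshold. Returns the local candidates and the number of records
    whose score was inspected.
    """
    out: List[Record] = []
    inspected = 0
    for rec in partition_sorted_desc:
        if len(out) == k:
            break  # already holding k local candidates
        inspected += 1
        if rec[1] < threshold:
            break  # sorted input: no later record can qualify
        out.append(rec)
    return out, inspected


def reduce_top_k(local_results: Iterable[List[Record]], k: int) -> List[Record]:
    """Reducer: merges the local candidate lists into the global top-k."""
    merged = [rec for part in local_results for rec in part]
    return heapq.nlargest(k, merged, key=lambda r: r[1])


# Hypothetical input: two partitions, each sorted by descending score.
PARTITIONS = [
    [("a", 0.9), ("b", 0.5), ("c", 0.1)],
    [("d", 0.8), ("e", 0.7), ("f", 0.2)],
]
K = 2
THRESHOLD = 0.6  # assumed lower bound on the k-th best score

brute = reduce_top_k([map_brute_force(p, K) for p in PARTITIONS], K)

early_parts = [map_early_stop(p, K, THRESHOLD) for p in PARTITIONS]
early = reduce_top_k([cands for cands, _ in early_parts], K)
total_inspected = sum(n for _, n in early_parts)
```

In this sketch both variants return the same two tuples, but the early-terminating mappers inspect four of the six records instead of all six. If the threshold were set above the true k-th best score, the result could be incomplete, which is why the bound is stated as an assumption here.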