On saying "enough already!" in MapReduce

Authors:
Christos Doulkeridis;Kjetil Nørvåg
Affiliations:
Norwegian University of Science and Technology, Sem Sælandsvei, Trondheim, Norway;Norwegian University of Science and Technology, Sem Sælandsvei, Trondheim, Norway
Venue:
Proceedings of the 1st International Workshop on Cloud Intelligence
Year:
2012

Abstract

The MapReduce framework for parallel processing of massive data sets has attracted considerable attention recently, mainly due to its salient features, which include scalability, simplicity, and fault tolerance. However, despite its merits, MapReduce follows a brute-force approach, which often results in redundant work. This is particularly evident in the case of rank-aware queries, such as top-k, where a bounded set of k tuples comprises the result set. To process such queries in MapReduce, the input data needs to be accessed in its entirety in order to produce the correct result set. To address this lack of early termination, in this paper we investigate different techniques that allow efficient processing of rank-aware queries without accessing the input data exhaustively. We present various individual approaches that can be combined, and demonstrate their advantages and shortcomings. Thus, we provide the first steps towards integrating efficient rank-aware processing in MapReduce.
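
To make the contrast in the abstract concrete, the following minimal sketch compares a brute-force top-k, in which every simulated map task reads its whole partition, with a variant that stops after k records per partition. It is an illustration only and not one of the techniques evaluated in the paper: it assumes each partition is already sorted by descending score, and the record layout and the function names `topk_full_scan` and `topk_early_stop` are hypothetical.

```python
import heapq
from typing import List, Tuple

Record = Tuple[str, float]  # (id, score); hypothetical record layout


def _score(record: Record) -> float:
    return record[1]


def topk_full_scan(partitions: List[List[Record]], k: int) -> Tuple[List[Record], int]:
    """Brute force: each simulated map task reads its entire partition and
    emits a local top-k; a single simulated reduce step merges them."""
    scanned = 0
    local: List[Record] = []
    for part in partitions:
        scanned += len(part)  # every record is read
        local.extend(heapq.nlargest(k, part, key=_score))
    return heapq.nlargest(k, local, key=_score), scanned


def topk_early_stop(partitions: List[List[Record]], k: int) -> Tuple[List[Record], int]:
    """Early termination, ASSUMING each partition is pre-sorted by descending
    score: the global top-k must lie within the first k records of each
    partition, so each simulated map task stops after k records."""
    scanned = 0
    local: List[Record] = []
    for part in partitions:
        head = part[:k]  # stop reading after k records
        scanned += len(head)
        local.extend(head)
    return heapq.nlargest(k, local, key=_score), scanned


if __name__ == "__main__":
    parts = [
        [("a", 0.9), ("b", 0.5), ("c", 0.1)],
        [("d", 0.8), ("e", 0.7), ("f", 0.2)],
    ]
    print(topk_full_scan(parts, 2))
    print(topk_early_stop(parts, 2))
```

Under the sortedness assumption both functions return the same result, but the second reads at most k records per partition, which is the kind of saving that early termination aims for.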