Minimal MapReduce algorithms

Authors:
Yufei Tao; Wenqing Lin; Xiaokui Xiao
Affiliations:
Chinese University of Hong Kong, Hong Kong; Nanyang Technological University, Singapore; Nanyang Technological University, Singapore
Venue:
Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data
Year:
2013

Abstract

MapReduce has become a dominant parallel computing paradigm for big data, i.e., colossal datasets at the scale of terabytes or higher. Ideally, a MapReduce system should achieve a high degree of load balancing among the participating machines, and minimize the space usage, CPU and I/O time, and network transfer at each machine. Although these principles have guided the development of MapReduce algorithms, limited emphasis has been placed on enforcing serious constraints on the aforementioned metrics simultaneously. This paper presents the notion of a minimal algorithm, that is, an algorithm that guarantees the best parallelization in multiple aspects at the same time, up to a small constant factor. We show the existence of elegant minimal algorithms for a set of fundamental database problems, and demonstrate their excellent performance with extensive experiments.
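
The abstract does not spell out the algorithms themselves. As a purely illustrative sketch, and not the authors' method, the following single-process Python simulation shows what load balancing "up to a small constant factor" can look like for sorting: items are range-partitioned across `t` simulated machines using boundaries drawn from a random sample, and the largest per-machine load is compared with the ideal `n / t`. The function name `simulated_partition_sort`, the sampling rate, and the test data are all assumptions made for this example.

```python
import bisect
import math
import random


def simulated_partition_sort(data, t, seed=0):
    """Sort `data` by range-partitioning it across `t` simulated machines.

    Returns (sorted_list, loads), where loads[i] is the number of items
    assigned to simulated machine i. Everything runs in one process; the
    "machines" are plain Python lists.
    """
    rng = random.Random(seed)
    n = len(data)
    # Assumed sampling rate (an illustrative choice): about t * ln(n * t)
    # samples in expectation, capped at 1.
    rate = min(1.0, t * math.log(n * t) / n)
    sample = sorted(x for x in data if rng.random() < rate)
    if not sample:
        sample = [data[0]]  # fallback so boundaries can still be chosen
    # Pick t - 1 evenly spaced sample items as partition boundaries.
    s = len(sample)
    boundaries = [sample[i * s // t] for i in range(1, t)]
    # "Map" step: route each item to the machine owning its key range.
    machines = [[] for _ in range(t)]
    for x in data:
        machines[bisect.bisect_right(boundaries, x)].append(x)
    # "Reduce" step: each machine sorts locally; concatenation is sorted.
    loads = [len(part) for part in machines]
    result = [x for part in machines for x in sorted(part)]
    return result, loads


if __name__ == "__main__":
    rng = random.Random(42)
    n, t = 10_000, 10
    data = rng.sample(range(1_000_000), n)  # distinct keys
    result, loads = simulated_partition_sort(data, t)
    print("sorted correctly:", result == sorted(data))
    print("ideal load n/t:", n // t, "max load:", max(loads))
```

In this sketch, each simulated machine holds a contiguous key range, so concatenating the locally sorted parts yields a globally sorted list. How close the maximum load stays to `n / t` depends on the sample size, which is the kind of trade-off a guarantee on several metrics at once would have to control.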