MapReduce has become a dominant parallel computing paradigm for big data, i.e., colossal datasets at the scale of terabytes or higher. Ideally, a MapReduce system should achieve a high degree of load balancing among the participating machines, and minimize the space usage, CPU and I/O time, and network transfer at each machine. Although these principles have guided the development of MapReduce algorithms, limited emphasis has been placed on enforcing serious constraints on the aforementioned metrics simultaneously. This paper presents the notion of a minimal algorithm, that is, an algorithm that guarantees the best parallelization in multiple aspects at the same time, up to a small constant factor. We show the existence of elegant minimal algorithms for a set of fundamental database problems, and demonstrate their excellent performance with extensive experiments.