The MapReduce framework is currently the de facto standard used throughout both industry and academia for petabyte-scale data analysis. Because the input to a typical MapReduce computation is large, one of the key requirements of the framework is that the input cannot be stored on a single machine and must be processed in parallel. In this paper we describe filtering, a general algorithmic design technique for the MapReduce framework. The main idea behind filtering is to reduce the size of the input in a distributed fashion so that the resulting, much smaller, problem instance can be solved on a single machine. Using this approach we give new MapReduce algorithms for a variety of fundamental graph problems on sufficiently dense graphs. Specifically, we present algorithms for minimum spanning trees, maximal matchings, approximate weighted matchings, approximate vertex and edge covers, and minimum cuts. In all of these cases, we parameterize our algorithms by the amount of memory available on the machines, which lets us show tradeoffs between the available memory and the number of MapReduce rounds. For each setting we show that even if the machines are given only substantially sublinear memory, our algorithms run in a constant number of MapReduce rounds. To demonstrate the practical viability of our algorithms, we implement the maximal matching algorithm that lies at the core of our analysis and show that it achieves a significant speedup over the sequential version.
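To make the filtering idea concrete, the following is a minimal single-process sketch for maximal matching. It is an illustration under stated assumptions, not the implementation evaluated in the paper: the names `filtering_maximal_matching`, `greedy_maximal_matching`, and `memory_limit` are hypothetical, the sampling rate is an arbitrary choice, and the "rounds" are simulated in a loop rather than run as MapReduce jobs. Each round samples a subset of edges small enough to fit on one machine, extends the matching on that sample, and then filters out every edge that already has a matched endpoint.

```python
import random


def greedy_maximal_matching(edges, matched):
    """Add every edge whose endpoints are both still free; updates `matched` in place."""
    added = []
    for u, v in edges:
        if u not in matched and v not in matched:
            added.append((u, v))
            matched.update((u, v))
    return added


def filtering_maximal_matching(edges, memory_limit, seed=0):
    """Simulate filtering: shrink the edge set until it fits in `memory_limit` edges.

    Assumption: `memory_limit` stands in for the per-machine memory, measured in edges.
    Returns the matching and the number of simulated rounds.
    """
    if memory_limit < 1:
        raise ValueError("memory_limit must be at least 1")
    rng = random.Random(seed)
    # Self-loops can never be matched, so drop them up front.
    remaining = [(u, v) for u, v in edges if u != v]
    matching, matched, rounds = [], set(), 0

    while len(remaining) > memory_limit:
        rounds += 1
        # Sample each edge independently (the step a mapper could do locally).
        p = memory_limit / (2 * len(remaining))
        sample = [e for e in remaining if rng.random() < p][:memory_limit]
        if not sample:
            sample = [remaining[0]]  # fallback so every round makes progress
        # "Single machine" step: extend the matching using only the sample.
        matching += greedy_maximal_matching(sample, matched)
        # Filter step: discard edges that touch an already matched vertex.
        remaining = [(u, v) for u, v in remaining
                     if u not in matched and v not in matched]

    # What is left fits in memory, so finish on one machine.
    rounds += 1
    matching += greedy_maximal_matching(remaining, matched)
    return matching, rounds


if __name__ == "__main__":
    # Dense example: complete graph on 30 vertices (435 edges), 40 edges of memory.
    dense = [(i, j) for i in range(30) for j in range(i + 1, 30)]
    result, used_rounds = filtering_maximal_matching(dense, memory_limit=40, seed=1)
    print(len(result), "matched edges in", used_rounds, "simulated rounds")
```

The result is maximal because an edge leaves `remaining` only once one of its endpoints is matched, and the final pass handles everything that is left. The sketch does not try to reproduce the round or memory bounds claimed in the abstract.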