Filtering: a method for solving graph problems in MapReduce

Authors:
Silvio Lattanzi;Benjamin Moseley;Siddharth Suri;Sergei Vassilvitskii
Affiliations:
Google Inc., New York, NY, USA;University of Illinois, Urbana, IL, USA;Yahoo! Inc., New York, NY, USA;Yahoo! Inc., New York, NY, USA
Venue:
Proceedings of the twenty-third annual ACM symposium on Parallelism in algorithms and architectures
Year:
2011

Abstract

The MapReduce framework is currently the de facto standard used throughout both industry and academia for petabyte-scale data analysis. As the input to a typical MapReduce computation is large, one of the key requirements of the framework is that the input cannot be stored on a single machine and must be processed in parallel. In this paper we describe a general algorithmic design technique in the MapReduce framework called filtering. The main idea behind filtering is to reduce the size of the input in a distributed fashion so that the resulting, much smaller, problem instance can be solved on a single machine. Using this approach we give new algorithms in the MapReduce framework for a variety of fundamental graph problems on sufficiently dense graphs. Specifically, we present algorithms for minimum spanning trees, maximal matchings, approximate weighted matchings, approximate vertex and edge covers, and minimum cuts. In all of these cases, we parameterize our algorithms by the amount of memory available on the machines, which allows us to show tradeoffs between the memory available and the number of MapReduce rounds. For each setting we show that even if the machines are given only substantially sublinear memory, our algorithms run in a constant number of MapReduce rounds. To demonstrate the practical viability of our algorithms, we implement the maximal matching algorithm that lies at the core of our analysis and show that it achieves a significant speedup over the sequential version.
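To make the filtering idea concrete, the following is a minimal single-process sketch for maximal matching, not the authors' implementation. It simulates the rounds sequentially: sample edges so the sample is expected to fit in one machine's memory, match greedily on the sample, then filter out every edge that touches a matched vertex, and repeat until the remainder fits in memory. The function names, the `memory_limit` parameter (measured in edges), and the sampling rate `memory_limit / (2 * len(remaining))` are assumptions chosen for illustration; the abstract does not specify them.

```python
import random


def greedy_extend(edges, matched, matching):
    """Greedily add edges whose endpoints are both still unmatched."""
    for u, v in edges:
        if u not in matched and v not in matched:
            matching.append((u, v))
            matched.update((u, v))


def filtering_maximal_matching(edges, memory_limit, seed=0):
    """Simulate filtering rounds; returns (matching, number_of_rounds).

    memory_limit is a hypothetical per-machine capacity, counted in edges.
    """
    if memory_limit < 1:
        raise ValueError("memory_limit must be at least 1")
    rng = random.Random(seed)
    # Self-loops can never be matched, so drop them up front.
    remaining = [(u, v) for u, v in edges if u != v]
    matched, matching, rounds = set(), [], 0

    while len(remaining) > memory_limit:
        rounds += 1
        # Assumed sampling rate: expected sample size is memory_limit / 2.
        # A real deployment would also have to handle an oversized sample.
        p = memory_limit / (2 * len(remaining))
        sample = [e for e in remaining if rng.random() < p]
        # "Single machine" step: match on the small sample only.
        greedy_extend(sample, matched, matching)
        # Filtering step: discard edges already covered by the matching.
        remaining = [(u, v) for u, v in remaining
                     if u not in matched and v not in matched]

    # The remaining instance now fits in memory; finish it directly.
    greedy_extend(remaining, matched, matching)
    return matching, rounds
```

In this sketch, a smaller `memory_limit` yields smaller samples and so tends to need more rounds, which is a toy version of the memory-versus-rounds tradeoff the abstract describes. The result is always a maximal matching because an edge is only discarded once one of its endpoints is matched.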