Fast greedy algorithms in MapReduce and streaming

Authors:
Ravi Kumar;Benjamin Moseley;Sergei Vassilvitskii;Andrea Vattani
Affiliations:
Google, Mountain View, CA, USA;Toyota Technological Institute at Chicago, Chicago, IL, USA;Google, Mountain View, CA, USA;University of California, San Diego, CA, USA
Venue:
Proceedings of the twenty-fifth annual ACM symposium on Parallelism in algorithms and architectures
Year:
2013

Abstract

Greedy algorithms are practitioners' best friends: they are intuitive, simple to implement, and often lead to very good solutions. However, implementing greedy algorithms in a distributed setting is challenging, since the greedy choice is inherently sequential and it is not clear how to take advantage of the extra processing power. Our main result is a powerful sampling technique that aids in the parallelization of sequential algorithms. We then show how to use this primitive to adapt a broad class of greedy algorithms to the MapReduce paradigm; this class includes maximum cover and submodular maximization subject to p-system constraints. Our method yields efficient algorithms that run in a logarithmic number of rounds while obtaining solutions that are arbitrarily close to those produced by the standard sequential greedy algorithm. We begin with algorithms for modular maximization subject to a matroid constraint, and then extend this approach to obtain approximation algorithms for submodular maximization subject to knapsack or p-system constraints. Finally, we empirically validate our algorithms and show that they achieve the same solution quality as standard greedy algorithms but run in substantially fewer rounds.
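
For readers unfamiliar with the sequential baseline the abstract compares against, the following is a minimal sketch of the textbook greedy algorithm for maximum cover. It is not the paper's MapReduce or sampling method. The function name `greedy_max_cover`, the dictionary input format, and the parameter `k` are assumptions made for illustration. Each iteration depends on the elements covered so far, which is the sequential dependence the abstract describes as hard to parallelize.

```python
from typing import Dict, Hashable, List, Set


def greedy_max_cover(sets: Dict[Hashable, Set[int]], k: int) -> List[Hashable]:
    """Textbook sequential greedy for maximum cover (illustrative sketch only).

    Picks up to k sets, each time taking the set that covers the most
    elements not yet covered. Names and input format are hypothetical.
    """
    covered: Set[int] = set()
    chosen: List[Hashable] = []
    for _ in range(k):
        # Marginal gain of a set = number of elements it would newly cover.
        best = max(
            (name for name in sets if name not in chosen),
            key=lambda name: len(sets[name] - covered),
            default=None,
        )
        if best is None or not (sets[best] - covered):
            break  # no remaining set adds any coverage
        chosen.append(best)
        covered |= sets[best]
    return chosen


if __name__ == "__main__":
    example = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6, 7}}
    print(greedy_max_cover(example, 2))
```

Each pass over the candidate sets must finish before the next choice can be made, so a naive distributed port would need one round per selected set.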