The NP-hard Max-k-cover problem asks for k sets, chosen from a given collection, whose union is as large as possible. This classic problem arises in many web search and advertising settings. For moderately sized instances, a greedy algorithm achieves a (1 - 1/e)-approximation. However, the greedy algorithm must update the scores of arbitrary elements after each step, which makes it intractable on large datasets. We give the first max cover algorithm designed for today's large-scale commodity clusters. Its approximation guarantee is provably almost as good as that of greedy, yet it runs much faster. Furthermore, it is easily expressed in the MapReduce programming paradigm and requires only polylogarithmically many passes over the data. Our experiments on five large problem instances show that the algorithm is practical and can achieve good speedups over the sequential greedy algorithm.
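For reference, the sequential greedy baseline mentioned above can be sketched as follows. This is a minimal illustration of the textbook greedy for Max-k-cover, not the MapReduce algorithm described in this work. The function name `greedy_max_k_cover` and its signature are hypothetical. Each iteration rescores every remaining set against the elements covered so far. That rescoring is the step-to-step dependence that makes greedy hard to scale.

```python
from typing import Hashable, Iterable, Sequence


def greedy_max_k_cover(
    sets: Sequence[Iterable[Hashable]], k: int
) -> tuple[list[int], set]:
    """Textbook sequential greedy for Max-k-cover (hypothetical helper).

    Repeats up to k times: pick the set whose marginal gain (number of
    elements not yet covered) is largest. Returns the chosen set indices,
    in selection order, and the union of the chosen sets.
    """
    candidates = [frozenset(s) for s in sets]
    covered: set = set()
    chosen: list[int] = []
    chosen_lookup: set[int] = set()  # O(1) membership test for chosen indices

    for _ in range(min(k, len(candidates))):
        best_idx, best_gain = -1, 0
        # Every remaining set is rescored against the *current* covered set;
        # this dependence on the previous step is what makes greedy sequential.
        for i, s in enumerate(candidates):
            if i in chosen_lookup:
                continue
            gain = len(s - covered)
            if gain > best_gain:  # strict '>' breaks ties toward lower index
                best_idx, best_gain = i, gain
        if best_idx == -1:  # no remaining set adds a new element
            break
        chosen.append(best_idx)
        chosen_lookup.add(best_idx)
        covered |= candidates[best_idx]

    return chosen, covered
```

For example, with the sets `{1,2,3}`, `{3,4}`, `{4,5,6,7}`, `{1,5}` and `k = 2`, the sketch first picks index 2, which has a gain of 4. It then picks index 0, whose gain is 3 once 4 through 7 are covered. The result is indices `[2, 0]`, covering all seven elements. This is the (1 - 1/e)-approximate greedy whose repeated score updates the abstract identifies as the bottleneck.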