Efficient NC algorithms for set cover with applications to learning and geometry
Proceedings of the 30th IEEE symposium on Foundations of computer science
A threshold of ln n for approximating set cover
Journal of the ACM (JACM)
Using association rules for product assortment decisions: a case study
KDD '99 Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining
Set Cover with Requirements and Costs Evolving over Time
RANDOM-APPROX '99 Proceedings of the Third International Workshop on Approximation Algorithms for Combinatorial Optimization Problems: Randomization, Approximation, and Combinatorial Algorithms and Techniques
Experimental analysis of approximation algorithms for the vertex cover and set covering problems
Computers and Operations Research
On generating near-optimal tableaux for conditional functional dependencies
Proceedings of the VLDB Endowment
Approximation algorithms for combinatorial problems
Journal of Computer and System Sciences
Proceedings of the 19th international conference on World wide web
SCARAB: scaling reachability computation on large graphs
SIGMOD '12 Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data
Parallel and I/O efficient set covering algorithms
Proceedings of the twenty-fourth annual ACM symposium on Parallelism in algorithms and architectures
Fast greedy algorithms in mapreduce and streaming
Proceedings of the twenty-fifth annual ACM symposium on Parallelism in algorithms and architectures
Hi-index | 0.00 |
The problem of Set Cover - to find the smallest subcollection of sets that covers some universe - is at the heart of many data and analysis tasks. It arises in a wide range of settings, including operations research, machine learning, planning, data quality and data mining. Although finding an optimal solution is NP-hard, the greedy algorithm is widely used, and typically finds solutions that are close to optimal. However, a direct implementation of the greedy approach, which picks the set with the largest number of uncovered items at each step, does not behave well when the input is very large and disk resident. The greedy algorithm must make many random accesses to disk, which are unpredictable and costly in comparison to linear scans. In order to scale Set Cover to large datasets, we provide a new algorithm which finds a solution that is provably close to that of greedy, but which is much more efficient to implement using modern disk technology. Our experiments show a ten-fold improvement in speed on moderately-sized datasets, and an even greater improvement on larger datasets.