In order to keep up with the demand for solutions to problems with ever-increasing data sets, both academia and industry have embraced commodity computer clusters with locally attached disks or SANs as an inexpensive alternative to supercomputers. With the advent of tools for parallel disk programming, such as MapReduce, STXXL, and Roomy, which allow the developer to focus on higher-level algorithms, programmer productivity for memory-intensive programs has increased many-fold. However, such parallel tools have primarily targeted iterative programs. We propose a programming model for migrating recursive RAM-based legacy algorithms to parallel disks. Many memory-intensive symbolic algebra algorithms are most easily expressed recursively. In that case, the programming challenge is compounded, since the developer must restructure the algorithm with two criteria in mind: converting a naturally recursive algorithm into an iterative one, while simultaneously exposing any potential data parallelism (as needed for parallel disks). Our model reduces the large effort that goes into the design phase of an external-memory algorithm. Research in this area over the past ten years has focused on per-problem solutions, without providing much insight into the connection between legacy algorithms and out-of-core algorithms. Our method shows how legacy algorithms employing recursion and non-streaming memory access can be translated more easily into efficient parallel disk-based algorithms. We demonstrate the ideas on the largest computation of its kind: the determinization, via subset construction, and minimization of very large nondeterministic finite-state automata (NFA). To our knowledge, this is the largest subset construction reported in the literature. Determinization of large NFA has long been a major computational hurdle in the study of permutation classes defined by token passing networks.
The programming model was used to design and implement an efficient NFA determinization algorithm that solves the next stage in analyzing token passing networks representing two stacks in series.
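As one illustration of the recursion-to-iteration restructuring described above, the following is a minimal in-RAM sketch of frontier-based subset construction. It is not the authors' implementation: the function names and data layout are our own, and epsilon-transitions are omitted for brevity. The point is the shape of the loop: the naturally recursive exploration of reachable state sets becomes an explicit work list, and each frontier level can in principle be processed in data-parallel fashion.

```python
# Hypothetical sketch of subset construction (NFA -> DFA) written
# iteratively with an explicit frontier instead of recursion.
# Epsilon-transitions are omitted; at disk scale, the 'seen' set would
# live on parallel disks with delayed duplicate detection.
from collections import deque

def determinize(nfa, start, alphabet):
    """nfa: dict mapping (state, symbol) -> set of successor states."""
    start_set = frozenset([start])
    dfa = {}                       # (state-set, symbol) -> state-set
    seen = {start_set}             # duplicate detection (in RAM here)
    frontier = deque([start_set])
    while frontier:                # iterative work list, no recursion
        current = frontier.popleft()
        for sym in alphabet:
            # Union of NFA successors over every state in the subset.
            nxt = frozenset(s for q in current
                              for s in nfa.get((q, sym), ()))
            dfa[(current, sym)] = nxt
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return dfa, seen

# Tiny example NFA: on 'a', state 0 may stay or move to 1; 1 loops.
nfa = {(0, 'a'): {0, 1}, (1, 'a'): {1}}
dfa, dfa_states = determinize(nfa, 0, ['a'])
```

In a parallel disk-based setting, the membership test `nxt not in seen` is the expensive step; techniques such as delayed duplicate detection batch these lookups into streaming passes over sorted files rather than performing random accesses, which is the kind of transformation the proposed programming model is meant to systematize.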