Scalable parallel coset enumeration: bulk definition and the memory wall

Authors:
Gene Cooperman;Victor Grinberg
Affiliations:
College of Computer Science, Northeastern University, Boston, MA;College of Computer Science, Northeastern University, Boston, MA
Venue:
Journal of Symbolic Computation - Computer algebra: Selected papers from ISSAC 2001
Year:
2002

References (6)

On doing Todd-Coxeter coset enumeration in parallel. Discrete Applied Mathematics - Special volume: combinatorics and theoretical computer science.
Hitting the memory wall: implications of the obvious. ACM SIGARCH Computer Architecture News.
A case study of multi-threaded Gröbner basis completion. ISSAC '96 Proceedings of the 1996 international symposium on Symbolic and algebraic computation.
Scalable parallel coset enumeration using bulk definition. Proceedings of the 2001 international symposium on Symbolic and algebraic computation.
TOP-C: Task-Oriented Parallel C for Distributed and Shared Memory. Workshop on Wide Area Networks and High Performance Computing.
TOP-C: a task-oriented parallel C interface. HPDC '96 Proceedings of the 5th IEEE International Symposium on High Performance Distributed Computing.

Cited by (3)

Overcoming the memory wall in symbolic algebra: a faster permutation multiplication. ACM SIGSAM Bulletin.
Memory-based and disk-based algorithms for very high degree permutation groups. ISSAC '03 Proceedings of the 2003 international symposium on Symbolic and algebraic computation.
The TOP-C parallel model and symbolic algebra. ACM SIGSAM Bulletin.


Abstract

Coset enumeration, like Gröbner bases, is a notoriously difficult algorithm to parallelize. We demonstrate a successful shared-memory parallelization that achieves a sevenfold speedup on an Origin 2000 CC-NUMA computer using 16 CPUs. As a testbed we take an enumeration of Lyons's group (8 835 156 cosets). This provides comparability with previous efforts in the literature (Cooperman and Havas, 1997; Havas and Ramsay, 2000), for which the best previous speedup was a factor of 4. The new parallelization depends on two new heuristics, clouds and shallow scan. Clouds is an example of bulk definition of cosets, which forms the key to our more efficient parallelization. The parallelization is implemented using TOP-C. By taking advantage of TOP-C's option to compile for either shared or distributed memory, we also demonstrate the first efficient parallelization of a coset enumeration program using distributed memory.

Our faster results expose, for the first time in the context of coset enumeration, the "memory wall", i.e. the latency barrier of RAM. We verify this memory wall by showing on an Origin 2000 that a memory-bound parallel program with 64 CPUs doing nothing but randomly accessing RAM achieves at best a speedup of only 3.75 over the single-CPU version. Further, we demonstrate that even sequential versions of coset enumeration programs are memory latency-bound with today's technology. The lessons of bulk definition and of the memory wall carry over to related algorithms such as Gröbner bases, Knuth-Bendix, and other symbolic algebra algorithms with intermediate swell.
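
The scaling figures quoted in the abstract are easier to compare when expressed as parallel efficiency, i.e. speedup divided by the number of CPUs. The following is a minimal illustrative sketch in Python, not code from the paper; the function name `parallel_efficiency` and the labels are assumptions introduced here, and only the speedup and CPU counts come from the abstract.

```python
def parallel_efficiency(speedup: float, cpus: int) -> float:
    """Return speedup per CPU; 1.0 would correspond to ideal linear scaling."""
    if cpus <= 0:
        raise ValueError("cpus must be positive")
    return speedup / cpus


# Figures quoted in the abstract: (label, speedup, CPU count).
# The labels are descriptive placeholders, not names used by the authors.
reported = [
    ("coset enumeration, shared memory", 7.0, 16),
    ("random RAM access test program", 3.75, 64),
]

if __name__ == "__main__":
    for label, speedup, cpus in reported:
        eff = parallel_efficiency(speedup, cpus)
        print(f"{label}: {speedup}x on {cpus} CPUs, efficiency {eff:.1%}")
```

By this measure the coset enumeration reaches roughly 44% efficiency on 16 CPUs, while the purely memory-bound test reaches only about 6% on 64 CPUs, which is the sense in which random RAM access limits scaling.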