On doing Todd-Coxeter coset enumeration in parallel
Discrete Applied Mathematics - Special volume: combinatorics and theoretical computer science
Hitting the memory wall: implications of the obvious
ACM SIGARCH Computer Architecture News
A case study of multi-threaded Gröbner basis completion
ISSAC '96 Proceedings of the 1996 international symposium on Symbolic and algebraic computation
Scalable parallel coset enumeration using bulk definition
Proceedings of the 2001 international symposium on Symbolic and algebraic computation
TOP-C: Task-Oriented Parallel C for Distributed and Shared Memory
Workshop on Wide Area Networks and High Performance Computing
TOP-C: a task-oriented parallel C interface
HPDC '96 Proceedings of the 5th IEEE International Symposium on High Performance Distributed Computing
Memory-based and disk-based algorithms for very high degree permutation groups
ISSAC '03 Proceedings of the 2003 international symposium on Symbolic and algebraic computation
The TOP-C parallel model and symbolic algebra
ACM SIGSAM Bulletin
Coset enumeration, like Gröbner bases, is a notoriously difficult algorithm to parallelize. We demonstrate a successful shared memory parallelization achieving a sevenfold speedup on an Origin 2000 CC-NUMA computer using 16 CPUs. As a testbed we take an enumeration of Lyons's group (8 835 156 cosets). This provides comparability with previous efforts in the literature (Cooperman and Havas, 1997; Havas and Ramsay, 2000), for which the best previous speedup was a factor of 4. The new parallelization depends on two new heuristics, clouds and shallow scan. Clouds is an example of bulk definition of cosets, which forms the key to our more efficient parallelization. The parallelization is implemented using TOP-C. By taking advantage of TOP-C's option to compile for either shared or distributed memory, we also demonstrate the first efficient parallelization of a coset enumeration program using distributed memory.

Our faster results expose, for the first time in the context of coset enumeration, the "memory wall", i.e., the latency barrier of the RAM. We verify this memory wall by showing on an Origin 2000 that a memory-bound parallel program with 64 CPUs doing nothing but randomly accessing RAM achieves at best a speedup of only 3.75 over the single-CPU version. Further, we demonstrate that even sequential versions of coset enumeration programs are memory latency-bound in today's technology. The lessons of bulk definition and of the memory wall carry over to related algorithms such as Gröbner bases, Knuth-Bendix, and other symbolic algebra algorithms with intermediate swell.
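To put the two reported speedups on a common scale, one can compute parallel efficiency, defined here as speedup divided by CPU count. The following is a minimal sketch of that arithmetic only; the function name, the `REPORTED` dictionary, and its labels are illustrative assumptions and are not part of the paper or of TOP-C.

```python
def parallel_efficiency(speedup: float, cpus: int) -> float:
    """Return speedup divided by CPU count; 1.0 would be ideal linear scaling."""
    if cpus <= 0:
        raise ValueError("cpus must be positive")
    return speedup / cpus


# Figures quoted in the abstract, as (speedup, CPU count).
# The dictionary keys are hypothetical labels chosen for this sketch.
REPORTED = {
    "coset enumeration, shared memory": (7.0, 16),
    "random RAM access, memory-bound": (3.75, 64),
}


def efficiencies() -> dict:
    """Map each reported experiment to its parallel efficiency."""
    return {name: parallel_efficiency(s, p) for name, (s, p) in REPORTED.items()}


if __name__ == "__main__":
    for name, eff in efficiencies().items():
        print(f"{name}: {eff:.1%}")
```

By this measure the coset enumeration runs at roughly 44% efficiency on 16 CPUs, while the purely memory-bound random-access program runs at roughly 6% on 64 CPUs, which is consistent with the abstract's point that RAM latency, not CPU count, limits such workloads.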