Designing Practical Efficient Algorithms for Symmetric Multiprocessors

Authors: David R. Helman; Joseph JáJá
Venue: ALENEX '99: Selected Papers from the International Workshop on Algorithm Engineering and Experimentation
Year: 1999

References (13):
A model for hierarchical memory. STOC '87: Proceedings of the Nineteenth Annual ACM Symposium on Theory of Computing.
The input/output complexity of sorting and related problems. Communications of the ACM.
Parallel sorting by regular sampling. Journal of Parallel and Distributed Computing.
List ranking and list scan on the Cray C90. Journal of Computer and System Sciences.
Accounting for Memory Bank Contention and Delay in High-Bandwidth Multiprocessors. IEEE Transactions on Parallel and Distributed Systems.
A new deterministic parallel sorting algorithm with an experimental evaluation. Journal of Experimental Algorithmics (JEA).
The Queue-Read Queue-Write PRAM Model: Accounting for Contention in Parallel Algorithms. SIAM Journal on Computing.
The influence of caches on the performance of sorting. SODA '97: Proceedings of the Eighth Annual ACM-SIAM Symposium on Discrete Algorithms.
Deterministic Parallel List Ranking. AWOC '88: Proceedings of the 3rd Aegean Workshop on Computing: VLSI Algorithms and Architectures.
Randomized speed-ups in parallel computation. STOC '84: Proceedings of the Sixteenth Annual ACM Symposium on Theory of Computing.
Sorting on Clusters of SMPs. IPPS '98: Proceedings of the 12th International Parallel Processing Symposium.
The complexity of parallel computations.
Hierarchical memory with block transfer. SFCS '87: Proceedings of the 28th Annual Symposium on Foundations of Computer Science.

Cited by (7):
Evaluating Arithmetic Expressions Using Tree Contraction: A Fast and Scalable Parallel Implementation for Symmetric Multiprocessors (SMPs) (Extended Abstract). HiPC '02: Proceedings of the 9th International Conference on High Performance Computing.
Using PRAM Algorithms on a Uniform-Memory-Access Shared-Memory Architecture. WAE '01: Proceedings of the 5th International Workshop on Algorithm Engineering.
An Experimental Study of Parallel Biconnected Components Algorithms on Symmetric Multiprocessors (SMPs). IPDPS '05: Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium, Papers, Volume 01.
A fast, parallel spanning tree algorithm for symmetric multiprocessors (SMPs). Journal of Parallel and Distributed Computing.
Fast PGAS Implementation of Distributed Graph Algorithms. Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis.
Design and implementation of the HPCS graph analysis benchmark on symmetric multiprocessors. HiPC '05: Proceedings of the 12th International Conference on High Performance Computing.
Techniques for designing efficient parallel graph algorithms for SMPs and multicore processors. ISPA '07: Proceedings of the 5th International Conference on Parallel and Distributed Processing and Applications.

Abstract

Symmetric multiprocessors (SMPs) dominate the high-end server market and are currently the primary candidate for constructing large-scale multiprocessor systems. Yet designing efficient parallel algorithms for this platform poses several challenges. In this paper, we present a computational model for designing efficient algorithms for symmetric multiprocessors. We then use this model to create efficient solutions to two widely different types of problems: linked list prefix computations and generalized sorting. Our novel algorithm for prefix computations builds upon the sparse ruling set approach of Reid-Miller and Blelloch. Besides being somewhat simpler and requiring nearly half as many memory accesses, our algorithm has a complexity that can be bounded with high probability rather than merely on average. Our algorithm for generalized sorting is a modification of our algorithm for sorting by regular sampling on distributed memory architectures. It is a stable sort that appears to be asymptotically faster than any of the published algorithms for SMPs. Both algorithms were implemented in C using POSIX threads and run on four symmetric multiprocessors: the IBM SP-2 (High Node), the HP-Convex Exemplar (S-Class), the DEC AlphaServer, and the Silicon Graphics Power Challenge. We ran each algorithm on a variety of benchmarks that we identified to examine how its performance depends on memory access patterns. Although the processors must compete for access to main memory, both algorithms yielded scalable performance up to 16 processors, which was the largest platform available to us. For some problems, our prefix computation algorithm matched or exceeded the performance of the standard sequential solution even when using only a single thread. Similarly, our generalized sorting algorithm always outperformed sequential merge sort by at least an order of magnitude, even with a single thread.
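The sparse ruling set idea behind the prefix computation can be illustrated with a minimal C sketch. It is not the authors' implementation: the array layout (`succ`, `val`, `pre`), the function name `list_prefix_ruling_set`, and the use of `rand()` to pick splitters are assumptions made for illustration. The loops run sequentially here, with comments marking the steps that would be divided among POSIX threads on an SMP. The head of the list is always a splitter; each splitter's sublist is walked until the next splitter is reached, the short list of sublist totals is prefix-summed in list order, and every node then adds its sublist's offset.

```c
#include <assert.h>
#include <stdlib.h>

/* Sketch of a linked-list prefix sum via a sparse ruling set.
   Assumed (hypothetical) layout: node i holds val[i]; succ[i] is the index
   of the next node, or -1 at the tail; the list visits all n nodes starting
   at `head`. On return, pre[i] is the inclusive prefix sum from the head
   through node i. `s` is the requested number of splitters. */
void list_prefix_ruling_set(int n, int head, const int *succ,
                            const long *val, long *pre, int s)
{
    assert(n > 0 && s >= 1);
    int *split_id = malloc(n * sizeof *split_id);  /* sublist id if splitter, else -1 */
    int *sub = malloc(n * sizeof *sub);            /* sublist each node belongs to */
    int *splitters = malloc(s * sizeof *splitters);
    int *next_sub = malloc(s * sizeof *next_sub);  /* sublist following sublist j, or -1 */
    long *sub_sum = malloc(s * sizeof *sub_sum);   /* total of sublist j */
    long *offset = malloc(s * sizeof *offset);     /* sum of all sublists before j */
    for (int i = 0; i < n; i++) split_id[i] = -1;

    /* Step 1: the head is always a splitter; the others are drawn at random.
       Duplicate draws are skipped, so m <= s splitters are used. */
    int m = 0;
    split_id[head] = m;
    splitters[m++] = head;
    for (int k = 1; k < s; k++) {
        int r = rand() % n;
        if (split_id[r] < 0) { split_id[r] = m; splitters[m++] = r; }
    }

    /* Step 2: walk each sublist from its splitter up to the next splitter,
       storing local prefixes. Sublists are independent, so on an SMP this
       loop would be divided among the threads. */
    for (int j = 0; j < m; j++) {
        int v = splitters[j];
        long sum = 0;
        for (;;) {
            sum += val[v];
            pre[v] = sum;
            sub[v] = j;
            int w = succ[v];
            if (w < 0 || split_id[w] >= 0) {
                next_sub[j] = (w < 0) ? -1 : split_id[w];
                break;
            }
            v = w;
        }
        sub_sum[j] = sum;
    }

    /* Step 3: exclusive prefix over the m sublist totals, in list order
       starting from the head's sublist (short, done by one thread). */
    long acc = 0;
    for (int j = 0; j >= 0; j = next_sub[j]) {
        offset[j] = acc;
        acc += sub_sum[j];
    }

    /* Step 4: add each sublist's offset to its nodes (data-parallel). */
    for (int i = 0; i < n; i++) pre[i] += offset[sub[i]];

    free(split_id); free(sub); free(splitters);
    free(next_sub); free(sub_sum); free(offset);
}
```

For example, with `succ = {4, 2, -1, 0, 1}`, `head = 3`, and `val = {10, 20, 30, 40, 50}`, the list order is 3, 0, 4, 1, 2, and the call fills `pre` with `{50, 120, 150, 40, 100}`. The random choice of splitters only changes how the work is divided, not the result.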
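The general shape of sorting by regular sampling, and one way to keep it stable, can be sketched in C. This is a hedged illustration, not the paper's algorithm: the record type `rec_t`, the functions `regular_sample_sort`, `merge_sort`, and `upper_bound`, the cap `MAXP`, and the choice of `p` samples per block with a splitter at every `p`-th sorted sample are all hypothetical. The `p` logical threads are simulated with sequential loops. Stability comes from three choices: each block is sorted with a stable merge sort, all keys equal to a splitter land in the same bucket, and each bucket is assembled from the blocks in input order before being stably sorted.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAXP 64   /* hypothetical cap on the number of logical threads */

/* Hypothetical record: sorted by key; tag is a payload used to check stability. */
typedef struct { int key; int tag; } rec_t;

/* Stable top-down merge sort of a[0..n) using scratch space tmp[0..n). */
static void merge_sort(rec_t *a, rec_t *tmp, size_t n)
{
    if (n < 2) return;
    size_t h = n / 2, i = 0, j = h, k = 0;
    merge_sort(a, tmp, h);
    merge_sort(a + h, tmp, n - h);
    while (i < h && j < n)                 /* on ties take the left run: stable */
        tmp[k++] = (a[j].key < a[i].key) ? a[j++] : a[i++];
    while (i < h) tmp[k++] = a[i++];
    while (j < n) tmp[k++] = a[j++];
    memcpy(a, tmp, n * sizeof *a);
}

/* Index of the first record in sorted a[0..n) whose key exceeds x. */
static size_t upper_bound(const rec_t *a, size_t n, int x)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (a[mid].key <= x) lo = mid + 1; else hi = mid;
    }
    return lo;
}

/* Sorting by regular sampling with p logical threads, run sequentially;
   each loop over blocks b or buckets j would be split among threads on an SMP. */
void regular_sample_sort(rec_t *a, size_t n, int p)
{
    assert(p >= 1 && p <= MAXP && n >= (size_t)p);
    rec_t *tmp = malloc(n * sizeof *tmp), *out = malloc(n * sizeof *out);
    rec_t *samp = malloc((size_t)p * p * sizeof *samp);
    rec_t *stmp = malloc((size_t)p * p * sizeof *stmp);
    size_t lo[MAXP + 1], cut[MAXP][MAXP + 1];
    int split[MAXP];
    for (int b = 0; b <= p; b++) lo[b] = n * b / p;   /* block b is a[lo[b]..lo[b+1]) */

    /* Step 1: each thread stably sorts its own block. */
    for (int b = 0; b < p; b++) merge_sort(a + lo[b], tmp + lo[b], lo[b + 1] - lo[b]);

    /* Step 2: take p regularly spaced samples per block, sort them, and
       pick p-1 regularly spaced splitters. */
    int ns = 0;
    for (int b = 0; b < p; b++)
        for (int k = 0; k < p; k++)
            samp[ns++] = a[lo[b] + (lo[b + 1] - lo[b]) * k / p];
    merge_sort(samp, stmp, (size_t)ns);
    for (int j = 1; j < p; j++) split[j - 1] = samp[j * p].key;

    /* Step 3: cut each sorted block at the splitters. Bucket j receives keys
       in (split[j-1], split[j]], so equal keys never straddle two buckets. */
    for (int b = 0; b < p; b++) {
        cut[b][0] = lo[b];
        for (int j = 1; j < p; j++)
            cut[b][j] = lo[b] + upper_bound(a + lo[b], lo[b + 1] - lo[b], split[j - 1]);
        cut[b][p] = lo[b + 1];
    }

    /* Step 4: gather bucket j from every block in block order, then stably
       sort it. Earlier blocks hold earlier input positions, so the whole sort
       is stable. (A threaded version would prefix-sum bucket sizes first to
       give each thread its output offset.) */
    size_t pos = 0;
    for (int j = 0; j < p; j++) {
        size_t start = pos;
        for (int b = 0; b < p; b++) {
            size_t len = cut[b][j + 1] - cut[b][j];
            memcpy(out + pos, a + cut[b][j], len * sizeof *a);
            pos += len;
        }
        merge_sort(out + start, tmp + start, pos - start);
    }
    memcpy(a, out, n * sizeof *a);
    free(tmp); free(out); free(samp); free(stmp);
}
```

Sorting the (key, tag) records `{{2, 0}, {1, 1}, {2, 2}}` with `p = 3` yields `{{1, 1}, {2, 0}, {2, 2}}`: the two records with key 2 keep their original relative order. The sketch checks only correctness and makes no claim about how evenly the buckets are balanced.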