Using PRAM Algorithms on a Uniform-Memory-Access Shared-Memory Architecture

Authors:
David A. Bader;Ajith K. Illendula;Bernard M. E. Moret;Nina R. Weisse-Bernstein
Venue:
WAE '01 Proceedings of the 5th International Workshop on Algorithm Engineering
Year:
2001

Abstract

The ability to provide uniform shared-memory access to a significant number of processors in a single SMP node brings us much closer to the ideal PRAM parallel computer. In this paper, we develop new techniques for designing a uniform shared-memory algorithm from a PRAM algorithm and present the results of an extensive experimental study demonstrating that the resulting programs scale nearly linearly across a significant range of processors (from 1 to 64) and across the entire range of instance sizes tested. This linear speedup with the number of processors is, to our knowledge, the first ever attained in practice for intricate combinatorial problems. The example we present in detail here is a graph decomposition algorithm that also requires the computation of a spanning tree; this problem is not only of interest in its own right, but is representative of a large class of irregular combinatorial problems that have simple and efficient sequential implementations and fast PRAM algorithms, but have no known efficient parallel implementations. Our results thus offer promise for bridging the gap between the theory and practice of shared-memory parallel algorithms.