The ability to provide uniform shared-memory access to a significant number of processors in a single SMP node brings us much closer to the ideal PRAM parallel computer. In this paper, we develop new techniques for designing a uniform shared-memory algorithm from a PRAM algorithm and present the results of an extensive experimental study demonstrating that the resulting programs scale nearly linearly across a significant range of processors (from 1 to 64) and across the entire range of instance sizes tested. This linear speedup with the number of processors is, to our knowledge, the first ever attained in practice for intricate combinatorial problems. The example we present in detail here is a graph decomposition algorithm that also requires the computation of a spanning tree; this problem is not only of interest in its own right, but is representative of a large class of irregular combinatorial problems that have simple and efficient sequential implementations and fast PRAM algorithms, but have no known efficient parallel implementations. Our results thus offer promise for bridging the gap between the theory and practice of shared-memory parallel algorithms.
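To make the idea of running a PRAM algorithm on a fixed number of shared-memory processors concrete, the following is a minimal sketch and not the method of this paper. It assumes list ranking by pointer jumping as a stand-in PRAM algorithm (the abstract does not name one at this level of detail), and it assumes the common emulation scheme in which each of p workers handles a contiguous block of the n virtual PRAM processors, with a barrier between synchronous rounds. The names `list_rank`, `step`, and `bounds` are hypothetical. In CPython the threads will not yield real speedup because of the global interpreter lock; the sketch only illustrates the structure.

```python
from concurrent.futures import ThreadPoolExecutor


def list_rank(succ, p=4):
    """Return each node's distance to the tail of a linked list.

    succ[i] is the successor of node i; the tail points to itself.
    Illustrative only: n virtual PRAM processors are emulated by p workers.
    """
    n = len(succ)
    nxt = list(succ)
    rank = [0 if succ[i] == i else 1 for i in range(n)]
    # Worker k emulates the virtual processors in one contiguous block.
    bounds = [(k * n // p, (k + 1) * n // p) for k in range(p)]

    def step(block, nxt, rank, new_nxt, new_rank):
        lo, hi = block
        for i in range(lo, hi):
            j = nxt[i]
            # Read only from the old arrays and write only to the new ones,
            # which mimics the synchronous read-then-write of a PRAM step.
            new_rank[i] = rank[i] + rank[j]
            new_nxt[i] = nxt[j]

    with ThreadPoolExecutor(max_workers=p) as pool:
        # Pointer jumping doubles the distance covered in each round.
        rounds = max(1, (n - 1).bit_length())
        for _ in range(rounds):
            new_nxt, new_rank = [0] * n, [0] * n
            # Consuming the map acts as a barrier: all blocks finish first.
            list(pool.map(lambda b: step(b, nxt, rank, new_nxt, new_rank), bounds))
            nxt, rank = new_nxt, new_rank
    return rank


# The list 3 -> 1 -> 4 -> 0 -> 2, with node 2 as the tail.
print(list_rank([2, 4, 2, 1, 0], p=2))
```

Double buffering keeps each round free of read/write races without locks, at the cost of allocating two fresh arrays per round; a tuned shared-memory version would more likely swap two preallocated buffers.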