The ability to provide uniform shared-memory access to a significant number of processors in a single SMP node brings us much closer to the ideal PRAM parallel computer. In this paper, we develop new techniques for designing a uniform shared-memory algorithm from a PRAM algorithm and present the results of an extensive experimental study demonstrating that the resulting programs scale nearly linearly across a significant range of processors and across the entire range of instance sizes tested. This near-linear speedup with the number of processors is one of the first ever attained in practice for intricate combinatorial problems. The example we present in detail here is the evaluation of arithmetic expression trees using the algorithmic techniques of list ranking and tree contraction; this problem is not only of interest in its own right but is also representative of a large class of irregular combinatorial problems that have simple and efficient sequential implementations and fast PRAM algorithms but no known efficient parallel implementations. Our results thus offer promise for bridging the gap between the theory and practice of shared-memory parallel algorithms.
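
As a rough illustration of the list-ranking building block named above, the following Python sketch simulates the textbook PRAM pointer-jumping scheme sequentially. It is not the uniform shared-memory algorithm developed in the paper; the function name `list_rank` and the convention that the tail node points to itself are assumptions made only for this example.

```python
def list_rank(succ):
    """Return each node's distance to the tail of a linked list.

    succ[i] is the index of node i's successor; by assumption for this
    sketch, the tail node is its own successor. Each loop iteration
    simulates one synchronous PRAM round of pointer jumping.
    """
    n = len(succ)
    nxt = list(succ)
    # The tail starts at rank 0; every other node is one hop from its successor.
    rank = [0 if nxt[i] == i else 1 for i in range(n)]
    # The distance covered doubles each round, so about log2(n) rounds suffice.
    rounds = max(1, (n - 1).bit_length())
    for _ in range(rounds):
        # Build new arrays from the old ones so that all "processors"
        # read the previous round's values, as in a synchronous PRAM step.
        new_rank = [rank[i] + rank[nxt[i]] for i in range(n)]
        new_nxt = [nxt[nxt[i]] for i in range(n)]
        rank, nxt = new_rank, new_nxt
    return rank


# The list 3 -> 0 -> 2 -> 1, with node 1 as the tail.
print(list_rank([2, 1, 1, 0]))  # [2, 0, 1, 3]
```

Each round performs O(n) work, so this scheme does O(n log n) work in total, whereas a sequential traversal needs only O(n). That gap is one instance of the difference between fast PRAM algorithms and efficient practical implementations that the abstract refers to.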