Storing a Sparse Table with O(1) Worst Case Access Time
Journal of the ACM (JACM)
Sorting in c log n parallel steps
Combinatorica
SIAM Journal on Computing
Optimal and sublogarithmic time randomized parallel sorting algorithms
SIAM Journal on Computing
Faster optimal parallel prefix sums and list ranking
Information and Computation
Hybridsort revisited and parallelized
Information Processing Letters
Information Processing Letters
Scans as Primitive Parallel Operations
IEEE Transactions on Computers
A complexity theory of efficient parallel algorithms
Theoretical Computer Science - Special issue: Fifteenth international colloquium on automata, languages and programming, Tampere, Finland, July 1988
A bridging model for parallel computation
Communications of the ACM
A new universal class of hash functions and dynamic hashing in real time
Proceedings of the seventeenth international colloquium on Automata, languages and programming
Converting high probability into nearly-constant time—with applications to parallel hashing
STOC '91 Proceedings of the twenty-third annual ACM symposium on Theory of computing
Fast parallel generation of random permutations
Proceedings of the 18th international colloquium on Automata, languages and programming
Parallel algorithms for shared-memory machines
Handbook of theoretical computer science (vol. A)
Towards a theory of nearly constant time parallel algorithms
SFCS '91 Proceedings of the 32nd annual symposium on Foundations of computer science
Fast hashing on a PRAM—designing by expectation
SODA '91 Proceedings of the second annual ACM-SIAM symposium on Discrete algorithms
Ultra-fast expected time parallel algorithms
SODA '91 Proceedings of the second annual ACM-SIAM symposium on Discrete algorithms
An introduction to parallel algorithms
An introduction to parallel algorithms
Improved parallel integer sorting without concurrent writing
SODA '92 Proceedings of the third annual ACM-SIAM symposium on Discrete algorithms
Implementation of a portable nested data-parallel language
PPOPP '93 Proceedings of the fourth ACM SIGPLAN symposium on Principles and practice of parallel programming
The QRQW PRAM: accounting for contention in parallel algorithms
SODA '94 Proceedings of the fifth annual ACM-SIAM symposium on Discrete algorithms
The Parallel Evaluation of General Arithmetic Expressions
Journal of the ACM (JACM)
Synthesis of Parallel Algorithms
Synthesis of Parallel Algorithms
ICALP '94 Proceedings of the 21st International Colloquium on Automata, Languages and Programming
Graph Theory With Applications
Graph Theory With Applications
An optical simulation of shared memory
SPAA '94 Proceedings of the sixth annual ACM symposium on Parallel algorithms and architectures
Accounting for memory bank contention and delay in high-bandwidth multiprocessors
Proceedings of the seventh annual ACM symposium on Parallel algorithms and architectures
Asynchronous shared memory search structures
Proceedings of the eighth annual ACM symposium on Parallel algorithms and architectures
Asynchrony versus bulk-synchrony in QRQW PRAM models
PODC '96 Proceedings of the fifteenth annual ACM symposium on Principles of distributed computing
Portable and Efficient Parallel Computing Using the BSP Model
IEEE Transactions on Computers
Lower Bounds for Randomized Exclusive Write PRAMs
Theory of Computing Systems
The queue-read, queue-write (QRQW) PRAM model [GMR94] permits concurrent reading and writing, but at a cost proportional to the number of readers/writers to a memory location in a given step. The QRQW model reflects the contention properties of most parallel machines more accurately than either the well-studied CRCW or EREW models: the CRCW model does not adequately penalize algorithms with high contention to shared memory locations, while the EREW model is too strict in its insistence on zero contention at each step. Of primary practical and theoretical interest, then, is the design of fast and efficient QRQW algorithms for problems for which all previous algorithms either suffer from high contention, fail to be fast, or fail to be work-optimal.

This paper describes low-contention, fast, work-optimal QRQW PRAM algorithms for the fundamental problems of finding a random permutation, parallel hashing, load balancing, and sorting. No fast, work-optimal EREW algorithm is known for finding a random permutation or for parallel hashing. For load balancing, we improve upon the EREW result whenever the ratio of the maximum to the average load is not too large. We show that the logarithmic dependence of the QRQW running time on this ratio is inherent by providing a matching lower bound.

We demonstrate the performance advantage of a QRQW random permutation algorithm, compared with the popular EREW algorithm, by implementing and running both algorithms on the MasPar MP-1.

Finally, we extend the work-time framework for the design of parallel algorithms to account for contention, and relate it to the QRQW PRAM model. We use our QRQW load balancing algorithm, as well as the QRQW linear compaction algorithm in [GMR94], to provide automatic tools for processor allocation, an issue that needs to be handled when translating an algorithm from its work-time presentation into the explicit PRAM description.
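
The contrast between the three models' treatment of contention can be illustrated with a small sketch. It is not taken from the paper: the function names are hypothetical, and it assumes that the cost of a step is the maximum number of accesses to any single location, which is one way to read "a cost proportional to the number of readers/writers to a memory location".

```python
from collections import Counter

def qrqw_step_cost(addresses):
    """Assumed QRQW cost of one step: the largest number of processors
    that access the same memory location (never less than 1)."""
    if not addresses:
        return 1
    return max(Counter(addresses).values())

def erew_allows(addresses):
    """EREW permits a step only if no two processors touch the same location."""
    return len(set(addresses)) == len(addresses)

def crcw_step_cost(addresses):
    """CRCW charges unit time for a step regardless of contention."""
    return 1

# Hypothetical access patterns: one address per processor in a single step.
no_contention = [0, 1, 2, 3]
hot_spot = [5, 5, 5, 2]
```

Under these assumptions, `hot_spot` is forbidden on an EREW PRAM, costs one time unit on a CRCW PRAM, and costs three time units on a QRQW PRAM, since three processors queue at location 5.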