We investigate the practical merits of a parallel priority queue by using it to develop a fast and work-efficient parallel shortest path algorithm, originally designed for an EREW PRAM. Our study reveals that an efficient implementation on a real supercomputer requires considerable effort to reduce the communication cost (in theory, communication is assumed to take constant time). It turns out that the most crucial part of the implementation is the mapping of the logical processors to the physical processing nodes of the supercomputer. We achieve the required efficient mapping through a new graph-theoretic result of independent interest: an algorithm for computing a Hamiltonian cycle on a directed hypertorus. No such algorithm was known before for directed hypertori. Our Hamiltonian cycle algorithm allows us to considerably reduce the communication cost and thus improve the overall performance of our implementation.
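To make the notion of a Hamiltonian cycle on a directed torus concrete, the following is a minimal sketch for the simplest case only: a two-dimensional directed n x n torus in which every node (i, j) has edges to (i+1 mod n, j) and (i, j+1 mod n). It is not the algorithm from the paper, which handles directed hypertori in general. The function names `square_torus_cycle` and `is_hamiltonian_cycle` and the stepping rule are illustrative assumptions.

```python
def square_torus_cycle(n):
    """Return a node order forming a Hamiltonian cycle on the directed
    n x n torus (illustrative special case, assumes n >= 2)."""
    order = []
    i, j = 0, 0
    for _ in range(n * n):
        order.append((i, j))
        if (i + j) % n == 0:
            i = (i + 1) % n  # step along dimension 0 to enter the next row
        else:
            j = (j + 1) % n  # step along dimension 1 to sweep the current row
    return order


def is_hamiltonian_cycle(order, m, n):
    """Check that `order` visits every node of the directed m x n torus
    exactly once and that each consecutive pair, including the wrap-around
    from the last node to the first, is a directed torus edge."""
    if len(order) != m * n or len(set(order)) != m * n:
        return False
    if not all(0 <= i < m and 0 <= j < n for i, j in order):
        return False
    for (i, j), (k, l) in zip(order, order[1:] + order[:1]):
        if (k, l) not in (((i + 1) % m, j), (i, (j + 1) % n)):
            return False
    return True


if __name__ == "__main__":
    cycle = square_torus_cycle(4)
    print(is_hamiltonian_cycle(cycle, 4, 4))
```

Under this rule, each row of the torus is swept completely before a single step along dimension 0 moves to the next row, so consecutive positions in the order are always direct neighbours. Placing logical processors along such an order is one way a ring-like communication pattern could be kept to single-hop messages on the physical nodes.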