On the implementation of parallel shortest path algorithms on a supercomputer

Authors:
Gabriele Di Stefano;Alberto Petricola;Christos Zaroliagis
Affiliations:
Dipartimento di Ingegneria Elettrica e dell'Informazione, Università dell'Aquila, Italy;Dipartimento di Ingegneria Elettrica e dell'Informazione, Università dell'Aquila, Italy;Computer Technology Institute and University of Patras, Greece
Venue:
ISPA'06 Proceedings of the 4th international conference on Parallel and Distributed Processing and Applications
Year:
2006

Abstract

We investigate the practical merits of a parallel priority queue through its use in the development of a fast and work-efficient parallel shortest path algorithm, originally designed for an EREW PRAM. Our study reveals that an efficient implementation on a real supercomputer requires considerable effort to reduce the cost of communication (which the theoretical model assumes to take constant time). It turns out that the most crucial part of the implementation is the mapping of the logical processors to the physical processing nodes of the supercomputer. We achieve the required efficient mapping through a new graph-theoretic result of independent interest: an algorithm for computing a Hamiltonian cycle on a directed hypertorus. No such algorithm was previously known for directed hypertori. Our Hamiltonian cycle algorithm allows us to considerably reduce the communication cost and thus improve the overall performance of our implementation.
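
To make the notion of a Hamiltonian cycle on a directed torus concrete, the following is a minimal Python sketch. It is not the algorithm of the paper: it is an exhaustive backtracking search that is only practical for very small tori. It assumes a torus in which every node has one outgoing link per axis, leading to the node whose coordinate on that axis is larger by one (modulo the axis size). The function names `torus_successors`, `find_hamiltonian_cycle` and `is_hamiltonian_cycle` are hypothetical and chosen for illustration only.

```python
def torus_successors(node, dims):
    """Out-neighbours of `node`: +1 (mod axis size) along each axis."""
    return [
        node[:axis] + ((node[axis] + 1) % size,) + node[axis + 1:]
        for axis, size in enumerate(dims)
    ]


def find_hamiltonian_cycle(dims):
    """Brute-force search from the origin; returns a node list or None.

    Assumption: illustration only, exponential time in the worst case.
    """
    start = (0,) * len(dims)
    total = 1
    for size in dims:
        total *= size
    path, visited = [start], {start}

    def extend():
        if len(path) == total:
            # All nodes visited: the cycle closes only if a link leads back.
            return start in torus_successors(path[-1], dims)
        for nxt in torus_successors(path[-1], dims):
            if nxt not in visited:
                visited.add(nxt)
                path.append(nxt)
                if extend():
                    return True
                path.pop()
                visited.remove(nxt)
        return False

    return path if extend() else None


def is_hamiltonian_cycle(cycle, dims):
    """Check that `cycle` visits every node once using only directed links."""
    total = 1
    for size in dims:
        total *= size
    if len(cycle) != total or len(set(cycle)) != total:
        return False
    return all(
        cycle[(k + 1) % total] in torus_successors(cycle[k], dims)
        for k in range(total)
    )


if __name__ == "__main__":
    cycle = find_hamiltonian_cycle((3, 3))
    print(cycle, is_hamiltonian_cycle(cycle, (3, 3)))
```

Under this assumption, placing logical processor k on node `cycle[k]` puts consecutive logical processors on physical nodes that are one directed link apart, which is the kind of mapping the abstract identifies as crucial. Not every directed torus admits such a cycle (the search returns `None` for a 2 x 3 torus, for instance).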