A fast, parallel spanning tree algorithm for symmetric multiprocessors (SMPs)

Authors:
David A. Bader; Guojing Cong
Affiliations:
College of Computing, Georgia Institute of Technology, Atlanta, GA 30332, USA; IBM T.J. Watson Research Center, Yorktown Heights, NY, USA
Venue:
Journal of Parallel and Distributed Computing
Year:
2005



Abstract

The ability to provide uniform shared-memory access to a significant number of processors in a single SMP node brings us much closer to the ideal PRAM parallel computer. Many PRAM algorithms can be adapted to SMPs with few modifications, yet few studies deal with the implementation and performance issues of running PRAM-style algorithms on SMPs. Our study in this paper focuses on implementing parallel spanning tree algorithms on SMPs. Spanning tree is an important problem because it is the building block for many other parallel graph algorithms, and because it is representative of a large class of irregular combinatorial problems that have simple and efficient sequential implementations and fast PRAM algorithms but often have no known efficient parallel implementations. Experimental studies have been conducted on related problems (minimum spanning tree and connected components) using parallel computers, but they achieved reasonable speedup only on regular graph topologies that can be implicitly partitioned with good locality features, or on very dense graphs with limited numbers of vertices. In this paper we present a new randomized algorithm and implementation with superior performance that, for the first time, achieves parallel speedup on arbitrary graphs (both regular and irregular topologies) when compared with the best sequential implementation for finding a spanning tree. This new algorithm uses several techniques to give an expected running time that scales linearly with the number p of processors for suitably large inputs (n > p^2). As the spanning tree problem is notoriously hard for any parallel implementation to achieve reasonable speedup, our study may shed new light on implementing PRAM algorithms for shared-memory parallel computers.
The main results of this paper are:
1. A new and practical spanning tree algorithm for symmetric multiprocessors that exhibits parallel speedups on graphs with regular and irregular topologies; and
2. An experimental study of parallel spanning tree algorithms that reveals the superior performance of our new approach compared with the previous algorithms.
The source code for these algorithms is freely available from our web site, pc.ece.unm.edu.
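
For orientation, the kind of sequential baseline that parallel spanning tree implementations are measured against can be sketched as a breadth-first traversal that records the edge through which each vertex is first reached. This is an illustrative sketch only, not the authors' code or their parallel algorithm; the function name `bfs_spanning_tree`, the edge-list input format, and the choice of BFS rather than DFS are assumptions made for this example.

```python
from collections import deque


def bfs_spanning_tree(n, edges, root=0):
    """Return the tree edges (parent, child) of a BFS spanning tree.

    Hypothetical helper for illustration: vertices are 0..n-1, `edges`
    is a list of undirected (u, v) pairs, and only the connected
    component containing `root` is spanned.
    """
    # Build an adjacency list for the undirected graph.
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    parent = [-1] * n      # -1 marks an unvisited vertex
    parent[root] = root    # the root is its own parent
    tree = []
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if parent[v] == -1:
                # First time v is reached: (u, v) becomes a tree edge.
                parent[v] = u
                tree.append((u, v))
                queue.append(v)
    return tree


if __name__ == "__main__":
    sample_edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 4)]
    print(bfs_spanning_tree(5, sample_edges))
```

The traversal touches each vertex and edge a constant number of times, which is why a simple sequential implementation of this kind is a demanding baseline for any parallel version to beat.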