We present and analyze a portable, high-performance algorithm for finding connected components on modern distributed-memory multiprocessors. The algorithm is a hybrid of the classic depth-first search on the subgraph local to each processor and a variant of the Shiloach-Vishkin PRAM algorithm on the global collection of subgraphs. We implement the algorithm in Split-C and measure its performance on the Cray T3D, the Meiko CS-2, and the Thinking Machines CM-5 using a class of graphs derived from cluster dynamics methods in computational physics. On a 256-processor Cray T3D, the implementation outperforms all previous solutions by an order of magnitude. A characterization of graph parameters allows us to select graphs that highlight key performance features. We study the effects of these parameters and of machine characteristics on the balance of time between the local and global phases of the algorithm, and find that edge density, surface-to-volume ratio, and relative communication cost dominate performance. By relating machine characteristics to performance, the study sheds light on how improvements in computational and/or communication performance affect this challenging problem.
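To illustrate the global phase described above, the following is a minimal sequential sketch of the Shiloach-Vishkin-style "hook and jump" idea: repeatedly attach one tree root to another across edges (hooking), then shortcut parent pointers (pointer jumping) until the labeling stabilizes. This is an illustrative simulation, not the paper's Split-C implementation; the function name and root-ordering rule are assumptions.

```python
def connected_components(n, edges):
    """Label vertices 0..n-1 with a component representative using
    repeated hooking and pointer jumping (Shiloach-Vishkin style),
    simulated on a single processor."""
    parent = list(range(n))  # each vertex starts as its own root

    changed = True
    while changed:
        changed = False
        # Hooking: for each edge joining two distinct roots, attach
        # the larger-numbered root under the smaller-numbered one.
        for u, v in edges:
            ru, rv = parent[u], parent[v]
            if parent[ru] == ru and parent[rv] == rv and ru != rv:
                hi, lo = max(ru, rv), min(ru, rv)
                parent[hi] = lo
                changed = True
        # Pointer jumping: shortcut parents until every vertex
        # points directly at its root.
        for v in range(n):
            while parent[v] != parent[parent[v]]:
                parent[v] = parent[parent[v]]
    return parent

# Two components: {0, 1, 2} and {3, 4}
labels = connected_components(5, [(0, 1), (1, 2), (3, 4)])
# labels is [0, 0, 0, 3, 3]
```

In the paper's hybrid, each processor would first label its local subgraph with a sequential DFS, so only the contracted inter-processor edges participate in the (communication-heavy) global rounds sketched here.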