Fast PGAS connected components algorithms

Authors:
Guojing Cong; Gheorghe Almasi; Vijay Saraswat
Affiliations:
IBM Research, Yorktown Heights, NY; IBM Research, Yorktown Heights, NY; IBM Research, Hawthorne, NY
Venue:
Proceedings of the Third Conference on Partitioned Global Address Space Programming Models
Year:
2009

Abstract

Irregular graph algorithms are hard to implement and optimize on distributed-memory systems. Recent developments in PGAS languages make such algorithms easier to implement. In this paper we present a study of PRAM-based parallel connected components algorithms implemented in UPC for distributed-memory systems, and we discuss optimization techniques for this setting. Our optimized version achieves a speedup of more than 100 times over the straightforward implementation, and it also achieves remarkable speedups over the best SMP implementation for the same input. Because the memory access patterns of these algorithms are representative of those of many other PRAM algorithms, we expect our techniques to be applicable to optimizing a wide range of PRAM graph algorithms on distributed-memory machines.
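
For readers unfamiliar with this class of algorithm, the sketch below illustrates one common PRAM-style scheme for connected components: repeated grafting of trees along edges followed by pointer-jumping shortcuts, in the spirit of Shiloach-Vishkin. It is an assumption-laden illustration only. The abstract does not say which PRAM algorithm or which optimizations the authors use, the code is a sequential Python simulation rather than UPC, and the name `connected_components` is hypothetical.

```python
def connected_components(num_vertices, edges):
    """Sequential simulation of a graft-and-shortcut scheme (illustrative only).

    Returns a list `parent` where parent[v] is the smallest vertex id in
    the connected component containing v.
    """
    # Every vertex starts as the root of its own single-vertex tree.
    parent = list(range(num_vertices))
    changed = True
    while changed:
        changed = False
        # Graft step: for each edge joining two different trees, hook the
        # larger-labelled root onto the smaller label. On a PRAM all edges
        # would be processed concurrently; here they are visited in order.
        for u, v in edges:
            pu, pv = parent[u], parent[v]
            if pu == pv:
                continue
            hi, lo = max(pu, pv), min(pu, pv)
            if parent[hi] == hi:  # only roots may be grafted
                parent[hi] = lo
                changed = True
        # Shortcut step: pointer jumping until every tree is a star.
        # parent[parent[v]] is a data-dependent (irregular) access, the
        # kind that may touch remote memory on a distributed machine.
        for v in range(num_vertices):
            while parent[v] != parent[parent[v]]:
                parent[v] = parent[parent[v]]
    return parent


if __name__ == "__main__":
    # Hypothetical example graph: components {0, 1, 2}, {3, 4}, {5}.
    labels = connected_components(6, [(0, 1), (1, 2), (3, 4)])
    print(labels)
```

In a PGAS setting, `parent` would plausibly be a shared array distributed across threads, so each indirect read such as `parent[parent[v]]` could become a fine-grained remote access. That is one reason a straightforward port of a PRAM algorithm can be slow on distributed memory.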