Irregular graph algorithms for distributed-memory systems are hard to implement and optimize. Recent developments in PGAS languages make irregular algorithms easier to implement. In this paper we present our study of a PRAM-based parallel connected components algorithm implemented in UPC for distributed-memory systems, and discuss optimization techniques for such settings. Our optimized version achieves a speedup of more than 100 times over the straightforward implementation. It also achieves substantial speedups over the best SMP implementation on the same input. Because the memory access patterns of these algorithms are representative of those of many other PRAM algorithms, we expect our techniques to be applicable to optimizing a wide range of PRAM graph algorithms on distributed-memory machines.
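To make the class of algorithm concrete, the following is a minimal sequential sketch in Python of the graft-and-shortcut pattern that PRAM-style connected components algorithms (for example, Shiloach-Vishkin) are built around. It is not the UPC implementation studied in the paper, which the abstract does not specify; the function name `connected_components`, the label array `D`, and the edge-list input are assumptions made for illustration. The indirect reads `D[D[v]]` are the kind of irregular access that becomes expensive when `D` is spread across distributed memory.

```python
def connected_components(n, edges):
    """Label each vertex 0..n-1 with the smallest vertex id in its component.

    Sequential simulation of a graft-and-shortcut scheme (assumed for
    illustration; on a PRAM each loop body would run in parallel).
    """
    D = list(range(n))  # D[v] is v's parent; every vertex starts as a root
    changed = True
    while changed:
        changed = False
        # Graft: for each edge joining two different trees, hook the
        # larger-labelled root under the smaller label.
        for u, v in edges:
            ru, rv = D[u], D[v]
            if ru == rv:
                continue
            hi, lo = max(ru, rv), min(ru, rv)
            if D[hi] == hi:  # only a current root may be hooked
                D[hi] = lo   # lo < hi, so parent pointers never form a cycle
                changed = True
        # Shortcut: pointer jumping until every tree is a star.
        for v in range(n):
            while D[v] != D[D[v]]:
                D[v] = D[D[v]]
    return D


if __name__ == "__main__":
    # Two components {0, 1, 2} and {3, 4}, plus the isolated vertex 5.
    print(connected_components(6, [(0, 1), (1, 2), (3, 4)]))
```

Hooking only toward the smaller label is one simple way to rule out cycles; a distributed version would additionally need to decide where `D` lives and how remote reads of `D[D[v]]` are batched, which is the optimization space the paper addresses.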