Because of their memory-intensive workloads and erratic access patterns, irregular graph algorithms are notoriously hard to implement and optimize for high performance on distributed-memory systems. Although the recently proposed partitioned global address space (PGAS) paradigm improves ease of programming, no high-performance PGAS implementation of large-scale graph analysis is known. We present the first fast PGAS implementation of graph algorithms for the connected components and minimum spanning tree problems. By improving memory access locality, our implementation achieves much better communication efficiency and cache performance than a naive implementation on a cluster of SMPs. With additional algorithmic and PGAS-specific optimizations, it achieves significant speedups over both the best sequential implementation and the best single-node SMP implementation for large, sparse graphs with more than a billion edges.